Distance metrics define how vectors are compared for similarity. Choosing the right metric is critical for accurate search results and depends on your embedding model.

Available Distance Metrics

Qdrant supports four distance metrics:
// From lib/segment/src/types.rs:306-315
pub enum Distance {
    Cosine,     // Cosine similarity
    Euclid,     // Euclidean distance  
    Dot,        // Dot product
    Manhattan,  // Manhattan distance
}

Cosine Similarity

Measures the angle between vectors, ignoring magnitude:
from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

client.create_collection(
    collection_name="semantic_search",
    vectors_config=models.VectorParams(
        size=384,
        distance=models.Distance.COSINE
    )
)
Formula:
cosine_similarity(A, B) = (A · B) / (||A|| * ||B||)
Score Range: -1 (opposite) to 1 (identical)
Characteristics:
  • Normalized: Vector magnitude doesn’t affect similarity
  • Order: Higher scores are better (maximization)
  • Symmetric: cosine(A, B) = cosine(B, A)
Cosine is the most common metric for semantic search and text embeddings. Most embedding models (OpenAI, Cohere, Sentence Transformers) are optimized for cosine similarity.

When to Use Cosine

Use Cosine when:
  • Working with text embeddings (BERT, GPT, etc.)
  • Your embedding model outputs normalized vectors
  • You care about direction, not magnitude
  • Using pre-trained models (OpenAI, Cohere, etc.)
Don’t use Cosine when:
  • Vector magnitude contains important information
  • Your model was specifically trained for Euclidean distance

Example: Text Similarity

import numpy as np

# Vectors with the same direction but different magnitudes are similar
vec_a = np.array([1.0, 2.0, 3.0])   # magnitude ≈ 3.74
vec_b = np.array([2.0, 4.0, 6.0])   # magnitude ≈ 7.48

cos_sim = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
print(cos_sim)  # ≈ 1.0 (identical direction)

Euclidean Distance (L2)

Measures straight-line distance between vectors:
client.create_collection(
    collection_name="image_search",
    vectors_config=models.VectorParams(
        size=512,
        distance=models.Distance.EUCLID
    )
)
Formula:
euclidean_distance(A, B) = sqrt(Σ(Ai - Bi)²)
Score Range: 0 (identical) to ∞ (very different)
Characteristics:
  • Magnitude-sensitive: Vector length affects distance
  • Order: Lower scores are better (minimization)
  • Symmetric: euclid(A, B) = euclid(B, A)
Euclidean distance considers both direction and magnitude. It’s commonly used for image embeddings and when vector magnitude is meaningful.

When to Use Euclidean

Use Euclidean when:
  • Vector magnitude is meaningful
  • Working with image embeddings (ResNet, EfficientNet)
  • Your model was trained with Euclidean distance
  • You need geometric distance in embedding space
Don’t use Euclidean when:
  • Vectors have inconsistent magnitudes
  • You only care about direction

Example: Spatial Distance

import numpy as np

# Vectors with the same direction but different magnitudes are far apart
vec_a = np.array([1.0, 2.0, 3.0])
vec_b = np.array([2.0, 4.0, 6.0])

dist = np.linalg.norm(vec_a - vec_b)
print(dist)  # ≈ 3.74 -- a significant distance

Dot Product

Computes the sum of products of corresponding elements:
client.create_collection(
    collection_name="recommendation",
    vectors_config=models.VectorParams(
        size=256,
        distance=models.Distance.DOT
    )
)
Formula:
dot_product(A, B) = Σ(Ai * Bi)
Score Range: -∞ to ∞
Characteristics:
  • Unnormalized: Magnitude matters significantly
  • Order: Higher scores are better (maximization)
  • Symmetric: dot(A, B) = dot(B, A)
  • Efficient: Fastest to compute
Dot product is sensitive to vector magnitude. Longer vectors will have higher scores even if less similar in direction.
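To make the magnitude sensitivity concrete, here is a small illustration with made-up vectors (not Qdrant code): the vector that points further from the query's direction still wins on dot product because it is longer.

```python
import numpy as np

query = np.array([1.0, 0.0])
a = np.array([0.9, 0.1])   # nearly the same direction, small magnitude
b = np.array([3.0, 3.0])   # 45 degrees off, but much larger magnitude

# Dot product rewards the longer vector
print(np.dot(query, a))  # 0.9
print(np.dot(query, b))  # 3.0 -- b scores higher despite the worse direction

# Cosine ignores magnitude and prefers the closer direction
cos = lambda u, v: np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos(query, a))  # ≈ 0.994
print(cos(query, b))  # ≈ 0.707
```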

When to Use Dot Product

Use Dot Product when:
  • Your vectors are pre-normalized
  • Working with binary or categorical features
  • Your model was specifically trained for dot product
  • Maximum computational efficiency is needed
Don’t use Dot Product when:
  • Vectors have inconsistent magnitudes
  • You need normalized similarity scores

Relationship with Cosine

For normalized vectors (||v|| = 1), dot product equals cosine similarity:
# If vectors are normalized
import numpy as np

vec_a = [1.0, 0.0, 0.0]
vec_b = [0.707, 0.707, 0.0]

dot_product = np.dot(vec_a, vec_b)  # 0.707
cosine_sim = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))  # 0.707

# They're identical for normalized vectors!
If your model outputs normalized vectors, use Dot Product instead of Cosine for better performance.

Manhattan Distance (L1)

Sum of absolute differences:
client.create_collection(
    collection_name="specialized",
    vectors_config=models.VectorParams(
        size=128,
        distance=models.Distance.MANHATTAN
    )
)
Formula:
manhattan_distance(A, B) = Σ|Ai - Bi|
Score Range: 0 (identical) to ∞ (very different)
Characteristics:
  • Axis-aligned: Measures distance along axes
  • Order: Lower scores are better (minimization)
  • Symmetric: manhattan(A, B) = manhattan(B, A)
  • Robust: Less sensitive to outliers than Euclidean

When to Use Manhattan

Use Manhattan when:
  • Your features are independent/axis-aligned
  • You want robustness to outliers
  • Working with grid-like data
  • Your model was trained with Manhattan distance
Don’t use Manhattan when:
  • Working with typical embedding models (they are rarely optimized for L1)
  • You need standard semantic similarity
Manhattan is less common in vector search but useful for specialized applications where features are independent.
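The outlier-robustness claim can be checked directly. In this toy illustration (not Qdrant code), two error patterns have the same total L1 distance from a reference vector, but Euclidean penalizes the single large outlier twice as heavily as the evenly spread error:

```python
import numpy as np

base = np.zeros(4)
spread = np.array([1.0, 1.0, 1.0, 1.0])   # small error on every axis
outlier = np.array([4.0, 0.0, 0.0, 0.0])  # one large error on a single axis

# Manhattan (L1): both differ from `base` by the same total amount
print(np.abs(spread - base).sum())    # 4.0
print(np.abs(outlier - base).sum())   # 4.0

# Euclidean (L2): squaring makes the single outlier dominate
print(np.linalg.norm(spread - base))   # 2.0
print(np.linalg.norm(outlier - base))  # 4.0
```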

Score Ordering

// From lib/segment/src/types.rs:342-347
pub fn distance_order(&self) -> Order {
    match self {
        Distance::Cosine | Distance::Dot => Order::LargeBetter,
        Distance::Euclid | Distance::Manhattan => Order::SmallBetter,
    }
}
| Metric      | Better Score      | Range    |
|-------------|-------------------|----------|
| Cosine      | Higher (maximize) | -1 to 1  |
| Euclidean   | Lower (minimize)  | 0 to ∞   |
| Dot Product | Higher (maximize) | -∞ to ∞  |
| Manhattan   | Lower (minimize)  | 0 to ∞   |
Qdrant automatically handles score ordering. Top results always represent most similar vectors regardless of metric.

Common Embedding Models

Match your distance metric to your model:
| Model                                     | Dimensions | Distance  |
|-------------------------------------------|------------|-----------|
| OpenAI text-embedding-3-small             | 1536       | Cosine    |
| OpenAI text-embedding-3-large             | 3072       | Cosine    |
| Cohere embed-english-v3.0                 | 1024       | Cosine    |
| Cohere embed-multilingual-v3.0            | 1024       | Cosine    |
| Sentence Transformers (all-MiniLM-L6-v2)  | 384        | Cosine    |
| Sentence Transformers (all-mpnet-base-v2) | 768        | Cosine    |
| CLIP ViT-B/32                             | 512        | Cosine    |
| ResNet-50                                 | 2048       | Euclidean |
| BGE-large-en-v1.5                         | 1024       | Cosine    |
| E5-large-v2                               | 1024       | Cosine    |
Always check your model’s documentation for the recommended distance metric.

Performance Comparison

| Metric      | Computation Speed | Memory | Use Case                 |
|-------------|-------------------|--------|--------------------------|
| Dot Product | Fastest           | Lowest | Pre-normalized vectors   |
| Cosine      | Fast              | Medium | General semantic search  |
| Euclidean   | Medium            | Medium | Image embeddings         |
| Manhattan   | Medium            | Medium | Specialized applications |

Threshold Filtering

Set minimum similarity thresholds:
# Cosine: only return results with similarity > 0.7
results = client.search(
    collection_name="my_collection",
    query_vector=[0.1, 0.2, ...],
    score_threshold=0.7,  # For Cosine: higher is better
    limit=10
)

# Euclidean: only return results with distance < 5.0
results = client.search(
    collection_name="image_collection",
    query_vector=[0.1, 0.2, ...],
    score_threshold=5.0,  # For Euclidean: lower is better
    limit=10
)
// From lib/segment/src/types.rs:357-361
pub fn check_threshold(&self, score: ScoreType, threshold: ScoreType) -> bool {
    match self.distance_order() {
        Order::LargeBetter => score > threshold,
        Order::SmallBetter => score < threshold,
    }
}
Threshold direction depends on the metric:
  • Cosine/Dot: Use high thresholds (e.g., 0.7) to filter for similar items
  • Euclidean/Manhattan: Use low thresholds (e.g., 5.0) to filter for similar items
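For client-side post-filtering, the same rule can be sketched in Python. This is a paraphrase of the Rust shown above, not a qdrant-client API:

```python
def check_threshold(metric: str, score: float, threshold: float) -> bool:
    """Maximization metrics keep scores above the threshold;
    minimization metrics keep scores below it."""
    large_better = metric in ("Cosine", "Dot")
    return score > threshold if large_better else score < threshold

print(check_threshold("Cosine", 0.85, 0.7))  # True  (similar enough)
print(check_threshold("Euclid", 7.2, 5.0))   # False (too far away)
```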

Choosing the Right Metric

Decision Flow

  1. Check your model’s documentation - Use the recommended metric
  2. Are vectors normalized?
    • Yes → Use Dot Product (fastest) or Cosine (more intuitive)
    • No → Continue to step 3
  3. Does magnitude matter?
    • No → Use Cosine
    • Yes → Use Euclidean
  4. Special requirements?
    • Need outlier robustness → Manhattan
    • Maximum speed → Dot Product (with normalization)
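The decision flow above can be sketched as a tiny helper. This is a hypothetical function for illustration, not part of qdrant-client, and it assumes you already checked your model's documentation first:

```python
def pick_metric(normalized: bool, magnitude_matters: bool,
                need_outlier_robustness: bool = False) -> str:
    """Encode the decision flow: special requirements first,
    then normalization, then magnitude."""
    if need_outlier_robustness:
        return "Manhattan"
    if normalized:
        return "Dot"  # fastest; equals Cosine for unit vectors
    return "Euclid" if magnitude_matters else "Cosine"

print(pick_metric(normalized=True, magnitude_matters=False))   # Dot
print(pick_metric(normalized=False, magnitude_matters=True))   # Euclid
```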

Quick Reference

  • Text embeddings: Use Cosine. Nearly all text embedding models are optimized for cosine similarity.
  • Image embeddings: Use Euclidean or Cosine. Check your model’s documentation: ResNets often use Euclidean, CLIP uses Cosine.
  • Recommendation systems: Use Dot Product if your vectors are pre-normalized. It’s computationally cheaper than Cosine.
  • Normalized vectors: Use Dot Product. It’s equivalent to Cosine but faster.

Vector Normalization

Normalize vectors for consistent Cosine/Dot Product behavior:
import numpy as np

def normalize_vector(vector):
    """Normalize a vector to unit length."""
    vector = np.asarray(vector, dtype=float)
    norm = np.linalg.norm(vector)
    if norm == 0:
        return vector
    return vector / norm

# Example
vector = [3.0, 4.0, 0.0]
normalized = normalize_vector(vector)
print(normalized)  # [0.6 0.8 0. ]
print(np.linalg.norm(normalized))  # 1.0
# Batch normalization
vectors = np.array([
    [3.0, 4.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.707]
])

# Normalize along each row
norms = np.linalg.norm(vectors, axis=1, keepdims=True)
normalized = vectors / norms
Many embedding models (OpenAI, Cohere) return pre-normalized vectors. Check before normalizing!
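A quick way to perform that check is to verify that every vector already has unit L2 norm. This is a small sketch with an assumed tolerance; adjust `atol` as needed:

```python
import numpy as np

def looks_normalized(vectors, atol=1e-3):
    """Heuristic: do all rows already have unit L2 norm?"""
    norms = np.linalg.norm(np.asarray(vectors, dtype=float), axis=-1)
    return bool(np.all(np.abs(norms - 1.0) < atol))

print(looks_normalized([[0.6, 0.8, 0.0], [1.0, 0.0, 0.0]]))  # True
print(looks_normalized([[3.0, 4.0, 0.0]]))                   # False
```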

Best Practices

Dot Product is sensitive to magnitude. Normalize vectors first unless magnitude is meaningful.
Benchmark different metrics with your queries to find what works best in practice.
If using Cosine with normalized vectors, switch to Dot Product for better performance.
Use the same metric for indexing and querying. Mixing metrics will produce incorrect results.

Collections

Learn how to configure distance metrics in collections

Vectors

Understand vector types and preprocessing

Indexing

Explore how HNSW indexes work with distance metrics

Points

Learn how points store vectors for comparison