Hybrid search combines the strengths of two complementary search approaches: dense vector search for semantic understanding and sparse vector search for precise keyword matching. This technique addresses the limitations of using either approach alone. Dense embeddings excel at capturing semantic meaning but may struggle with:
  • Exact keyword matches
  • Proper nouns and domain-specific terminology
  • Rare or out-of-vocabulary terms
Sparse vectors (similar to BM25/TF-IDF) excel at:
  • Precise keyword matching
  • Token-level relevance
  • Handling rare terms
Combining both approaches provides the best of both worlds.
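To make the sparse representation concrete, here is a minimal sketch in pure Python (the vocabulary and term-frequency weights are hypothetical; production systems typically derive weights from a learned model such as SPLADE or a BM25-style scheme):

```python
# Minimal sketch: representing text as a sparse vector of {index: weight}.
# Only tokens present in the text get an entry, so vectors stay compact.

def to_sparse(tokens, vocab):
    """Map tokens to {index: weight}, using term frequency as the weight."""
    vec = {}
    for t in tokens:
        if t in vocab:                      # skip out-of-vocabulary tokens
            idx = vocab[t]
            vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec

def sparse_dot(a, b):
    """Score two sparse vectors: only shared indices contribute."""
    return sum(w * b[i] for i, w in a.items() if i in b)

vocab = {"qdrant": 0, "hybrid": 1, "search": 2, "sku": 3}
doc = to_sparse(["qdrant", "hybrid", "search", "search"], vocab)
query = to_sparse(["hybrid", "search"], vocab)

print(doc)                     # {0: 1.0, 1: 1.0, 2: 2.0}
print(sparse_dot(query, doc))  # 3.0 -- exact token overlap drives the score
```

Note how the score comes entirely from exact token matches; a rare term or SKU that appears in both query and document contributes directly, with no embedding model in between.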

How It Works

Qdrant allows you to store multiple named vectors per point. You can combine:
  1. Dense vectors - traditional embeddings (e.g., from sentence transformers)
  2. Sparse vectors - token-based representations with weighted indices

Vector Configuration

Create a collection with both dense and sparse vectors:
PUT /collections/hybrid_collection
{
  "vectors": {
    "dense": {
      "size": 768,
      "distance": "Cosine"
    }
  },
  "sparse_vectors": {
    "sparse": {
      "modifier": "idf"
    }
  }
}

Inserting Data

Store both vector types for each point:
PUT /collections/hybrid_collection/points
{
  "points": [
    {
      "id": 1,
      "vector": {
        "dense": [0.1, 0.2, 0.3, ...],
        "sparse": {
          "indices": [15, 42, 156, 2048],
          "values": [0.5, 1.2, 0.8, 0.3]
        }
      },
      "payload": {"text": "Your document text"}
    }
  ]
}

Using Prefetch for Fusion

The recommended approach is to use the prefetch API with fusion:
POST /collections/hybrid_collection/points/query
{
  "prefetch": [
    {
      "query": [0.1, 0.2, 0.3, ...],
      "using": "dense",
      "limit": 20
    },
    {
      "query": {
        "indices": [15, 42, 156],
        "values": [0.5, 1.2, 0.8]
      },
      "using": "sparse",
      "limit": 20
    }
  ],
  "query": {"fusion": "rrf"},
  "limit": 10
}

Fusion Strategies

Reciprocal Rank Fusion (RRF)

The default and most commonly used fusion method. It combines rankings from multiple searches using reciprocal ranks:
score = Σ(1 / (k + rank_i))
Where k is a constant (typically 60) and rank_i is the position in each result list.
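The formula can be checked with a few lines of Python (the ranks below are illustrative):

```python
# Reciprocal Rank Fusion: a document's fused score is the sum of
# 1 / (k + rank) over every result list it appears in (ranks start at 1).

def rrf_score(ranks, k=60):
    return sum(1.0 / (k + r) for r in ranks)

# A document ranked 1st in the dense list and 3rd in the sparse list:
score = rrf_score([1, 3])
print(round(score, 4))  # 1/61 + 1/63 ≈ 0.0323
```

Because only rank positions matter, RRF never compares raw scores across the two searches, which is what makes it robust when dense and sparse scores live on different scales.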
Distribution-Based Score Fusion (DBSF)

Normalizes scores from different searches based on their statistical distribution before combining them:
{
  "query": {"fusion": "dbsf"}
}
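A rough sketch of the underlying idea (not Qdrant's exact implementation): rescale each result list using its own mean and standard deviation so the two score distributions become comparable, then combine:

```python
import statistics

def normalize(scores):
    """Rescale scores into a band defined by mean ± 3 standard deviations."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    lo, hi = mean - 3 * sd, mean + 3 * sd
    return [(s - lo) / (hi - lo) for s in scores]

dense_scores = [0.92, 0.88, 0.40]
sparse_scores = [12.5, 3.1, 9.8]   # raw sparse scores live on a different scale

# After normalization both lists are comparable and can simply be summed.
fused = [d + s for d, s in zip(normalize(dense_scores), normalize(sparse_scores))]
print(fused)
```

Unlike RRF, this approach uses the actual score values, so a search that is very confident about one result can outweigh a weak agreement between lists.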

Separate Searches with Manual Combination

You can also perform separate searches and combine results in your application:
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)

# Dense search
dense_results = client.query_points(
    collection_name="hybrid_collection",
    query=dense_embedding,
    using="dense",
    limit=20
)

# Sparse search
sparse_results = client.query_points(
    collection_name="hybrid_collection",
    query=sparse_vector,
    using="sparse",
    limit=20
)

# Combine results with custom logic
combined = combine_results(dense_results, sparse_results)
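One way to implement the `combine_results` placeholder above is reciprocal rank fusion in plain Python (this function is illustrative, not part of qdrant-client; it only assumes each hit exposes an `id` attribute, as qdrant-client scored points do):

```python
from collections import namedtuple

def combine_results(dense_hits, sparse_hits, k=60, limit=10):
    """Fuse two ranked hit lists with reciprocal rank fusion.

    Each hit only needs an `id` attribute; ranks start at 1.
    """
    scores = {}
    for hits in (dense_hits, sparse_hits):
        for rank, hit in enumerate(hits, start=1):
            scores[hit.id] = scores.get(hit.id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)[:limit]

# Tiny stand-in for real search results:
Hit = namedtuple("Hit", "id")
dense = [Hit(1), Hit(2), Hit(3)]
sparse = [Hit(3), Hit(1)]
print(combine_results(dense, sparse))  # [1, 3, 2]
```

Document 1 wins because it ranks well in both lists; document 2, found by only one search, drops below document 3. Server-side fusion via prefetch avoids shipping both candidate lists to the client, so prefer it unless you need custom combination logic like this.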

Use Cases

E-commerce Search

Combine semantic understanding of product descriptions with exact SKU and brand name matching.

Legal Document Search

Find documents by meaning while ensuring specific legal terms and citations are matched.

Code Search

Search by semantic functionality while matching exact function names and identifiers.

Academic Papers

Semantic search for concepts combined with citation and author name matching.

Best Practices

  • Prefetch depth: retrieve more candidates in prefetch (e.g., 20-100) than your final limit (e.g., 10) so fusion has sufficient data to work with.
  • Consistent inputs: generate both the dense and sparse vectors from the same text to maintain consistency.
  • Fusion choice: test both RRF and DBSF with your specific dataset; which performs better varies by use case.
  • Contribution tracking: track which vector type contributes more to final results to optimize your approach.

Performance Considerations

  • Hybrid search requires two vector lookups, which increases query latency
  • Use HNSW indexing for dense vectors to maintain fast search
  • Sparse vector search uses an inverted index, which is generally fast for high-dimensional sparse data
  • Consider using filters in prefetch to reduce candidate sets before fusion
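For example, a filter can be attached to each prefetch clause so fusion only ever sees matching candidates (the `category` payload field here is hypothetical):

```
POST /collections/hybrid_collection/points/query
{
  "prefetch": [
    {
      "query": [0.1, 0.2, 0.3, ...],
      "using": "dense",
      "filter": {
        "must": [{"key": "category", "match": {"value": "electronics"}}]
      },
      "limit": 20
    },
    {
      "query": {
        "indices": [15, 42, 156],
        "values": [0.5, 1.2, 0.8]
      },
      "using": "sparse",
      "filter": {
        "must": [{"key": "category", "match": {"value": "electronics"}}]
      },
      "limit": 20
    }
  ],
  "query": {"fusion": "rrf"},
  "limit": 10
}
```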