Indexing is crucial for fast vector similarity search at scale. Qdrant uses specialized index structures to efficiently search through millions or billions of vectors.

Vector Indexing

Qdrant supports two vector index types:

Plain Index (No Index)

Brute-force search through all vectors:
from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

client.create_collection(
    collection_name="small_collection",
    vectors_config=models.VectorParams(
        size=128,
        distance=models.Distance.COSINE
    ),
    # m=0 disables HNSW graph construction, keeping a plain (brute-force) index
    hnsw_config=models.HnswConfigDiff(m=0)
)
// From lib/segment/src/types.rs:618-628
pub enum Indexes {
    Plain {},  // No index, scan whole collection
    Hnsw(HnswConfig),  // HNSW approximate search
}
Plain indexes guarantee 100% precision but are only practical for small collections (less than 10K vectors).
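As a sketch of what a plain segment does, a brute-force cosine search can be written in a few lines (toy code for illustration, not Qdrant's implementation):

```python
import math

def brute_force_search(vectors, query, limit):
    """Plain-index search: score every vector against the query and
    return the ids of the top `limit` matches by cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    ranked = sorted(vectors.items(), key=lambda kv: cosine(kv[1], query), reverse=True)
    return [point_id for point_id, _ in ranked[:limit]]

vecs = {1: (1.0, 0.0), 2: (0.0, 1.0), 3: (0.7, 0.7)}
print(brute_force_search(vecs, (1.0, 0.1), limit=2))  # [1, 3]
```

Every vector is scored, which is why precision is exact but cost grows linearly with collection size.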

HNSW Index

Hierarchical Navigable Small World - the default index for fast approximate search:
client.create_collection(
    collection_name="large_collection",
    vectors_config=models.VectorParams(
        size=384,
        distance=models.Distance.COSINE
    ),
    hnsw_config=models.HnswConfigDiff(
        m=16,              # Number of edges per node
        ef_construct=100,  # Construction time/quality trade-off
        full_scan_threshold=10000  # Use brute force below this size (in KB)
    )
)
// From lib/segment/src/types.rs:647-684
pub struct HnswConfig {
    pub m: usize,                      // Edges per node in graph
    pub ef_construct: usize,           // Build quality parameter
    pub full_scan_threshold: usize,    // KiloBytes threshold
    pub max_indexing_threads: usize,   // Parallel indexing threads
    pub on_disk: Option<bool>,         // Store index on disk
    pub payload_m: Option<usize>,      // M for payload indexes
    pub inline_storage: Option<bool>,  // Inline vectors in index
}
HNSW provides excellent recall (>95%) at a fraction of the cost of exhaustive search.

HNSW Algorithm

HNSW builds a multi-layer graph structure for efficient approximate nearest neighbor search:

How HNSW Works

  1. Graph Construction: Vectors are organized into a hierarchical graph with multiple layers
  2. Entry Point: Search starts at the top layer with a single entry point
  3. Greedy Traversal: At each layer, navigate to the closest neighbor until a local minimum is found
  4. Descend: Move down to the next layer and repeat
  5. Bottom Layer: Final refinement at the bottom layer with all vectors
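The layered greedy descent above can be sketched in miniature. This toy version (not Qdrant's implementation) uses hand-built neighbor graphs per layer and returns a single nearest point:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_step(graph, vectors, query, entry):
    """Greedy traversal on one layer: move to the closest neighbor
    until no neighbor is closer than the current node (local minimum)."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: dist(vectors[n], query),
                   default=current)
        if dist(vectors[best], query) < dist(vectors[current], query):
            current = best
        else:
            return current

def hnsw_search(layers, vectors, query, entry):
    """Descend through layers: the local minimum of each layer becomes
    the entry point for the next (denser) layer below."""
    for graph in layers:
        entry = greedy_step(graph, vectors, query, entry)
    return entry

# Toy index: 1-D vectors, a sparse top layer and a dense bottom layer.
vectors = {i: (float(i),) for i in range(8)}
layers = [
    {0: [4], 4: [0, 7], 7: [4]},                       # top layer
    {i: [j for j in (i - 1, i + 1) if 0 <= j < 8]      # bottom layer
     for i in range(8)},
]
print(hnsw_search(layers, vectors, (5.2,), entry=0))  # 5
```

Real HNSW keeps a candidate beam of size ef rather than a single current node, which is what the ef/ef_construct parameters below control.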

Key Parameters

M - Connectivity

Number of bidirectional links per node:
hnsw_config=models.HnswConfigDiff(
    m=16  # Default: 16
)
M Value | Memory    | Build Time | Search Quality | Search Speed
8       | Low       | Fast       | Good           | Fast
16      | Medium    | Medium     | Better         | Medium
32      | High      | Slow       | Best           | Slow
64      | Very High | Very Slow  | Excellent      | Very Slow
Higher M means more connections, better quality, but more memory and slower search.

ef_construct - Build Quality

Number of candidates evaluated during construction:
hnsw_config=models.HnswConfigDiff(
    ef_construct=100  # Default: 100
)
ef_construct | Build Time | Search Quality
50           | Fast       | Good
100          | Medium     | Better
200          | Slow       | Best
500          | Very Slow  | Excellent
Higher ef_construct improves index quality but increases build time. It does not affect search speed.
// From lib/segment/src/types.rs:1309
pub const DEFAULT_HNSW_EF_CONSTRUCT: usize = 100;

ef - Search Quality

Number of candidates evaluated during search (runtime parameter):
results = client.search(
    collection_name="my_collection",
    query_vector=[0.1, 0.2, 0.3, ...],
    limit=10,
    search_params=models.SearchParams(
        hnsw_ef=128,  # Default: automatic based on limit
        exact=False   # Use approximate search
    )
)
hnsw_ef | Search Speed | Recall
32      | Fastest      | ~90%
64      | Fast         | ~95%
128     | Medium       | ~98%
256     | Slow         | ~99%
512     | Very Slow    | ~99.5%
Higher hnsw_ef improves recall but slows down search. Adjust per-query based on quality requirements.
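When tuning hnsw_ef, recall is typically measured by comparing approximate results against an exact baseline (exact=True) on a sample of queries. The metric itself is simple; this helper is illustrative, not part of the client:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k results that approximate search found."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# e.g. HNSW returned [1, 2, 3, 5] but the exact top-4 is [1, 2, 3, 4]
print(recall_at_k([1, 2, 3, 5], [1, 2, 3, 4], k=4))  # 0.75
```

Averaging this over a few hundred sample queries gives the recall figures in the table above.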

full_scan_threshold

When to use brute-force instead of HNSW:
hnsw_config=models.HnswConfigDiff(
    full_scan_threshold=10000  # Kilobytes of vector data, not points
)
// From lib/segment/src/types.rs:1731
pub const DEFAULT_FULL_SCAN_THRESHOLD: usize = 10_000;
For queries whose filters narrow the candidate set to a small number of vectors, brute force is faster than HNSW graph traversal; this threshold switches strategies automatically.
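Because the threshold is expressed in kilobytes of vector data, the equivalent point count depends on dimensionality. Assuming float32 vectors (4 bytes per value), the conversion is roughly:

```python
def full_scan_point_count(threshold_kb, dim, bytes_per_value=4):
    """Approximate number of points covered by a full-scan threshold in KB."""
    return threshold_kb * 1024 // (dim * bytes_per_value)

# With the default 10,000 KB threshold and 384-dim float32 vectors:
print(full_scan_point_count(10_000, 384))  # 6666
```

Higher-dimensional vectors therefore cross the threshold at fewer points.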

Index Storage

In-Memory HNSW

Fastest option - entire index in RAM:
hnsw_config=models.HnswConfigDiff(
    m=16,
    ef_construct=100,
    on_disk=False  # Store in RAM
)

On-Disk HNSW

Reduces memory usage for large indexes:
hnsw_config=models.HnswConfigDiff(
    m=16,
    ef_construct=100,
    on_disk=True  # Store on disk with memory-mapping
)
On-disk HNSW increases search latency, especially for cold queries. Use for large datasets where memory is limited.

Payload Indexing

Payload indexes enable fast filtering:

Keyword Index

For exact match filtering:
client.create_payload_index(
    collection_name="products",
    field_name="category",
    field_schema=models.PayloadSchemaType.KEYWORD
)
Uses a hash map for O(1) lookups:
category="electronics" -> [point_1, point_5, point_8, ...]
category="books" -> [point_2, point_3, point_7, ...]
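The mapping above can be built with an ordinary dictionary; this is a toy version of the idea, not Qdrant's actual structure:

```python
from collections import defaultdict

points = {1: "electronics", 2: "books", 5: "electronics", 8: "electronics"}

index = defaultdict(list)  # category -> list of point ids
for point_id, category in points.items():
    index[category].append(point_id)

print(index["electronics"])  # [1, 5, 8]
```

A filter like category="electronics" then resolves to a candidate id list in constant time, instead of scanning every point's payload.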

Integer Index

For numeric exact match and range queries:
client.create_payload_index(
    collection_name="products",
    field_name="price",
    field_schema=models.IntegerIndexParams(
        type="integer",
        range=True,   # Enable range queries
        lookup=True   # Enable exact match
    )
)
Uses a range tree for efficient range queries:
price < 100 -> [point_1, point_4, point_6, ...]
price >= 500 -> [point_2, point_9, point_12, ...]
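A sorted array plus binary search captures the same idea in miniature (Qdrant's structure is more elaborate, but the query shape is the same):

```python
import bisect

# (price, point_id) pairs kept sorted by price
prices = sorted([(30, 1), (800, 9), (95, 6), (500, 2), (80, 4)])
keys = [price for price, _ in prices]

# price >= 500: binary-search the lower bound, take the tail
lo = bisect.bisect_left(keys, 500)
print([pid for _, pid in prices[lo:]])  # [2, 9]

# price < 100: take everything before the upper bound
hi = bisect.bisect_left(keys, 100)
print([pid for _, pid in prices[:hi]])  # [1, 4, 6]
```

Each range query costs a logarithmic lookup plus the size of the result, rather than a scan over all points.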

Text Index

For full-text search:
client.create_payload_index(
    collection_name="articles",
    field_name="content",
    field_schema=models.TextIndexParams(
        type="text",
        tokenizer="word",      # word, whitespace, multilingual
        min_token_len=2,
        max_token_len=20,
        lowercase=True
    )
)
Uses an inverted index:
"vector" -> [doc_1, doc_3, doc_7, ...]
"database" -> [doc_1, doc_2, doc_5, ...]
"search" -> [doc_3, doc_5, doc_8, ...]
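Building such an inverted index follows directly from the tokenizer settings shown above (a simplified sketch, not Qdrant's tokenizer):

```python
import re

docs = {
    1: "Vector database search",
    3: "A vector search engine",
    5: "Database internals",
}

inverted = {}  # token -> set of doc ids
for doc_id, text in docs.items():
    for token in re.findall(r"\w+", text.lower()):       # lowercase=True
        if 2 <= len(token) <= 20:                        # min/max token length
            inverted.setdefault(token, set()).add(doc_id)

print(sorted(inverted["vector"]))    # [1, 3]
print(sorted(inverted["database"]))  # [1, 5]
```

Note how the length filter drops the one-character token "A", mirroring the min_token_len=2 setting.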

Geo Index

For geographic queries:
client.create_payload_index(
    collection_name="locations",
    field_name="coordinates",
    field_schema=models.PayloadSchemaType.GEO
)
Uses a spatial index for efficient radius and bounding box queries.

Bool Index

For boolean filtering:
client.create_payload_index(
    collection_name="products",
    field_name="in_stock",
    field_schema=models.PayloadSchemaType.BOOL
)

Optimization Process

Qdrant automatically optimizes segments over time:

Segment Types

// From lib/segment/src/types.rs:417-426
pub enum SegmentType {
    Plain,    // No index, all operations available
    Indexed,  // With index, optimized for search
    Special,  // Special purpose segments
}

Optimization Strategy

  1. Plain Segments: New points go into plain (unindexed) segments for fast writes
  2. Threshold: When segment reaches threshold size, optimization is triggered
  3. Index Building: Optimizer builds HNSW index in background
  4. Replacement: Old plain segment is replaced with new indexed segment
# Configure optimization
client.update_collection(
    collection_name="my_collection",
    optimizer_config=models.OptimizersConfigDiff(
        indexing_threshold=20000,  # Index once a segment exceeds ~20,000 KB of vectors
        max_optimization_threads=4
    )
)
Optimization runs in the background without blocking reads or writes.

Quantization

Reduce memory usage with vector quantization:

Scalar Quantization

Convert float32 to int8:
client.create_collection(
    collection_name="compressed",
    vectors_config=models.VectorParams(
        size=768,
        distance=models.Distance.COSINE
    ),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            quantile=0.99,      # Quantization bounds
            always_ram=True     # Keep quantized vectors in RAM
        )
    )
)
Memory reduction: 4x (from float32 to int8)

Product Quantization

Split vectors into sub-vectors:
quantization_config=models.ProductQuantization(
    product=models.ProductQuantizationConfig(
        compression=models.CompressionRatio.X16,  # 16x compression
        always_ram=False
    )
)
Memory reduction: 4x to 64x depending on compression ratio

Binary Quantization

1-bit quantization:
quantization_config=models.BinaryQuantization(
    binary=models.BinaryQuantizationConfig(
        always_ram=True
    )
)
Memory reduction: 32x (from float32 to 1-bit)
Quantization trades memory for accuracy. Always benchmark with your specific data and queries.
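The reduction ratios follow directly from the bit widths. For, say, one million 768-dimensional vectors (an illustrative size, ignoring index overhead):

```python
def vectors_mib(n_vectors, dim, bits_per_value):
    """Raw vector storage size in MiB at the given per-value precision."""
    return n_vectors * dim * bits_per_value / 8 / 2**20

n, dim = 1_000_000, 768
for name, bits in [("float32", 32), ("int8", 8), ("binary", 1)]:
    print(f"{name:8s} {vectors_mib(n, dim, bits):8.1f} MiB")
```

Going from float32 to int8 cuts storage 4x; going to 1-bit cuts it 32x, matching the figures quoted above.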

Search Optimization

ACORN

ACORN improves recall for highly selective filtered searches:
results = client.search(
    collection_name="my_collection",
    query_vector=[0.1, 0.2, ...],
    query_filter=models.Filter(...),
    search_params=models.SearchParams(
        acorn=models.AcornSearchParams(
            enable=True,
            max_selectivity=0.4  # Use ACORN when filters match <40% of points
        )
    ),
    limit=10
)
ACORN helps when filters are very selective (match few points). It improves recall at the cost of some performance.
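Selectivity here is the fraction of points the filter matches, so the max_selectivity cutoff can be read as (illustrative check, not client code):

```python
def filter_selectivity(matched_points, total_points):
    """Fraction of the collection a filter matches (0.0 = nothing, 1.0 = all)."""
    return matched_points / total_points

# A filter matching 50K of 1M points (5%) is below the 0.4 cutoff, so ACORN applies:
print(filter_selectivity(50_000, 1_000_000) < 0.4)  # True
```

Filters matching more than the cutoff fall back to the regular filtered-HNSW strategy.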
Exact Search

Disable approximate search when 100% recall is required:
results = client.search(
    collection_name="my_collection",
    query_vector=[0.1, 0.2, ...],
    search_params=models.SearchParams(
        exact=True  # Use brute-force for 100% recall
    ),
    limit=10
)

Performance Tuning

For High Throughput

hnsw_config=models.HnswConfigDiff(
    m=8,              # Lower connectivity
    ef_construct=64,  # Faster build
    on_disk=False
)

search_params=models.SearchParams(
    hnsw_ef=32  # Faster search, ~90% recall
)

For High Accuracy

hnsw_config=models.HnswConfigDiff(
    m=32,              # Higher connectivity
    ef_construct=200,  # Better quality
    on_disk=False
)

search_params=models.SearchParams(
    hnsw_ef=256  # Better recall, ~99%
)

For Large Scale

hnsw_config=models.HnswConfigDiff(
    m=16,
    ef_construct=100,
    on_disk=True  # Save memory
)

quantization_config=models.ScalarQuantization(
    scalar=models.ScalarQuantizationConfig(
        type=models.ScalarType.INT8,
        quantile=0.99
    )
)

Best Practices

Create indexes for every payload field you filter on. Unindexed filters are extremely slow.
Start with defaults (M=16, ef_construct=100). Increase M for better recall, ef_construct for better index quality.
Use lower hnsw_ef (32-64) for latency-critical applications, higher (128-256) for accuracy-critical applications.
Quantization dramatically reduces memory usage with minimal accuracy loss. Start with scalar quantization.
Check collection status to ensure optimization is completing successfully.
If your vectors don’t fit in RAM, use on-disk HNSW and quantization.

Collections

Learn how to configure indexes at collection creation

Vectors

Understand what gets indexed

Payloads

Learn about payload indexes

Distance Metrics

Understand distance calculations in indexed search