Indexing is crucial for fast vector similarity search at scale. Qdrant uses specialized index structures to efficiently search through millions or billions of vectors.

Vector Indexing

Qdrant supports two vector index types:

Plain Index (No Index)

Brute-force search through all vectors:
from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

client.create_collection(
    collection_name="small_collection",
    vectors_config=models.VectorParams(
        size=128,
        distance=models.Distance.COSINE
    ),
    # m=0 disables HNSW graph construction, keeping a plain (brute-force) index
    hnsw_config=models.HnswConfigDiff(m=0)
)
// From lib/segment/src/types.rs:618-628
pub enum Indexes {
    Plain {},  // No index, scan whole collection
    Hnsw(HnswConfig),  // HNSW approximate search
}
Plain indexes guarantee 100% precision but are only practical for small collections (less than 10K vectors).
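As a sketch of what a plain segment does, a brute-force cosine search can be written in a few lines (toy code for illustration, not Qdrant's implementation):

```python
import math

def brute_force_search(vectors, query, limit):
    """Plain-index search: score every vector against the query and
    return the ids of the top `limit` matches by cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    ranked = sorted(vectors.items(), key=lambda kv: cosine(kv[1], query), reverse=True)
    return [point_id for point_id, _ in ranked[:limit]]

vecs = {1: (1.0, 0.0), 2: (0.0, 1.0), 3: (0.7, 0.7)}
print(brute_force_search(vecs, (1.0, 0.1), limit=2))  # [1, 3]
```

Every vector is scored, which is why precision is exact but cost grows linearly with collection size.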

HNSW Index

Hierarchical Navigable Small World - the default index for fast approximate search:
client.create_collection(
    collection_name="large_collection",
    vectors_config=models.VectorParams(
        size=384,
        distance=models.Distance.COSINE
    ),
    hnsw_config=models.HnswConfigDiff(
        m=16,              # Number of edges per node
        ef_construct=100,  # Construction time/quality trade-off
        full_scan_threshold=10000  # Use brute force below this size (in KB)
    )
)
// From lib/segment/src/types.rs:647-684
pub struct HnswConfig {
    pub m: usize,                      // Edges per node in graph
    pub ef_construct: usize,           // Build quality parameter
    pub full_scan_threshold: usize,    // KiloBytes threshold
    pub max_indexing_threads: usize,   // Parallel indexing threads
    pub on_disk: Option<bool>,         // Store index on disk
    pub payload_m: Option<usize>,      // M for payload indexes
    pub inline_storage: Option<bool>,  // Inline vectors in index
}
HNSW provides excellent recall (>95%) at a fraction of the cost of exhaustive search.

HNSW Algorithm

HNSW builds a multi-layer graph structure for efficient approximate nearest neighbor search:

How HNSW Works

  1. Graph Construction: Vectors are organized into a hierarchical graph with multiple layers
  2. Entry Point: Search starts at the top layer with a single entry point
  3. Greedy Traversal: At each layer, navigate to the closest neighbor until a local minimum is found
  4. Descend: Move down to the next layer and repeat
  5. Bottom Layer: Final refinement at the bottom layer with all vectors
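The layered greedy descent above can be sketched in miniature. This toy version (not Qdrant's implementation) uses hand-built neighbor graphs per layer and returns a single nearest point:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_step(graph, vectors, query, entry):
    """Greedy traversal on one layer: move to the closest neighbor
    until no neighbor is closer than the current node (local minimum)."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: dist(vectors[n], query),
                   default=current)
        if dist(vectors[best], query) < dist(vectors[current], query):
            current = best
        else:
            return current

def hnsw_search(layers, vectors, query, entry):
    """Descend through layers: the local minimum of each layer becomes
    the entry point for the next (denser) layer below."""
    for graph in layers:
        entry = greedy_step(graph, vectors, query, entry)
    return entry

# Toy index: 1-D vectors, a sparse top layer and a dense bottom layer.
vectors = {i: (float(i),) for i in range(8)}
layers = [
    {0: [4], 4: [0, 7], 7: [4]},                       # top layer
    {i: [j for j in (i - 1, i + 1) if 0 <= j < 8]      # bottom layer
     for i in range(8)},
]
print(hnsw_search(layers, vectors, (5.2,), entry=0))  # 5
```

Real HNSW keeps a candidate beam of size ef rather than a single current node, which is what the ef/ef_construct parameters below control.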

Key Parameters

M - Connectivity

Number of bidirectional links per node:
hnsw_config=models.HnswConfigDiff(
    m=16  # Default: 16
)
M Value | Memory    | Build Time | Search Quality | Search Speed
8       | Low       | Fast       | Good           | Fast
16      | Medium    | Medium     | Better         | Medium
32      | High      | Slow       | Best           | Slow
64      | Very High | Very Slow  | Excellent      | Very Slow
Higher M means more connections, better quality, but more memory and slower search.

ef_construct - Build Quality

Number of candidates evaluated during construction:
hnsw_config=models.HnswConfigDiff(
    ef_construct=100  # Default: 100
)
ef_construct | Build Time | Search Quality
50           | Fast       | Good
100          | Medium     | Better
200          | Slow       | Best
500          | Very Slow  | Excellent
Higher ef_construct improves index quality but increases build time. It does not affect search speed.
// From lib/segment/src/types.rs:1309
pub const DEFAULT_HNSW_EF_CONSTRUCT: usize = 100;

ef - Search Quality

Number of candidates evaluated during search (runtime parameter):
results = client.search(
    collection_name="my_collection",
    query_vector=[0.1, 0.2, 0.3, ...],
    limit=10,
    search_params=models.SearchParams(
        hnsw_ef=128,  # Default: automatic based on limit
        exact=False   # Use approximate search
    )
)
hnsw_ef | Search Speed | Recall
32      | Fastest      | ~90%
64      | Fast         | ~95%
128     | Medium       | ~98%
256     | Slow         | ~99%
512     | Very Slow    | ~99.5%
Higher hnsw_ef improves recall but slows down search. Adjust per-query based on quality requirements.
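When tuning hnsw_ef, recall is typically measured by comparing approximate results against an exact baseline (exact=True) on a sample of queries. The metric itself is simple; this helper is illustrative, not part of the client:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k results that approximate search found."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# e.g. HNSW returned [1, 2, 3, 5] but the exact top-4 is [1, 2, 3, 4]
print(recall_at_k([1, 2, 3, 5], [1, 2, 3, 4], k=4))  # 0.75
```

Averaging this over a few hundred sample queries gives the recall figures in the table above.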

full_scan_threshold

When to use brute-force instead of HNSW:
hnsw_config=models.HnswConfigDiff(
    full_scan_threshold=10000  # Kilobytes of vector data, not points
)
// From lib/segment/src/types.rs:1731
pub const DEFAULT_FULL_SCAN_THRESHOLD: usize = 10_000;
For queries whose filters narrow the candidate set to a small number of vectors, brute force is faster than HNSW graph traversal; this threshold switches strategies automatically.
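Because the threshold is expressed in kilobytes of vector data, the equivalent point count depends on dimensionality. Assuming float32 vectors (4 bytes per value), the conversion is roughly:

```python
def full_scan_point_count(threshold_kb, dim, bytes_per_value=4):
    """Approximate number of points covered by a full-scan threshold in KB."""
    return threshold_kb * 1024 // (dim * bytes_per_value)

# With the default 10,000 KB threshold and 384-dim float32 vectors:
print(full_scan_point_count(10_000, 384))  # 6666
```

Higher-dimensional vectors therefore cross the threshold at fewer points.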

Index Storage

In-Memory HNSW

Fastest option - entire index in RAM:
hnsw_config=models.HnswConfigDiff(
    m=16,
    ef_construct=100,
    on_disk=False  # Store in RAM
)

On-Disk HNSW

Reduces memory usage for large indexes:
hnsw_config=models.HnswConfigDiff(
    m=16,
    ef_construct=100,
    on_disk=True  # Store on disk with memory-mapping
)
On-disk HNSW increases search latency, especially for cold queries. Use for large datasets where memory is limited.

Payload Indexing

Payload indexes enable fast filtering:

Keyword Index

For exact match filtering:
client.create_payload_index(
    collection_name="products",
    field_name="category",
    field_schema=models.PayloadSchemaType.KEYWORD
)
Uses a hash map for O(1) lookups:
category="electronics" -> [point_1, point_5, point_8, ...]
category="books" -> [point_2, point_3, point_7, ...]
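The mapping above can be built with an ordinary dictionary; this is a toy version of the idea, not Qdrant's actual structure:

```python
from collections import defaultdict

points = {1: "electronics", 2: "books", 5: "electronics", 8: "electronics"}

index = defaultdict(list)  # category -> list of point ids
for point_id, category in points.items():
    index[category].append(point_id)

print(index["electronics"])  # [1, 5, 8]
```

A filter like category="electronics" then resolves to a candidate id list in constant time, instead of scanning every point's payload.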

Integer Index

For numeric exact match and range queries:
client.create_payload_index(
    collection_name="products",
    field_name="price",
    field_schema=models.IntegerIndexParams(
        type="integer",
        range=True,   # Enable range queries
        lookup=True   # Enable exact match
    )
)
Uses a range tree for efficient range queries:
price < 100 -> [point_1, point_4, point_6, ...]
price >= 500 -> [point_2, point_9, point_12, ...]
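A sorted array plus binary search captures the same idea in miniature (Qdrant's structure is more elaborate, but the query shape is the same):

```python
import bisect

# (price, point_id) pairs kept sorted by price
prices = sorted([(30, 1), (800, 9), (95, 6), (500, 2), (80, 4)])
keys = [price for price, _ in prices]

# price >= 500: binary-search the lower bound, take the tail
lo = bisect.bisect_left(keys, 500)
print([pid for _, pid in prices[lo:]])  # [2, 9]

# price < 100: take everything before the upper bound
hi = bisect.bisect_left(keys, 100)
print([pid for _, pid in prices[:hi]])  # [1, 4, 6]
```

Each range query costs a logarithmic lookup plus the size of the result, rather than a scan over all points.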

Text Index

For full-text search:
client.create_payload_index(
    collection_name="articles",
    field_name="content",
    field_schema=models.TextIndexParams(
        type="text",
        tokenizer="word",      # word, whitespace, multilingual
        min_token_len=2,
        max_token_len=20,
        lowercase=True
    )
)
Uses an inverted index:
"vector" -> [doc_1, doc_3, doc_7, ...]
"database" -> [doc_1, doc_2, doc_5, ...]
"search" -> [doc_3, doc_5, doc_8, ...]
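Building such an inverted index follows directly from the tokenizer settings shown above (a simplified sketch, not Qdrant's tokenizer):

```python
import re

docs = {
    1: "Vector database search",
    3: "A vector search engine",
    5: "Database internals",
}

inverted = {}  # token -> set of doc ids
for doc_id, text in docs.items():
    for token in re.findall(r"\w+", text.lower()):       # lowercase=True
        if 2 <= len(token) <= 20:                        # min/max token length
            inverted.setdefault(token, set()).add(doc_id)

print(sorted(inverted["vector"]))    # [1, 3]
print(sorted(inverted["database"]))  # [1, 5]
```

Note how the length filter drops the one-character token "A", mirroring the min_token_len=2 setting.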

Geo Index

For geographic queries:
client.create_payload_index(
    collection_name="locations",
    field_name="coordinates",
    field_schema=models.PayloadSchemaType.GEO
)
Uses a spatial index for efficient radius and bounding box queries.

Bool Index

For boolean filtering:
client.create_payload_index(
    collection_name="products",
    field_name="in_stock",
    field_schema=models.PayloadSchemaType.BOOL
)

Optimization Process

Qdrant automatically optimizes segments over time:

Segment Types

// From lib/segment/src/types.rs:417-426
pub enum SegmentType {
    Plain,    // No index, all operations available
    Indexed,  // With index, optimized for search
    Special,  // Special purpose segments
}

Optimization Strategy

  1. Plain Segments: New points go into plain (unindexed) segments for fast writes
  2. Threshold: When segment reaches threshold size, optimization is triggered
  3. Index Building: Optimizer builds HNSW index in background
  4. Replacement: Old plain segment is replaced with new indexed segment
# Configure optimization
client.update_collection(
    collection_name="my_collection",
    optimizer_config=models.OptimizersConfigDiff(
        indexing_threshold=20000,  # Index once a segment exceeds ~20,000 KB of vectors
        max_optimization_threads=4
    )
)
Optimization runs in the background without blocking reads or writes.

Quantization

Reduce memory usage with vector quantization:

Scalar Quantization

Convert float32 to int8:
client.create_collection(
    collection_name="compressed",
    vectors_config=models.VectorParams(
        size=768,
        distance=models.Distance.COSINE
    ),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            quantile=0.99,      # Quantization bounds
            always_ram=True     # Keep quantized vectors in RAM
        )
    )
)
Memory reduction: 4x (from float32 to int8)

Product Quantization

Split vectors into sub-vectors:
quantization_config=models.ProductQuantization(
    product=models.ProductQuantizationConfig(
        compression=models.CompressionRatio.X16,  # 16x compression
        always_ram=False
    )
)
Memory reduction: 4x to 64x depending on compression ratio

Binary Quantization

1-bit quantization:
quantization_config=models.BinaryQuantization(
    binary=models.BinaryQuantizationConfig(
        always_ram=True
    )
)
Memory reduction: 32x (from float32 to 1-bit)
Quantization trades memory for accuracy. Always benchmark with your specific data and queries.
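The reduction ratios follow directly from the bit widths. For, say, one million 768-dimensional vectors (an illustrative size, ignoring index overhead):

```python
def vectors_mib(n_vectors, dim, bits_per_value):
    """Raw vector storage size in MiB at the given per-value precision."""
    return n_vectors * dim * bits_per_value / 8 / 2**20

n, dim = 1_000_000, 768
for name, bits in [("float32", 32), ("int8", 8), ("binary", 1)]:
    print(f"{name:8s} {vectors_mib(n, dim, bits):8.1f} MiB")
```

Going from float32 to int8 cuts storage 4x; going to 1-bit cuts it 32x, matching the figures quoted above.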

Search Optimization

ACORN

ACORN improves recall for highly selective filtered searches:
results = client.search(
    collection_name="my_collection",
    query_vector=[0.1, 0.2, ...],
    query_filter=models.Filter(...),
    search_params=models.SearchParams(
        acorn=models.AcornSearchParams(
            enable=True,
            max_selectivity=0.4  # Use ACORN when filters match <40% of points
        )
    ),
    limit=10
)
ACORN helps when filters are very selective (match few points). It improves recall at the cost of some performance.
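Selectivity here is the fraction of points the filter matches, so the max_selectivity cutoff can be read as (illustrative check, not client code):

```python
def filter_selectivity(matched_points, total_points):
    """Fraction of the collection a filter matches (0.0 = nothing, 1.0 = all)."""
    return matched_points / total_points

# A filter matching 50K of 1M points (5%) is below the 0.4 cutoff, so ACORN applies:
print(filter_selectivity(50_000, 1_000_000) < 0.4)  # True
```

Filters matching more than the cutoff fall back to the regular filtered-HNSW strategy.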
Exact Search

Disable approximate search when 100% recall is required:
results = client.search(
    collection_name="my_collection",
    query_vector=[0.1, 0.2, ...],
    search_params=models.SearchParams(
        exact=True  # Use brute-force for 100% recall
    ),
    limit=10
)

Performance Tuning

For High Throughput

hnsw_config=models.HnswConfigDiff(
    m=8,              # Lower connectivity
    ef_construct=64,  # Faster build
    on_disk=False
)

search_params=models.SearchParams(
    hnsw_ef=32  # Faster search, ~90% recall
)

For High Accuracy

hnsw_config=models.HnswConfigDiff(
    m=32,              # Higher connectivity
    ef_construct=200,  # Better quality
    on_disk=False
)

search_params=models.SearchParams(
    hnsw_ef=256  # Better recall, ~99%
)

For Large Scale

hnsw_config=models.HnswConfigDiff(
    m=16,
    ef_construct=100,
    on_disk=True  # Save memory
)

quantization_config=models.ScalarQuantization(
    scalar=models.ScalarQuantizationConfig(
        type=models.ScalarType.INT8,
        quantile=0.99
    )
)

Best Practices

Create indexes for every payload field you filter on. Unindexed filters are extremely slow.
Start with defaults (M=16, ef_construct=100). Increase M for better recall, ef_construct for better index quality.
Use lower hnsw_ef (32-64) for latency-critical applications, higher (128-256) for accuracy-critical applications.
Quantization dramatically reduces memory usage with minimal accuracy loss. Start with scalar quantization.
Check collection status to ensure optimization is completing successfully.
If your vectors don’t fit in RAM, use on-disk HNSW and quantization.

Collections

Learn how to configure indexes at collection creation

Vectors

Understand what gets indexed

Payloads

Learn about payload indexes

Distance Metrics

Understand distance calculations in indexed search