Indexing Parameters
HNSW Index Configuration
HNSW (Hierarchical Navigable Small World) is the primary index type in Qdrant. Its global defaults are set in config/config.yaml.
Parameter Guidelines
m (edges per node)
- Higher values → better search accuracy, more memory
- Lower values → faster indexing, less memory
- Recommended: 16-32 for most use cases
- 64+ for very high accuracy requirements
ef_construct (construction neighbors)
- Higher values → better index quality, slower indexing
- Lower values → faster indexing, lower accuracy
- Recommended: 100-200 for balanced performance
- 400+ for maximum accuracy
full_scan_threshold_kb
- Below this size, use full scan instead of HNSW
- Note: 1 KB ≈ 1 vector of size 256
- Default: 10000 (suitable for most cases)
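A minimal sketch of how the three parameters above might appear in config/config.yaml. The nesting under storage.hnsw_index is an assumption based on Qdrant's default configuration file, and the values are the lower ends of the recommended ranges, not tuned settings. The arithmetic in the comment assumes float32 components.

```yaml
# Sketch only: the storage.hnsw_index nesting is assumed
storage:
  hnsw_index:
    # Edges per node; 16-32 suits most use cases
    m: 16
    # Neighbors considered during construction; 100-200 is balanced
    ef_construct: 100
    # Below this size, full scan is used instead of HNSW.
    # 256 dims * 4 bytes (float32, assumed) = 1024 bytes = 1 KB per vector,
    # so 10000 KB is roughly 10,000 vectors of size 256.
    full_scan_threshold_kb: 10000
```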
Per-Collection Index Settings
Global settings can be overridden per collection.
Indexing Threshold
Control when vectors are indexed, in config/config.yaml:
- Set to 0 to disable indexing (use for small collections)
- Higher values delay indexing until more data is collected
- Note: 1 KB ≈ 1 vector of size 256
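As an illustration, the threshold could be written as follows. The key name indexing_threshold_kb and its place under storage.optimizers are assumptions drawn from Qdrant's default configuration file, and the value is only an example.

```yaml
# Sketch only: key name and nesting are assumed
storage:
  optimizers:
    # About 20,000 vectors of size 256 before indexing starts;
    # 0 would disable indexing
    indexing_threshold_kb: 20000
```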
On-Disk Index
Store the HNSW index on disk to save memory. This is set in config/config.yaml.
On-disk indexing reduces RAM usage but may increase query latency due to disk I/O.
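A possible fragment for this setting, assuming the storage.hnsw_index nesting used by Qdrant's default configuration file:

```yaml
# Sketch only: nesting is assumed
storage:
  hnsw_index:
    # Keep the HNSW graph on disk: less RAM, possibly higher query latency
    on_disk: true
```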
Search Threads
Control parallelism for search operations.
Maximum Search Threads
Set in config/config.yaml.
- 0 - Automatic (recommended): max(1, CPU_count - 1)
- Positive number - Use exactly this many threads
- Higher values - Better search throughput, more CPU usage
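The options above might be expressed like this. The storage.performance nesting is an assumption, and the worked numbers in the comments simply apply the automatic formula to a hypothetical 8-core machine.

```yaml
# Sketch only: nesting is assumed
storage:
  performance:
    # 0 = automatic: max(1, CPU_count - 1)
    # e.g. on 8 CPUs: max(1, 8 - 1) = 7 threads
    # e.g. on 1 CPU:  max(1, 1 - 1) = 1 thread
    max_search_threads: 0
```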
Service Workers
Control API request handling parallelism in config/config.yaml. This setting affects:
- Concurrent request handling
- REST API parallelism
- gRPC stream processing
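One way this could look, assuming the setting is service.max_workers as in Qdrant's default configuration file (the original text does not name the key):

```yaml
# Sketch only: key name and nesting are assumed
service:
  # 0 is assumed to mean "match the number of available cores"
  max_workers: 0
```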
Optimizer CPU Budget
Control resources allocated to background optimization in config/config.yaml:
- 0 (default) - Auto-select, reserve 1+ CPUs
- Positive - Use exactly this many CPUs
- Negative - Subtract from available CPUs (e.g., -2 = total_cpus - 2)
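A short sketch of the three modes, with the storage.performance nesting assumed and a hypothetical 16-CPU machine used for the arithmetic:

```yaml
# Sketch only: nesting is assumed
storage:
  performance:
    #  0 -> auto-select, reserving 1+ CPUs
    #  4 -> use exactly 4 CPUs
    # -2 -> total_cpus - 2, e.g. 16 - 2 = 14 CPUs
    optimizer_cpu_budget: 0
```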
Optimization Threads
Control concurrent optimization tasks in config/config.yaml:
- null - Dynamic, saturate available CPU
- 0 - Disable optimizations
- Positive - Limit concurrent optimization jobs
Related setting: max_indexing_threads, used for index building.
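For illustration, a fragment limiting concurrent optimization jobs. Placing the key under storage.optimizers is an assumption (some Qdrant versions are believed to keep it elsewhere), and the value 2 is arbitrary.

```yaml
# Sketch only: nesting is assumed and may differ between versions
storage:
  optimizers:
    # null -> dynamic, saturate available CPU
    # 0    -> disable optimizations
    # 2    -> at most 2 concurrent optimization jobs
    max_optimization_threads: 2
```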
Indexing Threads
Set in config/config.yaml.
- Recommended: 8-16 threads
- Too many threads may create inefficient HNSW graphs
- On small CPUs, fewer threads are used automatically
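A sketch using the lower end of the recommended range, assuming the key lives under storage.hnsw_index:

```yaml
# Sketch only: nesting is assumed
storage:
  hnsw_index:
    # 8-16 is the recommended range; too many threads
    # may produce inefficient HNSW graphs
    max_indexing_threads: 8
```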
Memory Settings
On-Disk Payload
Reduce memory usage by storing payloads on disk (set in config/config.yaml):
- Payloads read from disk on each request
- Saves RAM
- Slightly increases response time
- Indexed payload fields remain in RAM
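The setting could look like this, assuming on_disk_payload sits directly under storage:

```yaml
# Sketch only: nesting is assumed
storage:
  # Payloads are read from disk on each request;
  # indexed payload fields remain in RAM
  on_disk_payload: true
```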
On-Disk Vectors
Store vectors on disk. This is set in config/config.yaml.
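A hedged sketch of a default for new collections. The storage.collection.vectors.on_disk path is an assumption based on Qdrant's default configuration file and is not named in the text above.

```yaml
# Sketch only: key path is assumed
storage:
  collection:
    vectors:
      # Default for newly created collections
      on_disk: true
```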
Segment Size Limits
Control segment size to balance performance (set in config/config.yaml):
- Smaller segments → faster indexing, more segments
- Larger segments → better search speed, slower indexing
- Note: 1 KB = 1 vector of size 256
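An illustrative fragment. The nesting under storage.optimizers is assumed and the value is a made-up example chosen only to show the unit conversion.

```yaml
# Sketch only: nesting is assumed, value is illustrative
storage:
  optimizers:
    # 200000 KB is roughly 200,000 vectors of size 256 per segment
    max_segment_size_kb: 200000
```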
Default Segment Number
Set in config/config.yaml. Consider aligning the segment count with max_search_threads for even distribution.
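A possible fragment, assuming the key is storage.optimizers.default_segment_number and that 0 lets Qdrant choose the count automatically:

```yaml
# Sketch only: nesting and the meaning of 0 are assumed
storage:
  optimizers:
    # 0 is assumed to mean automatic selection
    default_segment_number: 0
```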
Async Scorer
Enable high-performance async I/O for rescoring (Linux only) in config/config.yaml.
- Requires a Linux kernel with io_uring support
- Significantly improves performance for on-disk vectors
- Must be enabled at kernel level
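Assuming the flag is storage.performance.async_scorer, as in Qdrant's default configuration file, enabling it might look like this:

```yaml
# Sketch only: key path is assumed
storage:
  performance:
    # Linux only; requires io_uring support in the kernel
    async_scorer: true
```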
The async scorer uses io_uring for efficient async disk I/O. See the Qdrant io_uring article for details.
Write Performance
Update Concurrency
Set in config/config.yaml.
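A sketch of what this setting may look like. The key name update_concurrency, its position under storage, and the meaning of null are all assumptions taken from Qdrant's default configuration file.

```yaml
# Sketch only: key name, nesting, and null semantics are assumed
storage:
  # null is assumed to leave concurrency unrestricted;
  # a positive number caps concurrent updates to shard replicas
  update_concurrency: null
```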
Update Rate Limiting
Set in config/config.yaml.
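For illustration only, assuming the key is storage.performance.update_rate_limit and that null leaves the choice to Qdrant:

```yaml
# Sketch only: key name, nesting, and null semantics are assumed
storage:
  performance:
    update_rate_limit: null
```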
Flush Interval
Set in config/config.yaml.
- Lower values → better durability, more I/O
- Higher values → better write performance, risk of data loss
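The trade-off above could be expressed as follows. The key name flush_interval_sec appears later in this document; its nesting under storage.optimizers and the value 5 are assumptions.

```yaml
# Sketch only: nesting is assumed, value is illustrative
storage:
  optimizers:
    # Lower -> better durability, more I/O
    # Higher -> better write performance, more data at risk on a crash
    flush_interval_sec: 5
```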
WAL Configuration
Set in config/config.yaml.
- wal_capacity_mb - Size of each WAL segment
- wal_segments_ahead - Pre-allocate segments for faster writes
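A sketch of the two WAL keys, assuming they sit under storage.wal; the values are illustrative rather than recommendations.

```yaml
# Sketch only: nesting is assumed, values are illustrative
storage:
  wal:
    # Size of each WAL segment
    wal_capacity_mb: 32
    # Segments pre-allocated ahead of the current one
    wal_segments_ahead: 0
```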
Collection Loading
Control concurrent collection loading in config/config.yaml.
Optimization Tuning
Deleted Threshold
Trigger optimization when enough vectors are deleted. This is set in config/config.yaml.
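One way to write this, assuming the key is storage.optimizers.deleted_threshold and that it is a fraction of deleted vectors in a segment:

```yaml
# Sketch only: key path and semantics are assumed
storage:
  optimizers:
    # 0.2 = optimize a segment once about 20% of its vectors are deleted
    deleted_threshold: 0.2
```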
Vacuum Minimum Vectors
Set in config/config.yaml.
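A hedged example, assuming the key is storage.optimizers.vacuum_min_vector_number and that it sets the smallest segment size eligible for vacuuming:

```yaml
# Sketch only: key name and semantics are assumed
storage:
  optimizers:
    # Segments with fewer vectors than this are not vacuumed
    vacuum_min_vector_number: 1000
```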
Override Optimizers
Force optimizer settings across all collections in config/config.yaml.
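A rough sketch of such an override. The section name optimizers_overwrite is an assumption based on Qdrant's default configuration file, and the keys inside it are examples only.

```yaml
# Sketch only: section name is assumed
storage:
  optimizers_overwrite:
    # Applied to every collection, overriding per-collection values
    deleted_threshold: 0.2
    flush_interval_sec: 5
```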
Quantization
Reduce memory usage and improve performance with quantization:
- Scalar (int8) - 4x memory reduction
- Product - Higher compression ratios
- Binary - Maximum compression for specific use cases
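To make the 4x figure concrete, here is a sketch of a default scalar quantization setting with the arithmetic in comments. The storage.collection.quantization structure is an assumption, and the collection size is hypothetical.

```yaml
# Sketch only: structure is assumed
# Hypothetical collection: 1,000,000 vectors of 768 dimensions
#   float32: 1,000,000 * 768 * 4 bytes = 3,072,000,000 bytes (about 3.07 GB)
#   int8:    1,000,000 * 768 * 1 byte  =   768,000,000 bytes (about 0.77 GB)
storage:
  collection:
    quantization:
      scalar:
        type: int8
```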
Replication Settings
Set in config/config.yaml.
- replication_factor - Number of shard copies
- write_consistency_factor - Replicas confirming writes
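A small sketch of the two keys, assuming they are collection defaults under storage.collection. The values describe a hypothetical three-node setup.

```yaml
# Sketch only: nesting is assumed, values are illustrative
storage:
  collection:
    # Keep 3 copies of each shard
    replication_factor: 3
    # A write succeeds once 2 replicas confirm it
    write_consistency_factor: 2
```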
Request Size Limits
Set in config/config.yaml.
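Assuming the limit is service.max_request_size_mb, as in Qdrant's default configuration file, it might be set like this:

```yaml
# Sketch only: key name is assumed, value is illustrative
service:
  # Larger requests are rejected
  max_request_size_mb: 32
```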
Workload-Specific Tuning
High-Throughput Search
Large-Scale Indexing
Memory-Constrained Environment
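A possible starting point, assembled from the memory-saving settings this document lists elsewhere (on-disk payload, on-disk HNSW index, on-disk vectors). The nesting is assumed, and this is a sketch to benchmark against, not a verified profile.

```yaml
# Sketch only: nesting is assumed
storage:
  # Payloads on disk; indexed fields stay in RAM
  on_disk_payload: true
  hnsw_index:
    # HNSW graph on disk
    on_disk: true
  collection:
    vectors:
      # Vectors on disk by default for new collections
      on_disk: true
```

Each of these trades RAM for extra disk I/O, so query latency should be measured after enabling them.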
Balanced Configuration
Monitoring Performance
Key Metrics
Monitor these metrics for performance insights:
- rest_responses_duration_seconds - Query latency
- collection_running_optimizations - Active optimization tasks
- memory_allocated_bytes - Memory usage
- process_threads - Thread count
- collection_update_queue_length - Write backlog
Profiling
Enable profiling for detailed performance analysis in config/config.yaml. Profiles are served from the /debug/pprof/ endpoints.
Best Practices
Start with Defaults
Begin with default settings and tune based on observed performance bottlenecks.
Benchmark Your Data
Test different configurations with your actual vectors and query patterns.
Monitor Before Tuning
Use metrics to identify bottlenecks before making configuration changes.
Tune Incrementally
Change one parameter at a time and measure impact before further adjustments.
Troubleshooting
High CPU Usage
- Reduce optimizer_cpu_budget
- Increase flush_interval_sec
- Lower max_optimization_threads
- Check for excessive concurrent requests
High Memory Usage
- Enable on_disk_payload: true
- Set hnsw_index.on_disk: true
- Enable vector quantization
- Reduce default_segment_number
Slow Queries
- Increase max_search_threads
- Tune HNSW parameters (m, ef_construct)
- Disable on-disk storage for hot collections
- Check if indexes are built (not in optimization)
Slow Indexing
- Increase optimizer_cpu_budget
- Raise max_indexing_threads
- Lower ef_construct for faster building
- Increase max_segment_size_kb