Horizontal Scaling
Qdrant supports horizontal scaling through distributed deployment with multiple nodes and sharding.
Cluster Architecture
A Qdrant cluster consists of:
- Peers - Individual Qdrant nodes in the cluster
- Shards - Partitions of collection data distributed across peers
- Replicas - Copies of shards for high availability
- Consensus - Raft-based consensus for cluster coordination
Enabling Cluster Mode
Configure cluster mode in config/config.yaml:
config/config.yaml
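A minimal cluster section might look like this (the port and tick period shown are Qdrant's usual defaults; adjust for your deployment):

```yaml
cluster:
  # Enable distributed mode
  enabled: true
  p2p:
    # Port for internal peer-to-peer communication
    port: 6335
  consensus:
    # Raft heartbeat/tick interval in milliseconds
    tick_period_ms: 100
```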
Peer ID Assignment
Each node needs a unique peer ID. Qdrant assigns peer IDs automatically when a node first joins the cluster, so they normally do not need to be configured by hand.
Adding Nodes to Cluster
Bootstrapping a Cluster
Start the first node with the bootstrap flag:
Adding Additional Nodes
Start subsequent nodes and point them to an existing peer. The new node will:
- Connect to the existing cluster
- Sync cluster metadata
- Join the consensus group
- Become available for shard placement
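Concretely, the two startup modes look like this (host names are placeholders; --uri advertises this peer's p2p address and --bootstrap points at any existing peer):

```shell
# First node: start a new cluster
./qdrant --uri 'http://node1:6335'

# Subsequent nodes: join through an existing peer
./qdrant --bootstrap 'http://node1:6335' --uri 'http://node2:6335'
```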
Verifying Cluster State
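The cluster status, including the list of known peers and the consensus state, can be inspected over the REST API:

```shell
# Returns peer list, this node's peer ID, and Raft consensus info
curl -s http://localhost:6333/cluster
```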
Shard Management
Shard Distribution
Configure default shard distribution for collections:
config/config.yaml
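A sketch of the relevant collection defaults (values are illustrative):

```yaml
collection:
  # Default number of replicas for each shard
  replication_factor: 2
  # How many replicas must acknowledge a write
  write_consistency_factor: 1
```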
- replication_factor - Number of shard replicas
- write_consistency_factor - Replicas confirming writes
Creating Collections with Sharding
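A REST example (the Python client's create_collection accepts the same parameters); the collection name and vector settings below are illustrative:

```shell
curl -X PUT http://localhost:6333/collections/my_collection \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": {"size": 768, "distance": "Cosine"},
    "shard_number": 6,
    "replication_factor": 2,
    "write_consistency_factor": 1
  }'
```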
Viewing Shard Distribution
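Per-collection shard placement is exposed on the collection cluster endpoint:

```shell
# Lists local and remote shards plus any active transfers
curl -s http://localhost:6333/collections/my_collection/cluster
```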
Shard Migration
Move shards between nodes to balance load or scale up/down.
Initiating Shard Transfer
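A transfer is started with a move_shard operation against the collection cluster endpoint. The peer IDs below are placeholders; real IDs come from the /cluster response:

```shell
curl -X POST http://localhost:6333/collections/my_collection/cluster \
  -H 'Content-Type: application/json' \
  -d '{
    "move_shard": {
      "shard_id": 0,
      "from_peer_id": 381894127,
      "to_peer_id": 52382123,
      "method": "stream_records"
    }
  }'
```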
Shard Transfer Methods
Qdrant supports three transfer methods:
- stream_records - Stream points one-by-one
- snapshot - Transfer via snapshot file
- wal_delta - Transfer WAL deltas (incremental)
config/config.yaml
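Assuming the shard_transfer_method collection default is available in your Qdrant version, the preferred method can be set globally:

```yaml
collection:
  # Default method used for shard transfers
  shard_transfer_method: wal_delta
```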
Transfer Rate Limits
Control concurrent transfers:
config/config.yaml
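A sketch of the per-node limits (key names per Qdrant's cluster config; values are illustrative):

```yaml
cluster:
  # Maximum concurrent incoming shard transfers per node
  incoming_shard_transfers_limit: 1
  # Maximum concurrent outgoing shard transfers per node
  outgoing_shard_transfers_limit: 1
```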
Monitoring Shard Transfers
View active transfers:
- collection_shard_transfer_incoming - Incoming transfers
- collection_shard_transfer_outgoing - Outgoing transfers
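If these counters are exported on your version, they can be scraped from the Prometheus metrics endpoint:

```shell
curl -s http://localhost:6333/metrics | grep shard_transfer
```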
Aborting Shard Transfers
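An in-flight transfer can be cancelled with an abort_transfer operation, using the same shard and peer IDs as the original request (placeholders below):

```shell
curl -X POST http://localhost:6333/collections/my_collection/cluster \
  -H 'Content-Type: application/json' \
  -d '{
    "abort_transfer": {
      "shard_id": 0,
      "from_peer_id": 381894127,
      "to_peer_id": 52382123
    }
  }'
```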
Rebalancing
Automatically distribute shards evenly across nodes.
Manual Rebalancing
Initiate cluster-wide rebalancing by moving shards between peers until the distribution is even.
Scaling Strategies
Scaling Up (Adding Nodes)
- Add new node to cluster
- Create new shard replicas on the new node:
- Remove old replicas if desired
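The "create new shard replicas" step uses a replicate_shard operation targeting the new node's peer ID (IDs below are placeholders):

```shell
curl -X POST http://localhost:6333/collections/my_collection/cluster \
  -H 'Content-Type: application/json' \
  -d '{
    "replicate_shard": {
      "shard_id": 0,
      "from_peer_id": 381894127,
      "to_peer_id": 99123456
    }
  }'
```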
Scaling Down (Removing Nodes)
- Move shards away from the node:
- Wait for all transfers to complete
- Verify node has no active shards
- Remove node from cluster
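The drain-and-remove sequence might look like this (peer IDs are placeholders; repeat move_shard for every shard held by the departing node):

```shell
# Move each shard off the departing peer
curl -X POST http://localhost:6333/collections/my_collection/cluster \
  -H 'Content-Type: application/json' \
  -d '{"move_shard": {"shard_id": 0, "from_peer_id": 381894127, "to_peer_id": 52382123}}'

# Once the node holds no shards, remove it from the cluster
curl -X DELETE http://localhost:6333/cluster/peer/381894127
```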
Resharding
The number of shards in a collection can be changed through resharding.
Zero-Downtime Updates
Rolling Updates
Update cluster nodes one at a time:
- Prepare - Ensure cluster is healthy and replicated
- Drain - Stop sending traffic to one node
- Update - Upgrade the Qdrant binary
- Restart - Start the node with new version
- Verify - Check node rejoined cluster successfully
- Repeat - Move to next node
Update Procedure
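One iteration of the loop above might look like this, assuming nodes run under Docker Compose with hypothetical service names:

```shell
# Pull the new image, then cycle one node
docker compose pull qdrant-node-2
docker compose stop qdrant-node-2
docker compose up -d qdrant-node-2

# Wait until the node reports ready before moving to the next one
until curl -sf http://node2:6333/readyz; do sleep 2; done
```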
Health Checks During Updates
Use readiness checks. The readiness endpoint returns 200 when the node is:
- Synced with cluster consensus
- All local shards are healthy
- Ready to serve traffic
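Qdrant exposes this as the /readyz endpoint:

```shell
# Returns 200 once consensus is synced and local shards are healthy
curl -i http://localhost:6333/readyz
```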
Consensus Stability
Ensure consensus remains stable:
config/config.yaml
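The consensus timing knob lives under cluster.consensus (the value shown is the usual default; raise it for slower networks):

```yaml
cluster:
  consensus:
    # Raft heartbeat/tick interval; larger values tolerate higher latency
    tick_period_ms: 100
```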
Load Balancing
Client-Side Load Balancing
Clients can distribute requests across the nodes themselves, for example by rotating through a list of node URLs.
Proxy Load Balancing
Use HAProxy, NGINX, or cloud load balancers:
nginx.conf
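A minimal NGINX sketch that round-robins REST traffic across three nodes (host names are placeholders):

```nginx
upstream qdrant_cluster {
    server node1:6333;
    server node2:6333;
    server node3:6333;
}

server {
    listen 6333;
    location / {
        proxy_pass http://qdrant_cluster;
    }
}
```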
Performance Considerations
Shard Sizing
- Optimal shard size: 10M-100M points per shard
- Too many shards: Increased overhead, memory usage
- Too few shards: Limited parallelism, uneven distribution
Replication Trade-offs
- Higher replication: Better availability, more storage
- Lower replication: Less redundancy, faster writes
Network Configuration
Optimize cluster networking:
config/config.yaml
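A hedged sketch of the p2p section (connection_pool_size is an assumption about your Qdrant version; check the shipped default config):

```yaml
cluster:
  p2p:
    port: 6335
    # Number of pooled connections kept per peer
    connection_pool_size: 2
```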
Best Practices
Plan Shard Count
Calculate shard count based on data volume and node count. Aim for balanced distribution.
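The sizing rule of thumb can be sketched as shell arithmetic; the dataset size, per-shard target, and node count below are example values:

```shell
total_points=500000000     # expected dataset size
per_shard=50000000         # target points per shard (within the 10M-100M guideline)
nodes=5

# ceil(total / per_shard), then round up to a multiple of the node count
# so shards distribute evenly
shards=$(( (total_points + per_shard - 1) / per_shard ))
shards=$(( ((shards + nodes - 1) / nodes) * nodes ))
echo "$shards"
```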
Use Replication
Always use replication_factor >= 3 in production for high availability.
Monitor Transfers
Watch shard transfer metrics and logs during migrations to catch issues early.
Test Failover
Regularly test node failures and automatic recovery to validate cluster resilience.
Troubleshooting
Node Cannot Join Cluster
Check:
- Network connectivity between peers
- Peer ID is unique and in valid range
- Bootstrap node is reachable
- Firewall allows port 6335 (p2p)
Slow Shard Transfers
Optimize:
- Use the wal_delta method for incremental transfers
- Increase network bandwidth
- Check outgoing_shard_transfers_limit
Split Brain Scenario
If consensus is lost:
- Identify the partition with the majority of nodes
- Restart minority nodes to rejoin
- Check network stability
- Review consensus configuration
Uneven Shard Distribution
Rebalance manually:
- Identify overloaded nodes
- Move/replicate shards to underutilized nodes
- Consider resharding for long-term solution
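The move/replicate step maps onto the cluster API like this (peer IDs are placeholders):

```shell
# Copy the shard to an underutilized peer
curl -X POST http://localhost:6333/collections/my_collection/cluster \
  -H 'Content-Type: application/json' \
  -d '{"replicate_shard": {"shard_id": 0, "from_peer_id": 381894127, "to_peer_id": 52382123}}'

# After the new replica is active, drop the copy on the overloaded peer
curl -X POST http://localhost:6333/collections/my_collection/cluster \
  -H 'Content-Type: application/json' \
  -d '{"drop_replica": {"shard_id": 0, "peer_id": 381894127}}'
```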