Horizontal Scaling

Qdrant supports horizontal scaling through distributed deployment with multiple nodes and sharding.

Cluster Architecture

A Qdrant cluster consists of:
  • Peers - Individual Qdrant nodes in the cluster
  • Shards - Partitions of collection data distributed across peers
  • Replicas - Copies of shards for high availability
  • Consensus - Raft-based consensus for cluster coordination

Enabling Cluster Mode

Configure cluster mode in config/config.yaml:
config/config.yaml
cluster:
  enabled: true
  
  # P2P communication
  p2p:
    port: 6335
    enable_tls: false
    
  # Consensus configuration
  consensus:
    tick_period_ms: 100
    compact_wal_entries: 128

Peer ID Assignment

Each node needs a unique peer ID:
config/config.yaml
cluster:
  peer_id: 1  # Must be unique, range: 1 to 9007199254740991
Or use an environment variable:
QDRANT__CLUSTER__PEER_ID=1 ./qdrant

Adding Nodes to Cluster

Bootstrapping a Cluster

Start the first node with its own p2p URI so other peers can reach it:
./qdrant --uri http://first-node:6335
This initializes the cluster with a single node.

Adding Additional Nodes

Start subsequent nodes with --bootstrap pointing to an existing peer (plus their own --uri):
./qdrant --bootstrap http://first-node:6335 --uri http://new-node:6335
The new node will:
  1. Connect to the existing cluster
  2. Sync cluster metadata
  3. Join the consensus group
  4. Become available for shard placement

Verifying Cluster State

curl http://localhost:6333/cluster
Response:
{
  "result": {
    "status": "enabled",
    "peer_id": 1,
    "peers": {
      "1": {"uri": "http://node1:6335"},
      "2": {"uri": "http://node2:6335"},
      "3": {"uri": "http://node3:6335"}
    },
    "raft_info": {
      "term": 5,
      "commit": 123,
      "pending_operations": 0,
      "is_voter": true
    }
  },
  "status": "ok"
}
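The cluster response above can be checked programmatically. The following helper is an illustrative sketch (not part of the official client API) that summarizes the JSON returned by GET /cluster, assuming the response shape shown above:

```python
def summarize_cluster(response: dict) -> dict:
    """Extract peer count and consensus health from a /cluster response."""
    result = response["result"]
    peers = result.get("peers", {})
    raft = result.get("raft_info", {})
    return {
        "enabled": result.get("status") == "enabled",
        "peer_count": len(peers),
        "peer_uris": sorted(p["uri"] for p in peers.values()),
        # A healthy, settled cluster has no pending consensus operations.
        "consensus_idle": raft.get("pending_operations", 0) == 0,
        "is_voter": raft.get("is_voter", False),
    }
```

A monitoring script can call this after each topology change and alert if peer_count drops or pending_operations stays nonzero.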

Shard Management

Shard Distribution

Configure default shard distribution for collections:
config/config.yaml
storage:
  collection:
    replication_factor: 3
    write_consistency_factor: 2
  • replication_factor - Number of copies of each shard
  • write_consistency_factor - Minimum number of replicas that must acknowledge a write before it succeeds
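The two settings are linked: a write succeeds once write_consistency_factor replicas acknowledge it, so the value can never usefully exceed replication_factor. A small sketch of that constraint, assuming the semantics described above:

```python
def validate_consistency(replication_factor: int,
                         write_consistency_factor: int) -> int:
    """Return how many replica failures a write can tolerate, or raise."""
    if not 1 <= write_consistency_factor <= replication_factor:
        raise ValueError(
            "write_consistency_factor must be between 1 and replication_factor"
        )
    # Replicas that may be down while writes still succeed.
    return replication_factor - write_consistency_factor
```

With replication_factor: 3 and write_consistency_factor: 2, one replica can be offline without rejecting writes.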

Creating Collections with Sharding

curl -X PUT http://localhost:6333/collections/my_collection \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": {
      "size": 768,
      "distance": "Cosine"
    },
    "shard_number": 4,
    "replication_factor": 3
  }'

Viewing Shard Distribution

curl http://localhost:6333/collections/my_collection/cluster
Response shows shard placement across peers:
{
  "result": {
    "peer_id": 1,
    "shard_count": 4,
    "local_shards": [
      {
        "shard_id": 0,
        "points_count": 100000,
        "state": "Active"
      }
    ],
    "remote_shards": [
      {
        "shard_id": 1,
        "peer_id": 2,
        "state": "Active"
      }
    ],
    "shard_transfers": []
  }
}
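To spot uneven placement, the per-collection cluster response can be reduced to a shard count per peer. This is a hypothetical helper built around the response shape shown above, not an API feature:

```python
from collections import Counter

def shards_per_peer(response: dict) -> Counter:
    """Count shards per peer from a /collections/<name>/cluster response."""
    result = response["result"]
    counts = Counter()
    # Local shards live on the peer that answered the request.
    counts[result["peer_id"]] += len(result.get("local_shards", []))
    for shard in result.get("remote_shards", []):
        counts[shard["peer_id"]] += 1
    return counts
```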

Shard Migration

Move shards between nodes to balance load or scale up/down.

Initiating Shard Transfer

curl -X POST http://localhost:6333/collections/my_collection/cluster \
  -H 'Content-Type: application/json' \
  -d '{
    "move_shard": {
      "shard_id": 0,
      "from_peer_id": 1,
      "to_peer_id": 3
    }
  }'

Shard Transfer Methods

Qdrant supports three transfer methods:
  1. stream_records - Stream points one-by-one
  2. snapshot - Transfer via snapshot file
  3. wal_delta - Transfer WAL deltas (incremental)
Configure default method:
config/config.yaml
storage:
  shard_transfer_method: wal_delta  # or "stream_records", "snapshot"
Specify per transfer:
{
  "move_shard": {
    "shard_id": 0,
    "from_peer_id": 1,
    "to_peer_id": 3,
    "method": "snapshot"
  }
}
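A simple selection policy ties the three methods together. The rule below is an assumption for illustration, not Qdrant's internal logic: wal_delta only pays off when the target peer already holds an older copy of the shard (only the missing WAL entries are shipped), while a cold target needs a full copy via snapshot or streamed records.

```python
def choose_transfer_method(target_has_copy: bool,
                           prefer_streaming: bool = False) -> str:
    """Pick a shard transfer method for a move_shard request (illustrative)."""
    if target_has_copy:
        return "wal_delta"  # incremental: only the delta is transferred
    return "stream_records" if prefer_streaming else "snapshot"
```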

Transfer Rate Limits

Control concurrent transfers:
config/config.yaml
storage:
  performance:
    incoming_shard_transfers_limit: 1
    outgoing_shard_transfers_limit: 1

Monitoring Shard Transfers

View active transfers:
curl http://localhost:6333/collections/my_collection/cluster
Check transfer metrics:
curl http://localhost:6333/metrics | grep shard_transfer
Metrics:
  • collection_shard_transfer_incoming - Incoming transfers
  • collection_shard_transfer_outgoing - Outgoing transfers

Aborting Shard Transfers

curl -X POST http://localhost:6333/collections/my_collection/cluster \
  -H 'Content-Type: application/json' \
  -d '{
    "abort_transfer": {
      "shard_id": 0,
      "from_peer_id": 1,
      "to_peer_id": 3
    }
  }'

Rebalancing

Distribute shards evenly across nodes to balance storage and query load.

Manual Rebalancing

Rebalance by moving or replicating individual shards; for example, replicate a shard onto an underloaded peer:
curl -X POST http://localhost:6333/collections/my_collection/cluster \
  -H 'Content-Type: application/json' \
  -d '{
    "replicate_shard": {
      "shard_id": 0,
      "from_peer_id": 1,
      "to_peer_id": 3
    }
  }'
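Rebalancing is one shard operation at a time, so it helps to plan the moves up front. Below is a minimal greedy planner (a sketch, not a Qdrant feature): given the current shard-to-peer placement, it proposes move_shard request bodies until no peer holds more than one shard above any other.

```python
def plan_moves(placement: dict[int, int], peers: list[int]) -> list[dict]:
    """Greedily balance a shard_id -> peer_id placement across peers."""
    placement = dict(placement)
    moves = []
    while True:
        counts = {p: 0 for p in peers}
        for peer in placement.values():
            counts[peer] += 1
        busiest = max(counts, key=counts.get)
        idlest = min(counts, key=counts.get)
        if counts[busiest] - counts[idlest] <= 1:
            return moves
        # Pick any shard on the busiest peer and move it to the idlest one.
        shard = next(s for s, p in placement.items() if p == busiest)
        placement[shard] = idlest
        moves.append({"move_shard": {
            "shard_id": shard,
            "from_peer_id": busiest,
            "to_peer_id": idlest,
        }})
```

Each emitted dict can be POSTed to /collections/<name>/cluster as shown above, waiting for the previous transfer to finish before sending the next.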

Scaling Strategies

Scaling Up (Adding Nodes)

  1. Add new node to cluster
  2. Create new shard replicas on the new node:
    curl -X POST http://localhost:6333/collections/my_collection/cluster \
      -H 'Content-Type: application/json' \
      -d '{
        "replicate_shard": {
          "shard_id": 0,
          "from_peer_id": 1,
          "to_peer_id": 4
        }
      }'
    
  3. Remove old replicas if desired

Scaling Down (Removing Nodes)

  1. Move shards away from the node:
    curl -X POST http://localhost:6333/collections/my_collection/cluster \
      -H 'Content-Type: application/json' \
      -d '{
        "move_shard": {
          "shard_id": 0,
          "from_peer_id": 4,
          "to_peer_id": 1
        }
      }'
    
  2. Wait for all transfers to complete
  3. Verify node has no active shards
  4. Remove node from cluster
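Step 1 above can be scripted. This sketch (hypothetical helper, not part of Qdrant) generates the move_shard requests needed to drain a departing peer, spreading its shards round-robin over the remaining peers:

```python
from itertools import cycle

def drain_peer(placement: dict[int, int], leaving: int,
               remaining: list[int]) -> list[dict]:
    """Build move_shard bodies that empty the leaving peer."""
    targets = cycle(remaining)
    return [
        {"move_shard": {"shard_id": shard,
                        "from_peer_id": leaving,
                        "to_peer_id": next(targets)}}
        for shard, peer in sorted(placement.items())
        if peer == leaving
    ]
```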

Resharding

Change the number of shards in a collection:
config/config.yaml
cluster:
  resharding_enabled: true
This is an experimental feature for dynamically adjusting shard counts.

Zero-Downtime Updates

Rolling Updates

Update cluster nodes one at a time:
  1. Prepare - Ensure cluster is healthy and replicated
  2. Drain - Stop sending traffic to one node
  3. Update - Upgrade the Qdrant binary
  4. Restart - Start the node with new version
  5. Verify - Check node rejoined cluster successfully
  6. Repeat - Move to next node

Update Procedure

# 1. Check cluster health
curl http://localhost:6333/cluster

# 2. Update node (example for node 2)
ssh node2 "systemctl stop qdrant"
ssh node2 "wget https://github.com/qdrant/qdrant/releases/download/v1.x.x/qdrant"
ssh node2 "systemctl start qdrant"

# 3. Verify node rejoined
curl http://localhost:6333/cluster

# 4. Wait for sync
curl http://localhost:6333/readyz

# Repeat for remaining nodes

Health Checks During Updates

Use readiness checks:
curl http://node:6333/readyz
Returns 200 when the node is:
  • Synced with cluster consensus
  • Hosting only healthy local shards
  • Ready to serve traffic

Consensus Stability

Ensure consensus remains stable:
config/config.yaml
cluster:
  consensus:
    tick_period_ms: 100
    bootstrap_timeout_sec: 15
Monitor consensus metrics:
curl http://localhost:6333/metrics | grep cluster

Load Balancing

Client-Side Load Balancing

Distribute requests across nodes:
from qdrant_client import QdrantClient

nodes = [
    "http://node1:6333",
    "http://node2:6333",
    "http://node3:6333"
]

# Each QdrantClient instance is bound to a single node; pick a node
# per client yourself, e.g. with random selection
import random
client = QdrantClient(url=random.choice(nodes))

Proxy Load Balancing

Use HAProxy, NGINX, or cloud load balancers:
nginx.conf
upstream qdrant_cluster {
    least_conn;
    server node1:6333;
    server node2:6333;
    server node3:6333;
}

server {
    listen 80;
    
    location / {
        proxy_pass http://qdrant_cluster;
        proxy_http_version 1.1;
    }
    
    location /readyz {
        proxy_pass http://qdrant_cluster;
        health_check;  # active health checks require NGINX Plus
    }
}

Performance Considerations

Shard Sizing

  • Optimal shard size: 10M-100M points per shard
  • Too many shards: Increased overhead, memory usage
  • Too few shards: Limited parallelism, uneven distribution
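A back-of-envelope calculator based on the sizing guidance above (the 10M-100M range comes from the text; the 50M default target and the rounding policy are assumptions):

```python
def suggest_shard_count(total_points: int, node_count: int,
                        target_per_shard: int = 50_000_000) -> int:
    """Suggest a shard count near the target size per shard, rounded up
    to a multiple of the node count for even placement."""
    by_size = max(1, -(-total_points // target_per_shard))  # ceil division
    # Round up to a multiple of node_count so shards divide evenly.
    return -(-by_size // node_count) * node_count
```

For example, 200M points on a 3-node cluster suggests 6 shards: 4 shards of ~50M rounded up so each node hosts the same number.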

Replication Trade-offs

  • Higher replication: Better availability, more storage
  • Lower replication: Less redundancy, faster writes

Network Configuration

Optimize cluster networking:
config/config.yaml
cluster:
  grpc_timeout_ms: 5000
  connection_timeout_ms: 1000
  p2p:
    connection_pool_size: 16

Best Practices

Plan Shard Count

Calculate shard count based on data volume and node count. Aim for balanced distribution.

Use Replication

Always use replication_factor >= 3 in production for high availability.

Monitor Transfers

Watch shard transfer metrics and logs during migrations to catch issues early.

Test Failover

Regularly test node failures and automatic recovery to validate cluster resilience.

Troubleshooting

Node Cannot Join Cluster

Check:
  • Network connectivity between peers
  • Peer ID is unique and in valid range
  • Bootstrap node is reachable
  • Firewall allows port 6335 (p2p)

Slow Shard Transfers

Optimize:
  • Use wal_delta method for incremental transfers
  • Increase network bandwidth
  • Check outgoing_shard_transfers_limit

Split Brain Scenario

If consensus is lost:
  1. Identify the partition with majority of nodes
  2. Restart minority nodes to rejoin
  3. Check network stability
  4. Review consensus configuration

Uneven Shard Distribution

Rebalance manually:
  • Identify overloaded nodes
  • Move/replicate shards to underutilized nodes
  • Consider resharding for long-term solution