Monitoring - Qdrant

Metrics Endpoint

Qdrant exposes Prometheus-compatible metrics at the /metrics endpoint. These metrics provide detailed insights into system performance, resource utilization, and operational health.

Accessing Metrics

Metrics are available via HTTP:

curl http://localhost:6333/metrics
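
To work with the scraped output outside of Prometheus, the text can be parsed into a simple structure. The following is a minimal sketch, not an official client: `fetch_metrics` and `parse_metrics` are hypothetical helper names, and the parser handles only the plain `name{labels} value` sample lines of the Prometheus text format.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[Dict[str, str], float]


def fetch_metrics(url: str = "http://localhost:6333/metrics") -> str:
    """Download the raw metrics text (same request as the curl call above)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_metrics(text: str) -> Dict[str, List[Sample]]:
    """Group samples by metric name; '# HELP' and '# TYPE' lines are skipped."""
    metrics: Dict[str, List[Sample]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        metrics.setdefault(name, []).append((labels, float(value)))
    return metrics
```

Calling `parse_metrics(fetch_metrics())` returns a dictionary keyed by metric name, where each entry is a list of `(labels, value)` pairs.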

Dedicated Metrics Port

For production environments, you can configure a separate port for metrics that is not protected by API keys:

config/config.yaml

service:
  metrics_port: 9091

The metrics port should only be accessible to trusted monitoring systems and not exposed to untrusted networks.

Custom Metrics Prefix

You can customize the prefix for all metrics:

config/config.yaml

service:
  metrics_prefix: qdrant_

The prefix must contain only alphanumeric characters and underscores.
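
A prefix can be checked against this rule before it is written to the configuration. This is a sketch under two assumptions that the text above does not state: "alphanumeric" is read as ASCII letters and digits, and an empty prefix is rejected. `is_valid_metrics_prefix` is a hypothetical helper name.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are allowed.
_PREFIX = re.compile(r"[A-Za-z0-9_]+")


def is_valid_metrics_prefix(prefix: str) -> bool:
    """Return True when every character is a letter, digit, or underscore."""
    return _PREFIX.fullmatch(prefix) is not None
```

For example, `qdrant_` passes, while `my-app_` and `qdrant.` are rejected because of the hyphen and the dot.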

Available Metrics

Application Metrics

App Info

Metric: app_info
Type: Gauge
Description: Information about the Qdrant server
Labels: name, version

Recovery Mode

Metric: app_status_recovery_mode
Type: Gauge
Description: Whether recovery mode is enabled (1) or disabled (0)

Collection Metrics

Total Collections

Metric: collections_total
Type: Gauge
Description: Number of collections in the system

Total Vectors

Metric: collections_vector_total
Type: Gauge
Description: Total number of vectors across all collections

Vectors by Collection

Metric: collection_vectors
Type: Gauge
Description: Number of vectors per collection and vector name
Labels: collection, vector

Collection Points

Metric: collection_points
Type: Gauge
Description: Approximate number of points per collection
Labels: id

Running Optimizations

Metric: collection_running_optimizations
Type: Gauge
Description: Number of currently running optimization tasks per collection
Labels: id

Update Queue Length

Metric: collection_update_queue_length
Type: Gauge
Description: Number of pending operations in update queues per collection
Labels: id

Cluster Metrics

Cluster Enabled

Metric: cluster_enabled
Type: Gauge
Description: Whether cluster support is enabled (1) or disabled (0)

Total Peers

Metric: cluster_peers_total
Type: Gauge
Description: Total number of cluster peers

Cluster Term

Metric: cluster_term
Type: Counter
Description: Current cluster consensus term

Cluster Commit

Metric: cluster_commit
Type: Counter
Description: Index of last committed operation
Labels: peer_id

Pending Operations

Metric: cluster_pending_operations_total
Type: Gauge
Description: Total number of pending consensus operations

Active Replicas

Metrics: collection_active_replicas_min, collection_active_replicas_max
Type: Gauge
Description: Minimum and maximum number of active replicas across all shards

Dead Replicas

Metric: collection_dead_replicas
Type: Gauge
Description: Total number of shard replicas in non-active state

Shard Transfers

Metrics: collection_shard_transfer_incoming, collection_shard_transfer_outgoing
Type: Gauge
Description: Number of incoming/outgoing shard transfers currently running
Labels: id

Request Metrics

REST Responses

Metric: rest_responses_total
Type: Counter
Description: Total number of REST API responses
Labels: method, endpoint, status
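
Because the counter is split by the status label, a server error ratio can be derived from it. The sketch below assumes the samples are already available as `(labels, value)` pairs and that the `status` label holds the numeric HTTP status code as a string; both are assumptions, and `server_error_ratio` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Tuple

Sample = Tuple[Dict[str, str], float]


def server_error_ratio(samples: Iterable[Sample]) -> float:
    """Share of responses whose status label starts with '5' (assumed 5xx codes)."""
    total = 0.0
    errors = 0.0
    for labels, value in samples:
        total += value
        if labels.get("status", "").startswith("5"):
            errors += value
    # No responses yet: report 0.0 rather than dividing by zero.
    return errors / total if total else 0.0
```

Counters accumulate from process start, so this ratio covers the whole uptime. For a recent window, apply it to the difference between two scrapes instead.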

REST Response Duration

Metrics: rest_responses_avg_duration_seconds, rest_responses_min_duration_seconds, rest_responses_max_duration_seconds, rest_responses_duration_seconds (histogram)
Type: Gauge/Histogram
Description: Response duration statistics for REST API
Labels: method, endpoint, status

gRPC Responses

Metric: grpc_responses_total
Type: Counter
Description: Total number of gRPC responses
Labels: endpoint, status

gRPC Response Duration

Metrics: grpc_responses_avg_duration_seconds, grpc_responses_min_duration_seconds, grpc_responses_max_duration_seconds, grpc_responses_duration_seconds (histogram)
Type: Gauge/Histogram
Description: Response duration statistics for gRPC API
Labels: endpoint, status

Memory Metrics

Metric: memory_active_bytes
Type: Gauge
Description: Total bytes in active pages allocated by the application

Metric: memory_allocated_bytes
Type: Gauge
Description: Total bytes allocated by the application

Metric: memory_resident_bytes
Type: Gauge
Description: Maximum bytes in physically resident data pages

Metric: memory_retained_bytes
Type: Gauge
Description: Total bytes in virtual memory mappings

System Metrics (Linux)

Process Threads

Metric: process_threads
Type: Gauge
Description: Count of active threads

Open File Descriptors

Metric: process_open_fds
Type: Gauge
Description: Count of currently open file descriptors

Metric: process_max_fds
Type: Gauge
Description: Limit for open file descriptors

Memory Maps

Metric: process_open_mmaps
Type: Gauge
Description: Count of open memory maps

Metric: system_max_mmaps
Type: Gauge
Description: System-wide limit of open memory maps
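
The open and limit gauges are most useful as a pair. The sketch below computes how much of each limit is in use, assuming the current values were already collected into a dictionary keyed by metric name (without a custom prefix). `limit_usage` is a hypothetical helper name.

```python
from typing import Dict, Optional

# Each resource maps to its (current value, limit) metric names.
_PAIRS = {
    "file_descriptors": ("process_open_fds", "process_max_fds"),
    "memory_maps": ("process_open_mmaps", "system_max_mmaps"),
}


def limit_usage(values: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Return used/limit per resource, or None when a value is missing or zero."""
    usage: Dict[str, Optional[float]] = {}
    for resource, (used_name, limit_name) in _PAIRS.items():
        used = values.get(used_name)
        limit = values.get(limit_name)
        if used is None or not limit:
            usage[resource] = None
        else:
            usage[resource] = used / limit
    return usage
```

A ratio that keeps climbing toward 1.0 suggests the limit should be raised before the process runs out of descriptors or memory maps.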

Page Faults

Metric: process_minor_page_faults_total
Type: Counter
Description: Count of minor page faults (no disk access)

Metric: process_major_page_faults_total
Type: Counter
Description: Count of major page faults (disk access required)
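
Counters such as these are usually read as a rate between two scrapes rather than as an absolute value. The following sketch shows one common way to do that, including the case where the counter restarts from zero after a process restart. `counter_rate` is a hypothetical helper name, and timestamps are assumed to be in seconds.

```python
def counter_rate(prev_value: float, prev_time: float,
                 curr_value: float, curr_time: float) -> float:
    """Per-second increase of a counter between two scrapes."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("current scrape must be later than the previous one")
    if curr_value < prev_value:
        # Counter went backwards: assume a restart and count growth since zero.
        increase = curr_value
    else:
        increase = curr_value - prev_value
    return increase / elapsed
```

A sustained non-zero rate for process_major_page_faults_total means reads are regularly going to disk, which is worth correlating with request latency.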

Snapshot Metrics

Metric: snapshot_creation_running
Type: Gauge
Description: Number of snapshot creations currently running
Labels: id

Metric: snapshot_recovery_running
Type: Gauge
Description: Number of snapshot recovery operations currently running
Labels: id

Metric: snapshot_created_total
Type: Counter
Description: Total number of snapshots created
Labels: id

Prometheus Integration

Configuration Example

Add Qdrant to your Prometheus configuration:

prometheus.yml

scrape_configs:
  - job_name: 'qdrant'
    static_configs:
      - targets: ['localhost:6333']
    metrics_path: '/metrics'
    scrape_interval: 15s

For a dedicated metrics port:

prometheus.yml

scrape_configs:
  - job_name: 'qdrant'
    static_configs:
      - targets: ['localhost:9091']
    scrape_interval: 15s

Grafana Dashboard

Create dashboards using the exposed metrics to visualize:

Query performance and latency
Memory and CPU usage
Collection growth over time
Cluster health and consensus state
Shard transfer progress
Optimization task activity

Telemetry API

Qdrant provides a telemetry API endpoint that returns detailed system information:

curl http://localhost:6333/telemetry

Telemetry Levels

Control the level of detail returned:

# Basic telemetry (level 0)
curl 'http://localhost:6333/telemetry?details_level=0'

# Detailed telemetry (level 1+)
curl 'http://localhost:6333/telemetry?details_level=1'

Telemetry Data Structure

The telemetry endpoint returns:

App information: Version, build info, features enabled
Collections: Count, vector statistics, optimization status
Cluster state: Peer information, consensus status, transfers
Requests: API usage statistics by endpoint
Memory: Allocation statistics
Hardware: CPU and I/O metrics (when hardware reporting is enabled)

Enabling Hardware Reporting

Hardware utilization metrics can be included in API responses:

config/config.yaml

service:
  hardware_reporting: true

Hardware reporting is experimental and adds overhead to requests.

Health Checks

Liveness Check

Verify that Qdrant is running:

curl http://localhost:6333/

Returns 200 OK with version information if the service is alive.

Readiness Check

Verify that Qdrant is ready to handle requests:

curl http://localhost:6333/readyz

The readiness check verifies:

Consensus sync: Node has caught up with cluster commit index
Shard health: All local shards are in a healthy state
Bootstrap completion: Cluster has been bootstrapped (when applicable)

Returns:

200 OK - Node is ready
503 Service Unavailable - Node is not ready

Use the readiness endpoint for load balancer health checks to avoid routing traffic to nodes that are still synchronizing.
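
Deployment scripts often need the same check in a loop, for example to wait for a restarted node before continuing a rolling upgrade. The sketch below polls the readiness endpoint until it answers 200 or a timeout expires. `probe_readyz` and `wait_until_ready` are hypothetical helper names, and the timeout and interval defaults are arbitrary.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_readyz(base_url: str = "http://localhost:6333") -> bool:
    """Return True when GET /readyz answers 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/readyz", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # A 503 raises HTTPError (a URLError subclass); refused
        # connections and timeouts also end up here.
        return False


def wait_until_ready(probe: Callable[[], bool],
                     timeout_s: float = 60.0,
                     interval_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Call probe() repeatedly until it succeeds or timeout_s has passed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A typical call is `wait_until_ready(probe_readyz, timeout_s=120)`. The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.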

Logging

Log Configuration

Configure logging in config/config.yaml:

config/config.yaml

log_level: INFO

logger:
  format: text  # or "json"
  on_disk:
    enabled: true
    log_file: /var/log/qdrant/qdrant.log
    log_level: INFO
    format: text
    buffer_size_bytes: 1024

Log Levels

ERROR - Error messages only
WARN - Warnings and errors
INFO - General information (default)
DEBUG - Detailed debugging information
TRACE - Very verbose tracing

JSON Logging

For structured logging (recommended for production):

config/config.yaml

logger:
  format: json

Slow Query Logging

Log queries that take longer than a threshold:

config/config.yaml

service:
  slow_query_secs: 1.0

Queries exceeding this duration will be logged at WARN level.

Best Practices

Set Up Alerts

Configure Prometheus alerts for critical metrics like dead replicas, high memory usage, and slow queries.

Monitor Disk Space

Watch storage paths, snapshot directories, and WAL locations for available space.

Track Shard Health

Monitor collection_active_replicas_min to detect availability issues early.
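
As an illustration, the check can be expressed as a comparison against the replication factor you expect. The sketch assumes one collection_active_replicas_min value per collection has already been collected into a dictionary, and that `expected` is supplied by the operator; neither the input shape nor the severity names come from Qdrant.

```python
from typing import Dict, List, Tuple


def replica_alerts(min_active: Dict[str, float],
                   expected: int) -> List[Tuple[str, str]]:
    """Return (collection, severity) for collections below the expected count."""
    alerts: List[Tuple[str, str]] = []
    for collection in sorted(min_active):
        active = min_active[collection]
        if active <= 0:
            # At least one shard has no active replica left.
            alerts.append((collection, "critical"))
        elif active < expected:
            alerts.append((collection, "warning"))
    return alerts
```

With `expected=2`, a collection reporting 1 yields a warning and a collection reporting 0 yields a critical alert, while fully replicated collections produce no entry.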

Analyze Request Patterns

Use request duration histograms to identify performance bottlenecks.
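
Histogram buckets can be turned into latency percentiles. The sketch below estimates a quantile from cumulative bucket counts using linear interpolation inside the matching bucket, the same general approach as the `histogram_quantile` function in PromQL. It assumes the lowest bucket starts at 0 and that a `+Inf` bucket is present; `histogram_quantile` here is a local helper, not a Qdrant or Prometheus API.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from (upper_bound, cumulative_count) buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must include a +Inf upper bound")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Cannot interpolate: fall back to the last known bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

The estimate is only as precise as the bucket boundaries, so treat it as an approximation of the true percentile.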

Troubleshooting

High Memory Usage

Check:

memory_allocated_bytes vs memory_resident_bytes
Collection count and vector density
on_disk_payload setting in storage configuration
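
The first comparison in the list can be reduced to two numbers. The sketch below computes the gap and ratio between memory_resident_bytes and memory_allocated_bytes; `memory_overhead` is a hypothetical helper name, and what counts as a "large" gap is left to the operator.

```python
from typing import Dict


def memory_overhead(allocated_bytes: float,
                    resident_bytes: float) -> Dict[str, float]:
    """Compare resident memory with memory actually allocated by the application."""
    if allocated_bytes <= 0:
        raise ValueError("allocated_bytes must be positive")
    return {
        # Bytes resident in RAM beyond what the application has allocated.
        "overhead_bytes": resident_bytes - allocated_bytes,
        # Values well above 1.0 suggest memory held but not in active use.
        "resident_to_allocated": resident_bytes / allocated_bytes,
    }
```

Tracking the ratio over time is usually more informative than a single reading, since it shows whether the gap is stable or growing.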

Cluster Lag

Monitor:

cluster_pending_operations_total
cluster_commit differences between peers
Network connectivity between nodes
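
Comparing cluster_commit across peers amounts to measuring how far each peer is behind the most advanced one. The sketch assumes the commit index reported for each peer has been collected into a dictionary keyed by peer ID; `commit_lag` is a hypothetical helper name.

```python
from typing import Dict


def commit_lag(commits: Dict[str, int]) -> Dict[str, int]:
    """Return how many operations each peer is behind the highest commit index."""
    if not commits:
        return {}
    newest = max(commits.values())
    return {peer: newest - commit for peer, commit in commits.items()}
```

A peer whose lag stays above zero across several scrapes is a candidate for the network and pending-operation checks listed above.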

Slow Queries

Investigate:

rest_responses_duration_seconds histogram
Collection optimization status
Index configuration (HNSW parameters)
Resource contention (CPU, disk I/O)
