Qdrant quantization options

Quick answer
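Qdrant supports three quantization methods: scalar (int8), product, and binary. Each reduces the memory needed for vectors and speeds up search at the cost of some precision, and each is configured through the quantization_config parameter when you create or update a collection. float16 is not a quantization method in Qdrant; it is a separate storage datatype set on the vector parameters.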
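To make the trade-off concrete, the sketch below estimates raw vector storage for each representation. It assumes the nominal per-component sizes (32 bits for float32, 16 for float16, 8 for int8, 1 for binary); the function name and the one-million-vector, 128-dimension workload are illustrative assumptions. Real memory use also includes index and payload overhead, and quantized vectors are stored alongside the originals, so treat the numbers as rough guidance only. Product quantization is left out because its size depends on the chosen compression ratio.

```python
def estimate_vector_bytes(num_vectors: int, dim: int, bits_per_component: int) -> int:
    """Approximate bytes needed for the raw vector data alone."""
    total_bits = num_vectors * dim * bits_per_component
    return total_bits // 8


# Nominal bits per component for each representation (assumed values)
BITS = {"float32": 32, "float16": 16, "int8 (scalar)": 8, "binary": 1}

for name, bits in BITS.items():
    size_mb = estimate_vector_bytes(1_000_000, 128, bits) / 1_000_000
    print(f"{name:>14}: {size_mb:7.1f} MB")
```

Under these assumptions, int8 scalar quantization needs a quarter of the float32 storage and binary quantization needs one thirty-second of it.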

PREREQUISITES

  • Python 3.8+
  • pip install "qdrant-client>=1.1.0"
  • Qdrant server running (local or cloud)

Setup

Install the official qdrant-client Python SDK and ensure you have access to a running Qdrant instance (local or cloud).

Use the following command to install the client:

bash
pip install "qdrant-client>=1.1.0"

Step by step

Create a Qdrant collection with quantization enabled by passing the quantization_config parameter to create_collection. The example below enables scalar quantization, which stores each vector component as an int8 instead of a float32 to reduce vector size and speed up search. int8 is the only scalar type; the other methods are product and binary quantization.

python
from qdrant_client import QdrantClient, models

client = QdrantClient(host="localhost", port=6333)

collection_name = "my_collection"

# Scalar quantization: each component is stored as int8 instead of float32
quantization = models.ScalarQuantization(
    scalar=models.ScalarQuantizationConfig(
        type=models.ScalarType.INT8,  # INT8 is the only scalar type
        quantile=0.99,  # Optional: exclude the most extreme 1% of values from the bounds
    )
)

vector_params = models.VectorParams(
    size=128,
    distance=models.Distance.COSINE,
)

# Create the collection with quantization (raises an error if it already exists)
client.create_collection(
    collection_name=collection_name,
    vectors_config=vector_params,
    quantization_config=quantization,
)

print(f"Collection '{collection_name}' created with INT8 quantization.")
output
Collection 'my_collection' created with INT8 quantization.
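
To show what the quantile setting does, here is a simplified pure-Python sketch of int8 scalar quantization with quantile clipping. It illustrates the general technique under assumed details (symmetric clipping, a 0 to 255 code range, nearest-rank quantiles) and is not Qdrant's internal implementation; all function names and the sample data are hypothetical.

```python
def quantile_bounds(values, quantile):
    """Return (low, high) bounds that keep the central `quantile` share of values."""
    ordered = sorted(values)
    tail = (1.0 - quantile) / 2.0  # share excluded from each end
    last = len(ordered) - 1
    low = ordered[int(round(tail * last))]
    high = ordered[int(round((1.0 - tail) * last))]
    return low, high


def quantize(values, low, high):
    """Map floats to integer codes 0..255, clipping anything outside [low, high]."""
    scale = (high - low) / 255.0
    codes = []
    for v in values:
        clipped = min(max(v, low), high)
        codes.append(int(round((clipped - low) / scale)))
    return codes


def dequantize(codes, low, high):
    """Recover approximate floats from the integer codes."""
    scale = (high - low) / 255.0
    return [low + c * scale for c in codes]


# 198 well-behaved values plus two extreme outliers
inliers = [i / 100.0 for i in range(-99, 99)]
data = inliers + [-50.0, 50.0]

errors = {}
for q in (1.0, 0.99):
    low, high = quantile_bounds(data, q)
    restored = dequantize(quantize(data, low, high), low, high)
    # Worst reconstruction error on the non-outlier values only
    errors[q] = max(abs(a - b) for a, b in zip(inliers, restored))
    print(f"quantile={q}: bounds=({low}, {high}), worst inlier error={errors[q]:.4f}")
```

With quantile=1.0 the two outliers stretch the bounds, so the 256 available codes are spread over a much wider range and the ordinary values lose precision. With quantile=0.99 the outliers are clipped and the remaining values are reconstructed far more accurately.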

Common variations

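To change the quantization method, replace ScalarQuantization with ProductQuantization (higher compression, usually at a larger accuracy cost) or BinaryQuantization (one bit per component, best suited to high-dimensional embeddings). The quantile parameter applies only to scalar quantization; it sets the clipping threshold and is typically between 0.9 and 0.99.

If you want half-precision storage instead, set the float16 datatype on VectorParams. This is a storage datatype rather than a quantization method, and it requires Qdrant 1.9 or later.

Example storing vectors as float16:

python
from qdrant_client import QdrantClient, models

client = QdrantClient(host="localhost", port=6333)

# float16 is a storage datatype on VectorParams, not a quantization method
vector_params = models.VectorParams(
    size=128,
    distance=models.Distance.COSINE,
    datatype=models.Datatype.FLOAT16,
)

client.create_collection(
    collection_name="my_float16_collection",
    vectors_config=vector_params,
)

print("Collection with FLOAT16 datatype created.")
output
Collection with FLOAT16 datatype created.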

Troubleshooting

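  • If you get an error about an unsupported quantization type, verify that your Qdrant server version supports the method you chose: scalar quantization requires Qdrant 1.1+, product quantization 1.2+, and binary quantization 1.5+.
  • Ensure the dimensionality of every vector you upload matches the size parameter in VectorParams.
  • For unexpected drops in search accuracy, try adjusting the quantile parameter (scalar quantization only) or switching to a less aggressive quantization method.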

Key Takeaways

  • Use Qdrant's quantization options to reduce vector storage and speed up similarity search.
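  • Supported quantization methods are scalar (int8), product, and binary, configured via the quantization_config parameter. float16 is a storage datatype, not a quantization method.
  • For scalar quantization, adjust the quantile parameter to balance quantization quality against the influence of outlier values.
  • Ensure your Qdrant server version supports the method you use (1.1 or later for scalar, 1.2 or later for product, 1.5 or later for binary).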
Verified 2026-04