Concept intermediate · 3 min read

What is product quantization in vector search?

Quick answer
Product quantization (PQ) is a vector compression technique used in vector search that splits high-dimensional vectors into smaller sub-vectors and quantizes each sub-vector independently. This reduces memory usage and accelerates approximate nearest neighbor (ANN) search by enabling efficient distance computations on the compressed representations.

How it works

Product quantization works by dividing a high-dimensional vector into multiple lower-dimensional sub-vectors (subspaces). Each sub-vector is then quantized separately using a codebook of representative centroids. Instead of storing the full vector, only the indices of the closest centroids for each sub-vector are stored, drastically reducing storage size.

This approach allows approximate distance computation: for a given query, the distances from each query sub-vector to every centroid are precomputed once into a lookup table, and the distance to any compressed vector is then approximated by summing the relevant table entries (often called asymmetric distance computation). Think of it like compressing a large image by splitting it into tiles and representing each tile with a limited palette of colors.
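The lookup-table idea can be sketched in NumPy. This is a minimal illustration, not a production implementation: the codebooks here are random stand-ins, whereas in practice each one is trained with k-means on real data.

```python
import numpy as np

rng = np.random.default_rng(0)

d, M, K = 128, 8, 256   # vector dim, number of sub-vectors, centroids per codebook
ds = d // M             # dimension of each sub-vector (16)

# Stand-in codebooks (in practice trained with k-means on a sample of real vectors)
codebooks = rng.random((M, K, ds))

# Encode one database vector: nearest centroid index per subspace
x = rng.random(d)
codes = np.array([
    np.argmin(np.linalg.norm(codebooks[m] - x[m*ds:(m+1)*ds], axis=1))
    for m in range(M)
])

# Asymmetric distance computation (ADC): precompute a query-to-centroid
# table once, then score any compressed vector with M table lookups.
q = rng.random(d)
table = np.array([
    np.sum((codebooks[m] - q[m*ds:(m+1)*ds]) ** 2, axis=1)   # squared distances
    for m in range(M)
])                                                            # shape (M, K)
approx_sq_dist = table[np.arange(M), codes].sum()

print("ADC (approximate) squared distance:", approx_sq_dist)
print("Exact squared distance:", np.sum((q - x) ** 2))
```

The ADC result is exactly the squared distance between the query and the vector's reconstruction from centroids, which is why summing table entries is a valid approximation of the true distance.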

Concrete example

Suppose you have 128-dimensional vectors and split them into 8 sub-vectors of 16 dimensions each. For each sub-vector position, you create a codebook with 256 centroids (codes). Each sub-vector is replaced by the index of its nearest centroid, so the original 128 floats (512 bytes as float32) are compressed into 8 bytes (one byte per sub-vector), a 64x reduction.

python
import numpy as np
from sklearn.cluster import KMeans

# Original vector
vector = np.random.rand(128)

# Parameters
num_subvectors = 8
subvector_dim = 128 // num_subvectors
num_centroids = 256

# Split vector into sub-vectors
subvectors = vector.reshape(num_subvectors, subvector_dim)

# Train codebooks for each sub-vector
codebooks = []
for _ in range(num_subvectors):
    # In practice, each codebook is trained on the corresponding sub-vectors
    # of a large training set; random data stands in for that here.
    kmeans = KMeans(n_clusters=num_centroids, n_init=10, random_state=0).fit(
        np.random.rand(1000, subvector_dim)
    )
    codebooks.append(kmeans.cluster_centers_)

# Quantize each sub-vector by nearest centroid index
quantized_indices = []
for i, subvec in enumerate(subvectors):
    centroids = codebooks[i]
    distances = np.linalg.norm(centroids - subvec, axis=1)
    quantized_indices.append(np.argmin(distances))

print("Quantized indices:", quantized_indices)
output (illustrative; the exact indices depend on the training data)
Quantized indices: [42, 7, 198, 56, 123, 3, 87, 250]
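Because only the centroid indices are stored, decoding is just a table lookup per subspace. The sketch below (again with illustrative random codebooks rather than trained ones) shows the storage saving and the reconstruction error that PQ trades for it.

```python
import numpy as np

rng = np.random.default_rng(42)
d, M, K = 128, 8, 256
ds = d // M

codebooks = rng.random((M, K, ds))   # stand-in for trained codebooks
x = rng.random(d)

# Encode: one uint8 index per subspace (indices fit in a byte because K = 256)
codes = np.array([
    np.argmin(np.linalg.norm(codebooks[m] - x[m*ds:(m+1)*ds], axis=1))
    for m in range(M)
], dtype=np.uint8)

# Decode: concatenate the chosen centroids to approximate the original vector
x_hat = np.concatenate([codebooks[m][codes[m]] for m in range(M)])

print("stored bytes:", codes.nbytes)            # 8 bytes instead of 512 (float32)
print("reconstruction MSE:", np.mean((x - x_hat) ** 2))
```

The reconstruction is lossy; the MSE shrinks as the codebooks are trained on data that resembles the vectors being compressed.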

When to use it

Use product quantization when you need to perform approximate nearest neighbor (ANN) search over very large vector datasets where memory and latency are critical constraints. PQ is well suited to high-dimensional embeddings, such as those produced by OpenAI's embedding models or sentence-transformers, enabling fast similarity search with greatly reduced storage.

Avoid PQ if exact nearest neighbor search is required or if your dataset is small enough to fit in memory without compression.
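To see why memory matters at scale, here is a quick back-of-the-envelope calculation; the corpus size, dimensionality, and PQ parameters are illustrative, not prescriptive.

```python
# Memory footprint: 10 million 768-d float32 embeddings, raw vs PQ codes
n, d = 10_000_000, 768
raw_bytes = n * d * 4        # 4 bytes per float32 component

M = 96                        # 96 sub-vectors with 256 centroids each -> 1 byte per code
pq_bytes = n * M              # one uint8 code per sub-vector

print(raw_bytes / 1e9)        # prints 30.72  (GB for raw vectors)
print(pq_bytes / 1e9)         # prints 0.96   (GB for PQ codes)
print(raw_bytes // pq_bytes)  # prints 32     (compression ratio, ignoring small codebook overhead)
```

At this scale the raw vectors exceed the RAM of a typical single machine, while the PQ codes fit comfortably, which is the usual motivation for adopting PQ.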

Key terms

Product quantization (PQ): A vector compression technique splitting vectors into subspaces and quantizing each independently.
Codebook: A set of representative centroids used to quantize sub-vectors.
Centroid: A representative vector in a codebook used for quantization.
Approximate nearest neighbor (ANN): A fast search method that finds close but not necessarily exact neighbors.
Sub-vector: A lower-dimensional segment of the original high-dimensional vector.

Key Takeaways

  • Product quantization compresses vectors by splitting and quantizing sub-vectors independently.
  • PQ enables fast approximate nearest neighbor search with significantly reduced memory usage.
  • Use PQ for large-scale vector search where speed and storage efficiency are priorities.
  • PQ trades off exactness for efficiency, so it is not suitable for exact search needs.
Verified 2026-04