How to scale vector search
PREREQUISITES
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0" faiss-cpu chromadb pinecone-client
Setup
Install the necessary Python packages for vector search scaling, including faiss-cpu for local ANN indexing, chromadb for open-source vector DB, and pinecone-client for managed vector search services.
pip install openai faiss-cpu chromadb pinecone-client
Collecting openai
Collecting faiss-cpu
Collecting chromadb
Collecting pinecone-client
Successfully installed openai faiss-cpu chromadb pinecone-client
Step by step
This example demonstrates scaling vector search using FAISS for local indexing and OpenAI embeddings for vectorization. It shows batch embedding generation, index creation, and querying with approximate nearest neighbors.
import os
import numpy as np
from openai import OpenAI
import faiss
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Sample documents to index
documents = [
    "The quick brown fox jumps over the lazy dog.",
    "Vector search scales with distributed indexing.",
    "FAISS supports efficient similarity search.",
    "OpenAI embeddings provide semantic vectors.",
    "Scaling vector search requires sharding and replication.",
]
# Generate embeddings in batch
response = client.embeddings.create(
model="text-embedding-3-small",
input=documents
)
embeddings = np.array([data.embedding for data in response.data], dtype=np.float32)
# Normalize embeddings for cosine similarity
faiss.normalize_L2(embeddings)
# Create FAISS index (IndexFlatIP for inner product = cosine similarity on normalized vectors)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings) # Add all vectors
# Query vector
query_text = "How to efficiently scale vector search?"
query_response = client.embeddings.create(
model="text-embedding-3-small",
input=[query_text]
)
query_embedding = np.array([query_response.data[0].embedding], dtype=np.float32)
faiss.normalize_L2(query_embedding)
# Search top 3 nearest neighbors
k = 3
distances, indices = index.search(query_embedding, k)
print("Query:", query_text)
print("Top matches:")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. {documents[idx]} (score: {distances[0][i]:.4f})")
Query: How to efficiently scale vector search?
Top matches:
1. Scaling vector search requires sharding and replication. (score: 0.9123)
2. Vector search scales with distributed indexing. (score: 0.8765)
3. FAISS supports efficient similarity search. (score: 0.8457)
Common variations
You can scale vector search further by using managed vector databases like Pinecone or Chroma that handle sharding, replication, and persistence automatically. For asynchronous embedding generation, use async OpenAI clients. For very large datasets, use approximate nearest neighbor indexes like HNSW or IVF in FAISS.
import os
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec
# Initialize OpenAI and Pinecone clients
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# Create the Pinecone index if it does not already exist
index_name = "example-index"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # text-embedding-3-small returns 1536-dim vectors
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(index_name)
# Generate embedding for a document
doc = "Scaling vector search with Pinecone managed service."
response = client.embeddings.create(model="text-embedding-3-small", input=[doc])
embedding = response.data[0].embedding
# Upsert vector to Pinecone
index.upsert([("vec1", embedding)])
# Query Pinecone
query = "How to scale vector search?"
query_response = client.embeddings.create(model="text-embedding-3-small", input=[query])
query_embedding = query_response.data[0].embedding
results = index.query(vector=query_embedding, top_k=3, include_metadata=True)
print("Pinecone query results:", results.matches)
Pinecone query results: [Match(id='vec1', score=0.9876, metadata=None)]
Troubleshooting
- If you see high latency, ensure your vector index is sharded or use approximate nearest neighbor algorithms like HNSW or IVF.
- If embeddings are inconsistent, verify you use the same embedding model and normalize vectors before indexing.
- For memory errors, switch to disk-backed indexes or managed vector DBs like Pinecone.
Key Takeaways
- Use a local ANN library like FAISS at moderate scale, and distributed or managed vector databases like Pinecone or Chroma to scale beyond a single machine.
- Batched embedding generation reduces API overhead, and L2 normalization makes inner-product search equivalent to cosine similarity.
- Approximate nearest neighbor algorithms reduce latency on large datasets.
- Managed services handle sharding, replication, and persistence automatically, simplifying scaling.
- Consistent embedding models and vector preprocessing are critical for reliable search results.