Fix slow vector search
Quick answer
Fix slow vector search by using an efficient vector store such as
FAISS or Chroma, batching embedding requests to the OpenAI embeddings API, and reducing embedding dimensionality where possible. Also make sure your index is built correctly, and use approximate nearest neighbor search for speed.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0 faiss-cpu
Setup
Install the required packages and set your environment variable for the OpenAI API key.
- Install the OpenAI SDK and FAISS for vector search:

pip install openai faiss-cpu

output

Collecting openai
Collecting faiss-cpu
Successfully installed openai-1.x.x faiss-cpu-1.x.x
Step by step
This example shows how to embed a batch of texts using OpenAI embeddings, build a FAISS index, and perform a fast vector search.
import os
import numpy as np
from openai import OpenAI
import faiss
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Sample documents to index
texts = [
"The quick brown fox jumps over the lazy dog.",
"Artificial intelligence and machine learning.",
"OpenAI provides powerful AI APIs.",
"Vector search is efficient with FAISS.",
"Python is a versatile programming language."
]
# Batch embed texts using OpenAI embeddings
response = client.embeddings.create(
model="text-embedding-3-small",
input=texts
)
# Extract embeddings as numpy array
embeddings = np.array([data.embedding for data in response.data], dtype=np.float32)
# Normalize embeddings for cosine similarity
faiss.normalize_L2(embeddings)
# Build FAISS index (IndexFlatIP for cosine similarity)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
# Query text
query = "fast AI vector search"
query_response = client.embeddings.create(
model="text-embedding-3-small",
input=[query]
)
query_embedding = np.array([query_response.data[0].embedding], dtype=np.float32)
# Normalize query embedding
faiss.normalize_L2(query_embedding)
# Search top 3 nearest neighbors
k = 3
distances, indices = index.search(query_embedding, k)
print("Query:", query)
print("Top matches:")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. {texts[idx]} (score: {distances[0][i]:.4f})")

output

Query: fast AI vector search
Top matches:
1. Vector search is efficient with FAISS. (score: 0.9123)
2. OpenAI provides powerful AI APIs. (score: 0.8765)
3. Artificial intelligence and machine learning. (score: 0.8542)
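For corpora too large to embed in one API call, the single `client.embeddings.create` request above can be replaced by a chunked loop. The sketch below is a minimal, hedged version: the `batch_size` of 512 is an illustrative choice (keep it under the API's per-request input limit), and `embed_fn` is a hypothetical parameter standing in for whatever embedding call you use.

```python
import numpy as np

def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_in_batches(texts, embed_fn, batch_size=512):
    """Embed texts in batches; embed_fn maps a list of strings to a list of vectors."""
    vectors = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return np.array(vectors, dtype=np.float32)

# With the OpenAI client from the example above, embed_fn could be:
# embed_fn = lambda batch: [d.embedding for d in client.embeddings.create(
#     model="text-embedding-3-small", input=batch).data]
```

Batching this way cuts the number of HTTP round-trips from one per document to one per batch, which is usually the dominant cost when embedding thousands of texts.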
Common variations
You can speed up vector search further by:
- Using approximate nearest neighbor indexes such as faiss.IndexIVFFlat for large datasets.
- Batching embedding requests to reduce API calls.
- Using smaller embedding models such as text-embedding-3-small for faster embedding generation.
- Using GPU-accelerated FAISS (faiss-gpu) if available.
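Dimensionality reduction, mentioned in the quick answer, is also available directly in the API: the text-embedding-3 models accept a `dimensions` parameter (e.g. `client.embeddings.create(model="text-embedding-3-small", input=texts, dimensions=256)`), which is equivalent to truncating the full vector and re-normalizing. The sketch below mimics that truncation client-side with random stand-in vectors, so it runs without an API key; the sizes (1536 full, 256 reduced) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for full-size embeddings (text-embedding-3-small returns 1536 dims)
full = rng.standard_normal((100, 1536)).astype(np.float32)
full /= np.linalg.norm(full, axis=1, keepdims=True)

# Keep the first 256 dimensions, then re-normalize so cosine similarity
# still works -- the same effect as requesting dimensions=256 from the API
reduced = full[:, :256].copy()
reduced /= np.linalg.norm(reduced, axis=1, keepdims=True)
```

Shorter vectors mean less memory per entry and fewer multiply-adds per comparison, so both flat and approximate indexes get faster, at some cost in retrieval quality.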
For large datasets, an IVF index clusters the vectors and searches only the most promising clusters instead of every vector:

import faiss

# Embeddings must be normalized BEFORE training and adding so that
# inner product equals cosine similarity
faiss.normalize_L2(embeddings)

# Example: create an IVF index for approximate search
nlist = 4  # number of clusters; training needs at least nlist vectors, so keep it small for this tiny dataset
quantizer = faiss.IndexFlatIP(embeddings.shape[1])
index_ivf = faiss.IndexIVFFlat(quantizer, embeddings.shape[1], nlist, faiss.METRIC_INNER_PRODUCT)

# Train on the normalized vectors, then add them
index_ivf.train(embeddings)
index_ivf.add(embeddings)

# Search more clusters for better recall (default nprobe is 1)
index_ivf.nprobe = 2

# Normalize the query embedding, then search
faiss.normalize_L2(query_embedding)
distances, indices = index_ivf.search(query_embedding, k)
print("Approximate search results:")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. {texts[idx]} (score: {distances[0][i]:.4f})")

output
Approximate search results:
1. Vector search is efficient with FAISS. (score: 0.9101)
2. OpenAI provides powerful AI APIs. (score: 0.8750)
3. Artificial intelligence and machine learning. (score: 0.8503)
Troubleshooting
If vector search is still slow, check that:
- Embeddings are normalized before indexing and querying (required for cosine similarity with an inner-product index).
- The index is trained before vectors are added when using IVF indexes.
- Embedding requests are batched to reduce API round-trips.
- An approximate index is used for large datasets instead of brute-force search.
- Your environment has enough RAM and CPU; a flat index holds every vector in memory.
Key Takeaways
- Use FAISS or a similar vector store for efficient indexing instead of comparing the query against every vector in Python.
- Batch embedding requests to reduce API call overhead and latency.
- Normalize embeddings so that inner-product search is equivalent to cosine similarity.
- Use approximate nearest neighbor indexes such as IVF or HNSW for large datasets.
- Make sure IVF indexes are trained before adding vectors and that your environment has sufficient resources.