Fix slow vector search
Quick answer
Fix slow vector search by using an efficient vector store such as
FAISS or Chroma, batching embedding requests to the OpenAI embeddings API, and reducing embedding dimensionality where possible. Also make sure your index is built correctly, and use approximate nearest neighbor search for speed.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0 faiss-cpu
Setup
Install the required packages and set your environment variable for the OpenAI API key.
- Install the OpenAI SDK and FAISS for vector search:

pip install openai faiss-cpu

output

Collecting openai
Collecting faiss-cpu
Successfully installed openai-1.x.x faiss-cpu-1.x.x
Step by step
This example shows how to embed a batch of texts using OpenAI embeddings, build a FAISS index, and perform a fast vector search.
import os
import numpy as np
from openai import OpenAI
import faiss
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Sample documents to index
texts = [
"The quick brown fox jumps over the lazy dog.",
"Artificial intelligence and machine learning.",
"OpenAI provides powerful AI APIs.",
"Vector search is efficient with FAISS.",
"Python is a versatile programming language."
]
# Batch embed texts using OpenAI embeddings
response = client.embeddings.create(
model="text-embedding-3-small",
input=texts
)
# Extract embeddings as numpy array
embeddings = np.array([data.embedding for data in response.data], dtype=np.float32)
# Normalize embeddings for cosine similarity
faiss.normalize_L2(embeddings)
# Build FAISS index (IndexFlatIP for cosine similarity)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
# Query text
query = "fast AI vector search"
query_response = client.embeddings.create(
model="text-embedding-3-small",
input=[query]
)
query_embedding = np.array([query_response.data[0].embedding], dtype=np.float32)
# Normalize query embedding
faiss.normalize_L2(query_embedding)
# Search top 3 nearest neighbors
k = 3
distances, indices = index.search(query_embedding, k)
print("Query:", query)
print("Top matches:")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. {texts[idx]} (score: {distances[0][i]:.4f})")

output

Query: fast AI vector search
Top matches:
1. Vector search is efficient with FAISS. (score: 0.9123)
2. OpenAI provides powerful AI APIs. (score: 0.8765)
3. Artificial intelligence and machine learning. (score: 0.8542)
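For corpora too large to embed in one API call, the single `client.embeddings.create` request above can be replaced by a chunked loop. The sketch below is a minimal, hedged version: the `batch_size` of 512 is an illustrative choice (keep it under the API's per-request input limit), and `embed_fn` is a hypothetical parameter standing in for whatever embedding call you use.

```python
import numpy as np

def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_in_batches(texts, embed_fn, batch_size=512):
    """Embed texts in batches; embed_fn maps a list of strings to a list of vectors."""
    vectors = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return np.array(vectors, dtype=np.float32)

# With the OpenAI client from the example above, embed_fn could be:
# embed_fn = lambda batch: [d.embedding for d in client.embeddings.create(
#     model="text-embedding-3-small", input=batch).data]
```

Batching this way cuts the number of HTTP round-trips from one per document to one per batch, which is usually the dominant cost when embedding thousands of texts.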
Common variations
You can speed up vector search further by:
- Using approximate nearest neighbor indexes such as faiss.IndexIVFFlat for large datasets.
- Batching embedding requests to reduce API calls.
- Using smaller embedding models such as text-embedding-3-small for faster embedding generation.
- Using GPU-accelerated FAISS (faiss-gpu) if available.
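Dimensionality reduction, mentioned in the quick answer, is also available directly in the API: the text-embedding-3 models accept a `dimensions` parameter (e.g. `client.embeddings.create(model="text-embedding-3-small", input=texts, dimensions=256)`), which is equivalent to truncating the full vector and re-normalizing. The sketch below mimics that truncation client-side with random stand-in vectors, so it runs without an API key; the sizes (1536 full, 256 reduced) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for full-size embeddings (text-embedding-3-small returns 1536 dims)
full = rng.standard_normal((100, 1536)).astype(np.float32)
full /= np.linalg.norm(full, axis=1, keepdims=True)

# Keep the first 256 dimensions, then re-normalize so cosine similarity
# still works -- the same effect as requesting dimensions=256 from the API
reduced = full[:, :256].copy()
reduced /= np.linalg.norm(reduced, axis=1, keepdims=True)
```

Shorter vectors mean less memory per entry and fewer multiply-adds per comparison, so both flat and approximate indexes get faster, at some cost in retrieval quality.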
For large datasets, an IVF index clusters the vectors and searches only the most promising clusters instead of every vector:

import faiss

# Embeddings must be normalized BEFORE training and adding so that
# inner product equals cosine similarity
faiss.normalize_L2(embeddings)

# Example: create an IVF index for approximate search
nlist = 4  # number of clusters; training needs at least nlist vectors, so keep it small for this tiny dataset
quantizer = faiss.IndexFlatIP(embeddings.shape[1])
index_ivf = faiss.IndexIVFFlat(quantizer, embeddings.shape[1], nlist, faiss.METRIC_INNER_PRODUCT)

# Train on the normalized vectors, then add them
index_ivf.train(embeddings)
index_ivf.add(embeddings)

# Search more clusters for better recall (default nprobe is 1)
index_ivf.nprobe = 2

# Normalize the query embedding, then search
faiss.normalize_L2(query_embedding)
distances, indices = index_ivf.search(query_embedding, k)
print("Approximate search results:")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. {texts[idx]} (score: {distances[0][i]:.4f})")

output
Approximate search results:
1. Vector search is efficient with FAISS. (score: 0.9101)
2. OpenAI provides powerful AI APIs. (score: 0.8750)
3. Artificial intelligence and machine learning. (score: 0.8503)
Troubleshooting
If vector search is still slow, check that:
- Embeddings are normalized before indexing and querying (required for cosine similarity with an inner-product index).
- The index is trained before vectors are added when using IVF indexes.
- Embedding requests are batched to reduce API round-trips.
- An approximate index is used for large datasets instead of brute-force search.
- Your environment has enough RAM and CPU; a flat index holds every vector in memory.
Key Takeaways
- Use FAISS or a similar vector store for efficient indexing instead of comparing the query against every vector in Python.
- Batch embedding requests to reduce API call overhead and latency.
- Normalize embeddings so that inner-product search is equivalent to cosine similarity.
- Use approximate nearest neighbor indexes such as IVF or HNSW for large datasets.
- Make sure IVF indexes are trained before adding vectors and that your environment has sufficient resources.