How-to · Beginner · 4 min read

How to find nearest neighbors with embeddings

Quick answer
To find nearest neighbors with embeddings, first generate vector embeddings for your data using a model like text-embedding-3-small. Then use a vector similarity search library such as FAISS to index these embeddings and query for the closest vectors by cosine similarity or Euclidean distance.
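Under the hood, "nearest neighbors by cosine similarity" is just an inner product between L2-normalized vectors. A minimal numpy sketch of that core idea, using toy 4-dimensional vectors in place of real embeddings:

```python
import numpy as np

# Toy 4-d vectors standing in for real embedding output
vectors = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
], dtype=np.float32)
query = np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32)

def normalize(x):
    # L2-normalize along the last axis
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine similarity = inner product of L2-normalized vectors
scores = normalize(vectors) @ normalize(query)
nearest = int(np.argmax(scores))
print(nearest)  # row 0: identical direction to the query, cosine 1.0
```

Libraries like FAISS do exactly this comparison, but with indexes optimized for millions of vectors.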

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" faiss-cpu numpy (quote the version spec so the shell does not treat > as a redirect)

Setup

Install the required Python packages: openai for generating embeddings, faiss-cpu for fast nearest neighbor search, and numpy for numerical operations.

bash
pip install openai faiss-cpu numpy

Step by step

This example shows how to generate embeddings for a list of texts using OpenAI's text-embedding-3-small model, index them with FAISS, and query the nearest neighbors for a new input.

python
import os
import numpy as np
from openai import OpenAI
import faiss

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample data to embed
texts = [
    "The quick brown fox jumps over the lazy dog",
    "A fast brown fox leaps over lazy dogs",
    "Artificial intelligence and machine learning",
    "OpenAI develops powerful AI models",
    "Python programming language"
]

# Generate embeddings for the texts
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)
embeddings = np.array([data.embedding for data in response.data], dtype=np.float32)

# Normalize embeddings for cosine similarity
faiss.normalize_L2(embeddings)

# Build FAISS index
dimension = embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)  # Inner product = cosine similarity on normalized vectors
index.add(embeddings)

# Query text
query_text = "Fast AI models for programming"
query_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[query_text]
)
query_embedding = np.array(query_response.data[0].embedding, dtype=np.float32).reshape(1, -1)
faiss.normalize_L2(query_embedding)

# Search for the top 3 nearest neighbors
k = 3
scores, indices = index.search(query_embedding, k)  # IndexFlatIP returns similarity scores, not distances

print(f"Query: {query_text}")
print("Nearest neighbors:")
for rank, idx in enumerate(indices[0]):
    print(f"{rank + 1}. {texts[idx]} (score: {scores[0][rank]:.4f})")

output (illustrative; exact scores vary slightly across runs and model versions)
Query: Fast AI models for programming
Nearest neighbors:
1. OpenAI develops powerful AI models (score: ≈0.89)
2. Artificial intelligence and machine learning (score: ≈0.85)
3. Python programming language (score: ≈0.75)

Common variations

  • Use faiss.IndexFlatL2 for Euclidean distance instead of cosine similarity.
  • Use asynchronous calls with the OpenAI SDK (AsyncOpenAI) for batch embedding generation.
  • Try other embedding models like text-embedding-3-large for higher quality vectors.
  • Use approximate nearest neighbor indexes like faiss.IndexIVFFlat for large datasets.
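For the IndexFlatL2 variation, keep in mind that FAISS returns squared L2 distances and that smaller means closer, the opposite ordering convention from similarity scores. A minimal numpy sketch of that ranking, with toy 2-d vectors:

```python
import numpy as np

# Toy 2-d vectors; real embeddings have hundreds of dimensions
vectors = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]], dtype=np.float32)
query = np.array([1.2, 0.9], dtype=np.float32)

# Squared Euclidean distance to every stored vector
d2 = ((vectors - query) ** 2).sum(axis=1)
order = np.argsort(d2)  # ascending: smallest distance first
print(order[0])  # index 2 ([1, 1]) is closest to the query
```

Whether cosine or Euclidean ranking is more appropriate depends on the embedding model; OpenAI's embeddings are commonly compared with cosine similarity.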

Troubleshooting

  • If you get API key missing errors, ensure OPENAI_API_KEY is set in your environment.
  • If faiss-cpu installation fails, check that your Python version and OS are supported by the published wheels, or install via conda (conda install -c pytorch faiss-cpu).
  • Unexpected similarity scores often mean unnormalized vectors; normalize both the indexed embeddings and the query vector before an inner-product (cosine) search.
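The normalization point is easy to demonstrate: a raw inner product rewards vector magnitude, while cosine similarity compares direction only. A small numpy sketch with made-up vectors:

```python
import numpy as np

# A long vector pointing away from the query vs. a short, well-aligned one
a = np.array([10.0, 1.0], dtype=np.float32)   # large magnitude, nearly orthogonal to query
b = np.array([0.6, 0.8], dtype=np.float32)    # unit length, well aligned with query
query = np.array([0.0, 1.0], dtype=np.float32)

# Raw inner product: magnitude dominates, so a wrongly ranks first
raw = [float(a @ query), float(b @ query)]

def unit(v):
    return v / np.linalg.norm(v)

# After normalization, the inner product is cosine similarity: b ranks first
cos = [float(unit(a) @ unit(query)), float(unit(b) @ unit(query))]
print(raw, cos)
```

This is why the main example calls faiss.normalize_L2 on both the corpus embeddings and the query before using IndexFlatIP.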

Key Takeaways

  • Generate vector embeddings using OpenAI's embedding models like text-embedding-3-small.
  • Use FAISS to efficiently index and search embeddings for nearest neighbors.
  • Normalize embeddings before cosine similarity search to get accurate nearest neighbors.
  • For large datasets, use approximate nearest neighbor indexes to improve search speed.
  • Always keep your API key secure and set it via environment variables.
Verified 2026-04 · text-embedding-3-small