How to create a FAISS index

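Use the faiss Python library. Generate vector embeddings for your data (e.g., with OpenAIEmbeddings), convert them to a float32 NumPy array of shape (n, d), create an index such as IndexFlatL2(d), and add the vectors with index.add(). You can then call index.search() to find the nearest neighbors of a query vector. IndexFlatL2 performs exact, brute-force search; for very large collections FAISS also offers approximate indexes such as IVF.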

PREREQUISITES

Python 3.9+
pip install faiss-cpu
pip install langchain_openai
pip install "openai>=1.0"
An OpenAI API key

Setup

Install faiss-cpu (FAISS itself; NumPy is pulled in as a dependency) and langchain_openai (OpenAI embeddings through LangChain). Then set your OpenAI API key as the OPENAI_API_KEY environment variable.

bash

pip install faiss-cpu langchain_openai "openai>=1.0"
export OPENAI_API_KEY="your_api_key"

Step by step

This example embeds four short text documents with OpenAI embeddings, adds the vectors to a flat L2 FAISS index, and retrieves the two documents closest to a query.

python

import os

import faiss
import numpy as np
from langchain_openai import OpenAIEmbeddings

# Set your OpenAI API key before running, e.g.:
# export OPENAI_API_KEY="your_api_key"
# Reading os.environ directly fails fast with a KeyError if the key is missing.
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",  # any OpenAI embedding model works; d is read from the vectors
    openai_api_key=os.environ["OPENAI_API_KEY"],
)

# Sample documents
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "FAISS is a library for efficient similarity search.",
    "OpenAI provides powerful embedding models.",
    "Python is great for AI development."
]

# Embed all documents in one batched call (returns a list of float lists)
vectors = embeddings.embed_documents(texts)

# FAISS expects a float32 array of shape (n_vectors, d)
vectors_np = np.array(vectors, dtype="float32")

# Create an exact (brute-force) index that ranks by L2 (Euclidean) distance
d = vectors_np.shape[1]  # embedding dimension
index = faiss.IndexFlatL2(d)

# Add the vectors; ids 0..n-1 follow insertion order, i.e. positions in texts
index.add(vectors_np)

# Embed the query the same way, shaped (1, d)
query_text = "fast similarity search with FAISS"
query_vector_np = np.array([embeddings.embed_query(query_text)], dtype="float32")

# Search for the k nearest neighbors: D holds squared L2 distances, I the matching ids
k = 2
D, I = index.search(query_vector_np, k)

print(f"Top {k} nearest documents:")
for rank, (idx, dist) in enumerate(zip(I[0], D[0]), start=1):
    print(f"Rank {rank}: {texts[idx]} (distance: {dist:.4f})")

output

Top 2 nearest documents:
Rank 1: FAISS is a library for efficient similarity search. (distance: 0.1234)
Rank 2: OpenAI provides powerful embedding models. (distance: 0.5678)

The distances shown are illustrative; actual values depend on the embedding model.

Common variations

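Use faiss.IndexFlatIP to rank by inner product instead of L2 distance; normalize the vectors with faiss.normalize_L2 first if you want cosine similarity.
Use a GPU build of FAISS in place of faiss-cpu, if you have a supported GPU, for faster indexing and search on large collections; faiss.index_cpu_to_gpu moves an existing index onto a GPU.
When embedding large datasets, use embed_documents, which batches requests, or its async counterpart aembed_documents.
Use embeddings from other providers, such as Google's Gemini embedding models (e.g., gemini-embedding-001), through their SDKs. Chat models like gemini-2.5-pro and claude-3-5-sonnet-20241022 do not return embeddings, and Anthropic does not offer its own embedding model. Index and query vectors must come from the same model.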

Troubleshooting

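If import faiss raises ImportError, check that faiss-cpu (or a GPU build) is installed in the active Python environment; either package provides the module named faiss.
A KeyError for OPENAI_API_KEY or an authentication error from OpenAI means the API key is missing or invalid; export it in the shell before starting Python.
If index.add() or index.search() fails, check that the input is a 2-D float32 array with exactly d columns, matching the index dimension.
For large datasets, IndexFlatL2 keeps every vector in RAM and compares each query against all of them. Consider an IVF index such as IndexIVFFlat, which scans only a few clusters per query (approximate search, requires training), or IndexIVFPQ, which also compresses the vectors to save memory. Save built indexes with faiss.write_index and reload them with faiss.read_index instead of rebuilding them.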

Key Takeaways

Use faiss.IndexFlatL2 to create a simple, exact FAISS index that ranks vectors by (squared) Euclidean distance.
Generate vector embeddings with OpenAIEmbeddings before adding to the FAISS index.
Convert embeddings to numpy.float32 arrays before adding to FAISS.
Search the index with index.search() to retrieve nearest neighbors efficiently.
Install faiss-cpu and langchain_openai, and set OPENAI_API_KEY in your environment before running the code.

Verified 2026-04 · gpt-4o, gemini-2.5-pro, claude-3-5-sonnet-20241022

Verify ↗