How to create a FAISS index

Quick answer
Use the faiss Python library: generate vector embeddings (e.g., with OpenAIEmbeddings), initialize a FAISS index such as IndexFlatL2 with the embedding dimension, and add the vectors with index.add(). You can then call index.search() to run similarity search over the high-dimensional vectors.

PREREQUISITES

  • Python 3.8+
  • pip install faiss-cpu
  • pip install langchain_openai
  • OpenAI API key (the account needs available API quota for embedding calls)
  • pip install "openai>=1.0"

Setup

Install the required packages: faiss-cpu for FAISS and langchain_openai for OpenAI embeddings. Then set your OpenAI API key as the OPENAI_API_KEY environment variable.

bash
pip install faiss-cpu langchain_openai openai
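
Before running the example, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, not part of any library: the helper name require_api_key is hypothetical, and it accepts any mapping so it can be tried without real credentials.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored under `name`, or raise a clear error if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the example.")
    return key


# Usage with a stand-in mapping, so the check itself needs no real credentials.
print(require_api_key({"OPENAI_API_KEY": "sk-placeholder"}))
```

Calling require_api_key() with no arguments reads from the real environment and fails early with a readable message instead of a bare KeyError.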

Step by step

This example shows how to create a FAISS index from text documents by embedding them with OpenAI embeddings and adding them to a FAISS index for similarity search.

python
import os

import faiss
import numpy as np
from langchain_openai import OpenAIEmbeddings

# Set your OpenAI API key in an environment variable before running:
# export OPENAI_API_KEY='your_api_key'

# Initialize OpenAI embeddings (raises KeyError if the variable is missing)
embeddings = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])

# Sample documents
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "FAISS is a library for efficient similarity search.",
    "OpenAI provides powerful embedding models.",
    "Python is great for AI development."
]

# Embed all documents in one batched call
vectors = embeddings.embed_documents(texts)

# FAISS expects a 2-D float32 array of shape (n_vectors, dimension)
vectors_np = np.array(vectors, dtype="float32")

# Create a FAISS index for L2 distance (Euclidean)
d = vectors_np.shape[1]  # dimension of vectors
index = faiss.IndexFlatL2(d)

# Add vectors to the index
index.add(vectors_np)

# Query vector (embedding of a new text)
query_text = "fast similarity search with FAISS"
query_vector = embeddings.embed_query(query_text)
query_vector_np = np.array([query_vector]).astype('float32')

# Search the index for the top 2 nearest neighbors
# D holds squared L2 distances, I holds row indices into `texts`
k = 2
D, I = index.search(query_vector_np, k)

print("Top 2 nearest documents: ")
for i, idx in enumerate(I[0]):
    print(f"Rank {i+1}: {texts[idx]} (distance: {D[0][i]:.4f})")
output
Top 2 nearest documents: 
Rank 1: FAISS is a library for efficient similarity search. (distance: 0.1234)
Rank 2: OpenAI provides powerful embedding models. (distance: 0.5678)

The distances shown are placeholders; actual values depend on the embedding model.
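
To make the search step less of a black box, here is a rough NumPy sketch of what an exhaustive L2 search computes. It is an illustration rather than FAISS's implementation: the function name flat_l2_search and the toy 2-D vectors are assumptions made for the example. Like IndexFlatL2, it ranks by squared Euclidean distance without taking the square root.

```python
import numpy as np


def flat_l2_search(base: np.ndarray, queries: np.ndarray, k: int):
    """Exhaustive k-nearest-neighbour search using squared Euclidean distance."""
    # Pairwise squared distances, shape (n_queries, n_base).
    diffs = queries[:, None, :] - base[None, :, :]
    dists = np.einsum("qbd,qbd->qb", diffs, diffs)
    # Indices of the k smallest distances per query, in ascending order.
    idx = np.argsort(dists, axis=1)[:, :k]
    return np.take_along_axis(dists, idx, axis=1), idx


# Toy 2-D vectors stand in for real embeddings.
base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype="float32")
queries = np.array([[0.9, 0.1]], dtype="float32")
D, I = flat_l2_search(base, queries, k=2)
print(I[0], D[0])
```

The query lies closest to the second base vector, so its index (1) comes first. A flat index does the same comparison against every stored vector, which is why it is exact but scales linearly with the number of vectors.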

Common variations

  • Use faiss.IndexFlatIP for inner product similarity instead of L2 distance.
  • Use GPU-accelerated FAISS if available for faster indexing.
  • Use async embedding calls (for example, aembed_documents) when embedding large datasets.
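  • Use a different embedding model with its own SDK. Note that gemini-2.5-pro and claude-3-5-sonnet-20241022 are text-generation models, not embedding models, so choose a dedicated embedding model from your provider.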

Troubleshooting

  • If you get ImportError for faiss, ensure you installed faiss-cpu or faiss-gpu correctly.
  • If embeddings are empty or errors occur, verify your OpenAI API key is set in os.environ["OPENAI_API_KEY"].
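  • For large datasets, IndexFlatL2 keeps every vector in memory and compares the query against all of them. Consider IVF indexes to cut search time, and compressed variants such as IndexIVFPQ or on-disk storage to reduce memory use.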

Key Takeaways

  • Use faiss.IndexFlatL2 to create a simple FAISS index for exact Euclidean (L2) nearest-neighbor search.
  • Generate vector embeddings with OpenAIEmbeddings before adding to the FAISS index.
  • Convert embeddings to numpy.float32 arrays before adding to FAISS.
  • Search the index with index.search() to retrieve nearest neighbors efficiently.
  • Install faiss-cpu and set the OPENAI_API_KEY environment variable before running the example.
Verified 2026-04 · gpt-4o, gemini-2.5-pro, claude-3-5-sonnet-20241022