How to use embeddings for RAG
Quick answer
Use embeddings to convert documents and queries into vector representations, then perform similarity search to retrieve relevant context for RAG. Combine the retrieved text with a chat completion prompt to generate informed answers.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
- pip install faiss-cpu
Setup
Install the required Python packages and set your OpenAI API key as an environment variable.
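On macOS or Linux, the key can be exported for the current shell session (the value shown is a placeholder; substitute your own key):

```shell
# Set the API key for the current shell session (placeholder value shown)
export OPENAI_API_KEY="sk-your-key-here"
```

Add the line to your shell profile if you want it to persist across sessions.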
pip install openai faiss-cpu

Step by step
This example shows how to embed documents, build a FAISS vector index, query it with an embedded user question, and use the retrieved context in a gpt-4o-mini chat completion for RAG.
import os
from openai import OpenAI
import faiss
import numpy as np
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Sample documents
documents = [
"The Eiffel Tower is located in Paris.",
"Python is a popular programming language.",
"The Great Wall of China is visible from space.",
"OpenAI develops advanced AI models."
]
# Step 1: Create embeddings for documents
response = client.embeddings.create(
model="text-embedding-3-small",
input=documents
)
embeddings = [data.embedding for data in response.data]
# Convert to numpy array for FAISS
embedding_dim = len(embeddings[0])
index = faiss.IndexFlatL2(embedding_dim)
index.add(np.array(embeddings).astype('float32'))
# Step 2: Embed the query
query = "Where is the Eiffel Tower located?"
query_response = client.embeddings.create(
model="text-embedding-3-small",
input=[query]
)
query_embedding = np.array(query_response.data[0].embedding).astype('float32')
# Step 3: Search for top 2 similar documents
k = 2
D, I = index.search(np.array([query_embedding]), k)
# Step 4: Retrieve relevant documents
retrieved_docs = [documents[i] for i in I[0]]
context = "\n".join(retrieved_docs)
# Step 5: Use retrieved context in prompt for RAG
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
]
completion = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages
)
print("Answer:", completion.choices[0].message.content)

Output
Answer: The Eiffel Tower is located in Paris.
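For a corpus this small, a vector index is optional: the same retrieval step can be sketched as brute-force cosine similarity in plain NumPy. The toy 3-D vectors below stand in for real embeddings, and cosine_top_k is an illustrative helper, not part of any library:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k rows of doc_vecs most similar to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity per document
    return np.argsort(-sims)[:k]      # highest similarity first

# Toy 3-D vectors standing in for real embedding vectors.
docs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.9, 0.1, 0.0]])
print(cosine_top_k(np.array([1.0, 0.0, 0.0]), docs))  # → [0 2]
```

FAISS becomes worthwhile once the corpus is too large for an exhaustive scan per query.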
Common variations
- Use async calls with the OpenAI SDK for concurrency.
- Swap FAISS for another vector store such as Chroma, or use the GPU build of FAISS.
- Use a larger embedding model such as text-embedding-3-large for higher retrieval quality.
- Use a low-cost chat model such as gpt-4o-mini (or, via its own SDK, claude-3-5-sonnet-20241022) for cost-effective RAG.
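The async variation can be sketched with asyncio.gather. Here embed_batch is a hypothetical stand-in for an AsyncOpenAI client.embeddings.create call, so the concurrency structure runs without a network connection:

```python
import asyncio

async def embed_batch(batch):
    """Stand-in for an AsyncOpenAI embeddings call (assumed, not shown)."""
    await asyncio.sleep(0)  # placeholder for the network round-trip
    # Fake 1-D "embeddings" so the example is self-contained.
    return [[float(len(text))] for text in batch]

async def embed_all(docs, batch_size=2):
    """Embed documents in concurrent batches, preserving order."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    results = await asyncio.gather(*(embed_batch(b) for b in batches))
    return [vec for batch in results for vec in batch]

vectors = asyncio.run(embed_all(["a", "bb", "ccc", "dddd", "e"]))
print(len(vectors))  # 5
```

With the real client, each embed_batch call would hit the API concurrently, which helps most when embedding many batches.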
Troubleshooting
- If embeddings are empty or errors occur, verify your API key and model name.
- If the FAISS search returns unexpected results, check that the embeddings were converted to float32 numpy arrays before indexing.
- Preprocess your documents (e.g., clean and chunk them) for better retrieval.
- For large document collections, consider approximate nearest neighbor indexes such as IndexIVFFlat in FAISS.
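The chunking advice above can be sketched as a simple character-based splitter with overlap. chunk_text is an illustrative helper; production pipelines often chunk by tokens or sentences instead:

```python
def chunk_text(text, max_chars=200, overlap=50):
    """Split text into overlapping character chunks (a simple baseline)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        # Step forward, keeping `overlap` characters of shared context.
        start += max_chars - overlap
    return chunks

doc = "x" * 450
print([len(c) for c in chunk_text(doc)])  # [200, 200, 150]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side; each chunk is then embedded and indexed in place of the whole document.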
Key Takeaways
- Convert documents and queries into vector embeddings for semantic similarity search.
- Use a vector index like FAISS to efficiently retrieve relevant context for RAG.
- Feed retrieved context as prompt input to a chat completion model for informed answers.