How to use embeddings for RAG
Quick answer
Use embeddings to convert documents and queries into vector representations, then perform similarity search to retrieve relevant context for RAG. Combine the retrieved text with a chat completion prompt to generate informed answers.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
- pip install faiss-cpu
Setup
Install the required Python packages and set your OpenAI API key as an environment variable.
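On macOS or Linux, the key can be exported for the current shell session (the value shown is a placeholder; substitute your own key):

```shell
# Set the API key for the current shell session (placeholder value shown)
export OPENAI_API_KEY="sk-your-key-here"
```

Add the line to your shell profile if you want it to persist across sessions.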
pip install openai faiss-cpu

Step by step
This example shows how to embed documents, build a FAISS vector index, query it with an embedded user question, and use the retrieved context in a gpt-4o-mini chat completion for RAG.
import os
from openai import OpenAI
import faiss
import numpy as np
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Sample documents
documents = [
"The Eiffel Tower is located in Paris.",
"Python is a popular programming language.",
"The Great Wall of China is visible from space.",
"OpenAI develops advanced AI models."
]
# Step 1: Create embeddings for documents
response = client.embeddings.create(
model="text-embedding-3-small",
input=documents
)
embeddings = [data.embedding for data in response.data]
# Convert to numpy array for FAISS
embedding_dim = len(embeddings[0])
index = faiss.IndexFlatL2(embedding_dim)
index.add(np.array(embeddings).astype('float32'))
# Step 2: Embed the query
query = "Where is the Eiffel Tower located?"
query_response = client.embeddings.create(
model="text-embedding-3-small",
input=[query]
)
query_embedding = np.array(query_response.data[0].embedding).astype('float32')
# Step 3: Search for top 2 similar documents
k = 2
D, I = index.search(np.array([query_embedding]), k)
# Step 4: Retrieve relevant documents
retrieved_docs = [documents[i] for i in I[0]]
context = "\n".join(retrieved_docs)
# Step 5: Use retrieved context in prompt for RAG
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
]
completion = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages
)
print("Answer:", completion.choices[0].message.content)

Output
Answer: The Eiffel Tower is located in Paris.
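For a corpus this small, a vector index is optional: the same retrieval step can be sketched as brute-force cosine similarity in plain NumPy. The toy 3-D vectors below stand in for real embeddings, and cosine_top_k is an illustrative helper, not part of any library:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k rows of doc_vecs most similar to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity per document
    return np.argsort(-sims)[:k]      # highest similarity first

# Toy 3-D vectors standing in for real embedding vectors.
docs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.9, 0.1, 0.0]])
print(cosine_top_k(np.array([1.0, 0.0, 0.0]), docs))  # → [0 2]
```

FAISS becomes worthwhile once the corpus is too large for an exhaustive scan per query.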
Common variations
- Use async calls with the OpenAI SDK for concurrency.
- Swap FAISS for another vector store such as Chroma, or use the GPU build of FAISS.
- Use a larger embedding model such as text-embedding-3-large for higher retrieval quality.
- Use a low-cost chat model such as gpt-4o-mini (or, via its own SDK, claude-3-5-sonnet-20241022) for cost-effective RAG.
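The async variation can be sketched with asyncio.gather. Here embed_batch is a hypothetical stand-in for an AsyncOpenAI client.embeddings.create call, so the concurrency structure runs without a network connection:

```python
import asyncio

async def embed_batch(batch):
    """Stand-in for an AsyncOpenAI embeddings call (assumed, not shown)."""
    await asyncio.sleep(0)  # placeholder for the network round-trip
    # Fake 1-D "embeddings" so the example is self-contained.
    return [[float(len(text))] for text in batch]

async def embed_all(docs, batch_size=2):
    """Embed documents in concurrent batches, preserving order."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    results = await asyncio.gather(*(embed_batch(b) for b in batches))
    return [vec for batch in results for vec in batch]

vectors = asyncio.run(embed_all(["a", "bb", "ccc", "dddd", "e"]))
print(len(vectors))  # 5
```

With the real client, each embed_batch call would hit the API concurrently, which helps most when embedding many batches.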
Troubleshooting
- If embeddings are empty or errors occur, verify your API key and model name.
- If the FAISS search returns unexpected results, check that the embeddings were converted to float32 numpy arrays before indexing.
- Preprocess your documents (e.g., clean and chunk them) for better retrieval.
- For large document collections, consider approximate nearest neighbor indexes such as IndexIVFFlat in FAISS.
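The chunking advice above can be sketched as a simple character-based splitter with overlap. chunk_text is an illustrative helper; production pipelines often chunk by tokens or sentences instead:

```python
def chunk_text(text, max_chars=200, overlap=50):
    """Split text into overlapping character chunks (a simple baseline)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        # Step forward, keeping `overlap` characters of shared context.
        start += max_chars - overlap
    return chunks

doc = "x" * 450
print([len(c) for c in chunk_text(doc)])  # [200, 200, 150]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side; each chunk is then embedded and indexed in place of the whole document.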
Key Takeaways
- Convert documents and queries into vector embeddings for semantic similarity search.
- Use a vector index like FAISS to efficiently retrieve relevant context for RAG.
- Feed retrieved context as prompt input to a chat completion model for informed answers.