How to choose the best embedding model for RAG
Quick answer
Choose the best embedding model for RAG by balancing semantic accuracy, vector dimensionality, and latency. Use a model like text-embedding-3-large for high-quality semantic search, or a lighter model for faster, cost-effective retrieval, depending on your dataset size and query complexity.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to access embedding models.
pip install "openai>=1.0"

Step by step
This example shows how to generate embeddings with the text-embedding-3-large model for RAG vector search.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
texts = [
    "OpenAI develops advanced AI models.",
    "Retrieval-Augmented Generation improves LLM responses.",
    "Embeddings convert text into vectors for similarity search."
]
response = client.embeddings.create(
    model="text-embedding-3-large",
    input=texts
)
embeddings = [e.embedding for e in response.data]
for i, emb in enumerate(embeddings):
    print(f"Embedding vector {i} length: {len(emb)}")

Output
Embedding vector 0 length: 3072
Embedding vector 1 length: 3072
Embedding vector 2 length: 3072
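Once you have embeddings, retrieval ranks stored documents by vector similarity to the query. A minimal, dependency-free sketch of cosine-similarity ranking (the toy 4-dimensional vectors stand in for real 3072-dimensional embeddings):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding vectors.
query_vec = [0.1, 0.3, 0.5, 0.7]
doc_vecs = [
    [0.1, 0.3, 0.5, 0.7],  # same direction as the query -> similarity ~1.0
    [0.7, 0.5, 0.3, 0.1],  # different direction -> lower score
]

scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
best = max(range(len(scores)), key=scores.__getitem__)
print(f"Best match: document {best} (score {scores[best]:.3f})")
# prints: Best match: document 0 (score 1.000)
```

In production you would delegate this step to a vector store, but the ranking principle is the same.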
Common variations
You can choose a smaller model like text-embedding-3-small for faster, cheaper embeddings at a lower default dimensionality (1536 vs. 3072); both text-embedding-3 models also accept a dimensions parameter to shorten vectors further. For large-scale RAG, consider batch processing and async calls to improve throughput.
import asyncio
import os

from openai import AsyncOpenAI

# Use the async client; its embeddings.create method is awaitable.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def create_embeddings_async(texts):
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [e.embedding for e in response.data]

texts = ["Fast embeddings for RAG.", "Async calls improve throughput."]
embeddings = asyncio.run(create_embeddings_async(texts))
print(f"Received {len(embeddings)} embeddings asynchronously.")

Output
Received 2 embeddings asynchronously.
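For large corpora, splitting texts into fixed-size batches keeps individual requests within API limits. A minimal batching sketch (the batch_size of 3 and the chunks helper are illustrative; each batch would be passed as input to an embeddings call):

```python
def chunks(items, batch_size):
    # Yield successive fixed-size slices of the input list.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"Document {n}" for n in range(7)]

# Each batch would be sent as `input=batch` in one embeddings request.
batches = list(chunks(texts, batch_size=3))
print(f"{len(texts)} texts -> {len(batches)} batches "
      f"of sizes {[len(b) for b in batches]}")
# prints: 7 texts -> 3 batches of sizes [3, 3, 1]
```

Combining batching with the async client lets several batches be in flight at once.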
Troubleshooting
If requests fail or embeddings come back with unexpected lengths, first verify the model name and API key. Also ensure input text is neither empty nor too long (token limits vary by model). For latency issues, switch to a smaller embedding model or batch your requests.
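A pre-flight check along these lines can catch empty or oversized inputs before they reach the API. The 8,192-token limit and the 4-characters-per-token heuristic below are rough assumptions; use a real tokenizer such as tiktoken for exact counts:

```python
MAX_TOKENS = 8192      # assumed per-input limit; check your model's docs
CHARS_PER_TOKEN = 4    # rough heuristic for English text

def validate_inputs(texts):
    """Return (valid, errors): texts safe to embed, plus per-index problems."""
    valid, errors = [], []
    for i, text in enumerate(texts):
        if not text or not text.strip():
            errors.append((i, "empty input"))
        elif len(text) > MAX_TOKENS * CHARS_PER_TOKEN:
            errors.append((i, "likely exceeds token limit"))
        else:
            valid.append(text)
    return valid, errors

valid, errors = validate_inputs(["A good document.", "", "x" * 40000])
print(f"{len(valid)} valid, {len(errors)} rejected: {errors}")
# prints: 1 valid, 2 rejected: [(1, 'empty input'), (2, 'likely exceeds token limit')]
```

Rejected texts can then be logged, truncated, or chunked rather than failing the whole batch.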
Key Takeaways
- Use high-dimensional embeddings like text-embedding-3-large for best semantic accuracy in RAG.
- Smaller models reduce cost and latency but may sacrifice retrieval quality.
- Batch and async embedding calls improve performance on large datasets.
- Validate input text length and model compatibility to avoid errors.
- Match embedding dimensionality with your vector store capabilities for optimal search.