How to batch process embeddings efficiently
Quick answer
Use the embeddings.create method with a list of texts as input to batch process embeddings in a single API call. This reduces overhead and latency compared to individual calls, improving throughput and lowering cost.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your API key as an environment variable.
pip install openai>=1.0

Step by step
Batch process multiple texts by passing a list of strings to client.embeddings.create. This example uses the text-embedding-3-small model to generate embeddings for three sentences in one request.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

texts = [
    "OpenAI provides powerful AI models.",
    "Batch processing embeddings saves time and cost.",
    "Efficient API usage is critical for production apps.",
]

# One API call embeds every text; response.data preserves input order.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

embeddings = [item.embedding for item in response.data]
for i, emb in enumerate(embeddings):
    print(f"Embedding {i+1} length: {len(emb)}")

Output
Embedding 1 length: 1536
Embedding 2 length: 1536
Embedding 3 length: 1536
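Once you have the batch of vectors, a common next step is comparing them. As a minimal, SDK-free sketch, here is a cosine-similarity helper applied to toy vectors standing in for real embeddings (the function name and vectors are illustrative, not part of the OpenAI library):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real embeddings would come from response.data.
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 0.0]
print(cosine_similarity(v1, v2))  # → 1.0 (identical direction)
print(cosine_similarity(v1, v3))  # → 0.0 (orthogonal)
```

Because all embeddings from one model have the same length, a single batched call gives you vectors that are directly comparable with each other.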
Common variations
You can batch process embeddings asynchronously using asyncio for very large datasets or use different embedding models by changing the model parameter. Adjust batch size to balance latency and memory.
import asyncio
import os
from openai import AsyncOpenAI

# AsyncOpenAI exposes awaitable versions of the same endpoints.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def batch_embeddings(texts):
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return [item.embedding for item in response.data]

async def main():
    texts = [f"Sentence {i}" for i in range(100)]
    embeddings = await batch_embeddings(texts)
    print(f"Processed {len(embeddings)} embeddings asynchronously.")

asyncio.run(main())

Output
Processed 100 embeddings asynchronously.
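The batch-size advice above can be sketched with a small chunking helper that splits a large input list into fixed-size batches before sending each one to the API (chunked is a hypothetical helper name, not part of the OpenAI SDK):

```python
from typing import Iterable, List

def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 250 texts split into batches of 100; each batch would be one API call.
texts = [f"Sentence {i}" for i in range(250)]
batches = list(chunked(texts, 100))
print([len(b) for b in batches])  # → [100, 100, 50]
```

Smaller batches keep per-request memory and latency down; larger batches reduce the number of round trips. The right size depends on your texts and rate limits.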
Troubleshooting
- If you get RateLimitError, reduce batch size or add retry logic with exponential backoff.
- If memory usage is high, split input into smaller batches.
- Ensure your API key is set correctly in os.environ["OPENAI_API_KEY"].
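As a sketch of the retry advice, here is a minimal exponential-backoff wrapper using only the standard library (with_backoff and the flaky demo function are hypothetical; in real code you would pass retryable=(openai.RateLimitError,) instead of a generic exception):

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(); on a retryable error, sleep base_delay * 2**attempt
    plus random jitter, then retry, up to max_retries attempts."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: a function that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # prints "ok" after two retries
```

The jitter spreads out retries from concurrent workers so they do not all hit the API again at the same instant.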
Key Takeaways
- Batch multiple texts in one embeddings.create call to reduce latency and cost.
- Adjust batch size to optimize memory and throughput based on your environment.
- Use async calls for large-scale embedding generation to improve efficiency.
- Implement retry and backoff strategies to handle rate limits gracefully.