How to speed up embedding generation
PREREQUISITES
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your API key as an environment variable for secure access.
pip install "openai>=1.0"

Step by step
The following example batches multiple texts into a single embedding request with the text-embedding-3-small model, cutting the number of API round trips from one per text to one total.
import os
from openai import OpenAI

# Reuse a single client instance so connections stay alive across requests.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

texts = [
    "OpenAI provides powerful AI models.",
    "Batching requests reduces latency.",
    "Use efficient models for speed.",
]

# One API call embeds every text in the list.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

for i, embedding in enumerate(response.data):
    print(f"Text {i+1} embedding vector length: {len(embedding.embedding)}")

Output:

Text 1 embedding vector length: 1536
Text 2 embedding vector length: 1536
Text 3 embedding vector length: 1536
Common variations
When batching is not possible, use the AsyncOpenAI client with asyncio to run embedding requests in parallel. You can also prefer smaller embedding models such as text-embedding-3-small over larger ones like text-embedding-3-large for faster response times.
import os
import asyncio
from openai import AsyncOpenAI

# AsyncOpenAI exposes the same methods as OpenAI, but they are awaitable.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def get_embedding(text):
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

async def main():
    texts = [
        "Async calls speed up embedding generation.",
        "Parallel requests reduce total time.",
        "Choose efficient models wisely.",
    ]
    # Launch all requests concurrently and wait for every result.
    tasks = [get_embedding(text) for text in texts]
    embeddings = await asyncio.gather(*tasks)
    for i, emb in enumerate(embeddings):
        print(f"Async embedding {i+1} length: {len(emb)}")

asyncio.run(main())

Output:

Async embedding 1 length: 1536
Async embedding 2 length: 1536
Async embedding 3 length: 1536
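Unbounded parallelism can trigger rate limits on larger workloads. A common pattern is to cap in-flight requests with asyncio.Semaphore; the sketch below assumes the AsyncOpenAI client from the example above, and the limit of 5 is an arbitrary illustration, not an OpenAI recommendation:

import asyncio

semaphore = asyncio.Semaphore(5)  # illustrative cap; tune to your rate limits

async def get_embedding_limited(text):
    # At most 5 requests run concurrently; the rest wait their turn.
    async with semaphore:
        response = await client.embeddings.create(
            model="text-embedding-3-small",
            input=text,
        )
        return response.data[0].embedding

To use it, swap tasks = [get_embedding_limited(t) for t in texts] into main() above.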
Troubleshooting
If you encounter rate limit errors, reduce the batch size or add retry logic with exponential backoff, as sketched below. For network timeouts, check your connection and reuse client instances to keep connections alive. If embedding generation is slow, verify you are using a smaller, faster model such as text-embedding-3-small.
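A minimal backoff sketch, assuming the synchronous client from the first example. The helper name embed_with_retry is hypothetical; openai.RateLimitError is the exception the openai>=1.0 SDK raises on HTTP 429 responses:

import time
import random
import openai

def embed_with_retry(client, texts, max_retries=5):
    """Retry rate-limited embedding calls with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return client.embeddings.create(
                model="text-embedding-3-small",
                input=texts,
            )
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Sleep 1s, 2s, 4s, ... plus jitter so retries do not synchronize.
            time.sleep(2 ** attempt + random.random())

Call it in place of client.embeddings.create, e.g. response = embed_with_retry(client, texts).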
Key Takeaways
- Batch multiple texts in one embedding request to reduce API calls and latency.
- Use asynchronous requests to parallelize embedding generation when batching is not feasible.
- Select smaller, optimized embedding models like text-embedding-3-small for faster performance.
- Reuse client instances to optimize network connections and reduce overhead.
- Implement retry and backoff strategies to handle rate limits and transient errors.