How to batch process embeddings efficiently
Quick answer
Use the embeddings.create method with a list of texts as input to batch process embeddings in a single API call. This reduces overhead and latency compared to individual calls, improving throughput and lowering cost.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your API key as an environment variable.
pip install openai>=1.0

Step by step
Batch process multiple texts by passing a list of strings to client.embeddings.create. This example uses the text-embedding-3-small model to generate embeddings for three sentences in one request.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

texts = [
    "OpenAI provides powerful AI models.",
    "Batch processing embeddings saves time and cost.",
    "Efficient API usage is critical for production apps.",
]

# One API call embeds every text; response.data preserves input order.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

embeddings = [item.embedding for item in response.data]
for i, emb in enumerate(embeddings):
    print(f"Embedding {i+1} length: {len(emb)}")

Output
Embedding 1 length: 1536
Embedding 2 length: 1536
Embedding 3 length: 1536
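Once you have the batch of vectors, a common next step is comparing them. As a minimal, SDK-free sketch, here is a cosine-similarity helper applied to toy vectors standing in for real embeddings (the function name and vectors are illustrative, not part of the OpenAI library):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real embeddings would come from response.data.
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 0.0]
print(cosine_similarity(v1, v2))  # → 1.0 (identical direction)
print(cosine_similarity(v1, v3))  # → 0.0 (orthogonal)
```

Because all embeddings from one model have the same length, a single batched call gives you vectors that are directly comparable with each other.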
Common variations
You can batch process embeddings asynchronously using asyncio for very large datasets or use different embedding models by changing the model parameter. Adjust batch size to balance latency and memory.
import asyncio
import os
from openai import AsyncOpenAI

# AsyncOpenAI exposes awaitable versions of the same endpoints.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def batch_embeddings(texts):
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return [item.embedding for item in response.data]

async def main():
    texts = [f"Sentence {i}" for i in range(100)]
    embeddings = await batch_embeddings(texts)
    print(f"Processed {len(embeddings)} embeddings asynchronously.")

asyncio.run(main())

Output
Processed 100 embeddings asynchronously.
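The batch-size advice above can be sketched with a small chunking helper that splits a large input list into fixed-size batches before sending each one to the API (chunked is a hypothetical helper name, not part of the OpenAI SDK):

```python
from typing import Iterable, List

def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 250 texts split into batches of 100; each batch would be one API call.
texts = [f"Sentence {i}" for i in range(250)]
batches = list(chunked(texts, 100))
print([len(b) for b in batches])  # → [100, 100, 50]
```

Smaller batches keep per-request memory and latency down; larger batches reduce the number of round trips. The right size depends on your texts and rate limits.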
Troubleshooting
- If you get RateLimitError, reduce batch size or add retry logic with exponential backoff.
- If memory usage is high, split input into smaller batches.
- Ensure your API key is set correctly in os.environ["OPENAI_API_KEY"].
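As a sketch of the retry advice, here is a minimal exponential-backoff wrapper using only the standard library (with_backoff and the flaky demo function are hypothetical; in real code you would pass retryable=(openai.RateLimitError,) instead of a generic exception):

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(); on a retryable error, sleep base_delay * 2**attempt
    plus random jitter, then retry, up to max_retries attempts."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: a function that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # prints "ok" after two retries
```

The jitter spreads out retries from concurrent workers so they do not all hit the API again at the same instant.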
Key Takeaways
- Batch multiple texts in one embeddings.create call to reduce latency and cost.
- Adjust batch size to optimize memory and throughput based on your environment.
- Use async calls for large-scale embedding generation to improve efficiency.
- Implement retry and backoff strategies to handle rate limits gracefully.