How to batch reranking requests
Quick answer
Use the `client.chat.completions.create` method and combine multiple reranking tasks into a single prompt. Batch reranking improves throughput by sending several query/candidate lists to `gpt-4o` (or another reranking-capable model) in one API call.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- `pip install "openai>=1.0"`
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable.
```bash
pip install "openai>=1.0"
```

Step by step
This example batches reranking requests by combining multiple query/candidate lists into one prompt sent to the gpt-4o model in a single call. Each batch item contains a query and a list of candidates to rerank.
```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Prepare a batch of reranking requests
batch_requests = [
    {
        "query": "What is AI?",
        "candidates": [
            "AI is artificial intelligence.",
            "AI stands for apple inc.",
            "AI is a type of fruit.",
        ],
    },
    {
        "query": "Benefits of exercise",
        "candidates": [
            "Exercise improves health.",
            "Exercise causes fatigue.",
            "Exercise is bad for you.",
        ],
    },
]

# Combine all reranking tasks into a single prompt. Sending the tasks as
# separate user messages would build one conversation with a single reply,
# so we number the tasks and ask for one answer line per task instead.
prompt = ""
for n, item in enumerate(batch_requests, 1):
    prompt += f"Task {n} - rerank these candidates for the query: {item['query']}\nCandidates:\n"
    for i, candidate in enumerate(item["candidates"], 1):
        prompt += f"{i}. {candidate}\n"
    prompt += "\n"
prompt += (
    "For each task, rank the candidates from best to worst by relevance. "
    "Return one line per task containing only the ordered indices separated by commas."
)

# Send the whole batch in one API call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

# Extract and print the reranking results, one answer line per task
lines = response.choices[0].message.content.strip().splitlines()
for item, order in zip(batch_requests, lines):
    print(f"Query: {item['query']}")
    print(f"Reranking order: {order.strip()}")
    print()
```

Example output (exact rankings may vary):

```
Query: What is AI?
Reranking order: 1, 2, 3

Query: Benefits of exercise
Reranking order: 1, 2, 3
```
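If you want the reordered candidate texts rather than raw indices, a small helper can apply the returned order. This is a sketch (`apply_rerank_order` is a name chosen here, not part of the SDK), and it deliberately skips malformed or out-of-range indices since model output can vary:

```python
def apply_rerank_order(candidates, order_line):
    """Reorder candidates using a model reply like '2, 1, 3'.

    Indices the model did not return, repeated, or got wrong are
    skipped rather than raising, since model output can vary.
    """
    order = []
    for token in order_line.split(","):
        token = token.strip().rstrip(".")
        if token.isdigit():
            idx = int(token) - 1  # the prompt uses 1-based indices
            if 0 <= idx < len(candidates) and idx not in order:
                order.append(idx)
    return [candidates[i] for i in order]


candidates = ["AI is artificial intelligence.", "AI stands for apple inc."]
print(apply_rerank_order(candidates, "2, 1"))
# -> ['AI stands for apple inc.', 'AI is artificial intelligence.']
```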
Common variations

- Use `async` calls with `asyncio` for concurrent batch reranking.
- Switch to other models like `gpt-4o-mini` for cost efficiency.
- Use prompt templates to standardize reranking instructions.
```python
import asyncio
import os

from openai import AsyncOpenAI

# AsyncOpenAI (not OpenAI) provides awaitable methods in openai>=1.0;
# there is no chat.completions.acreate in the v1 SDK.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])


def build_prompt(item):
    prompt = f"Rerank these candidates for the query: {item['query']}\nCandidates:\n"
    for i, candidate in enumerate(item["candidates"], 1):
        prompt += f"{i}. {candidate}\n"
    prompt += (
        "\nRank the candidates from best to worst by relevance, "
        "returning only the ordered indices separated by commas."
    )
    return prompt


async def rerank_one(item):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_prompt(item)}],
    )
    return response.choices[0].message.content.strip()


async def batch_rerank_async(batch_requests):
    # Fire one request per item concurrently; results keep input order
    return await asyncio.gather(*(rerank_one(item) for item in batch_requests))


# Example usage
batch_requests = [
    {"query": "Python benefits", "candidates": ["Easy to learn.", "Slow language.", "Popular."]},
    {"query": "Best fruits", "candidates": ["Apple.", "Banana.", "Carrot."]},
]


async def main():
    results = await batch_rerank_async(batch_requests)
    for item, order in zip(batch_requests, results):
        print(f"Query: {item['query']}")
        print(f"Reranking order: {order}\n")


asyncio.run(main())
```

Example output (exact rankings may vary):

```
Query: Python benefits
Reranking order: 1, 3, 2

Query: Best fruits
Reranking order: 2, 1, 3
```
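The prompt-template variation mentioned above can be sketched with a module-level template. `RERANK_TEMPLATE` and `render_rerank_prompt` are illustrative names, not SDK features; the template assumes the same comma-separated-index output format used throughout:

```python
RERANK_TEMPLATE = (
    "Rerank these candidates for the query: {query}\n"
    "Candidates:\n{candidates}\n"
    "Rank the candidates from best to worst by relevance, "
    "returning only the ordered indices separated by commas."
)


def render_rerank_prompt(query, candidates):
    # Number the candidates 1..n so the model can answer with indices
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, 1))
    return RERANK_TEMPLATE.format(query=query, candidates=numbered)


print(render_rerank_prompt("What is AI?", ["AI is artificial intelligence.", "AI is a fruit."]))
```

Keeping the instructions in one template means every batch item is phrased identically, which makes the model's answers easier to parse consistently.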
Troubleshooting
- If you receive rate limit errors, reduce batch size or add retry logic with exponential backoff.
- Ensure your prompts clearly instruct the model to return only the ranking indices to simplify parsing.
- Check that the OPENAI_API_KEY environment variable is set correctly to avoid authentication errors.
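The retry advice above can be sketched as a generic exponential-backoff wrapper. This is a minimal illustration: `with_backoff` and the retry parameters are made up for this example, and a stand-in exception is used so the sketch runs without network access; with the OpenAI v1 SDK you would pass `openai.RateLimitError` as the retryable exception:

```python
import random
import time


def with_backoff(fn, retryable, max_retries=5, base_delay=1.0):
    """Call fn(), retrying on `retryable` exceptions with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))


# Demo with a stand-in error: the call fails twice, then succeeds
class FakeRateLimitError(Exception):
    pass


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimitError()
    return "ok"


print(with_backoff(flaky, FakeRateLimitError, base_delay=0.01))  # -> ok
```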
Key Takeaways
- Batch reranking sends multiple queries with candidates in one API call to improve throughput.
- Format each batch item as a separate message with clear instructions for ranking.
- Use async calls for concurrent batch reranking to maximize efficiency.
- Choose models like `gpt-4o` or `gpt-4o-mini` based on cost and performance needs.
- Handle rate limits by adjusting batch size and implementing retries.