How to batch reranking requests
Quick answer
Use the `client.chat.completions.create` method and combine multiple reranking tasks into a single prompt. Batch reranking improves throughput by sending several query/candidate lists to `gpt-4o` (or another reranking-capable model) in one API call.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- `pip install "openai>=1.0"`
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable.
```bash
pip install "openai>=1.0"
```

Step by step
This example batches reranking requests by combining multiple query/candidate lists into one prompt sent to the gpt-4o model in a single call. Each batch item contains a query and a list of candidates to rerank.
```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Prepare a batch of reranking requests
batch_requests = [
    {
        "query": "What is AI?",
        "candidates": [
            "AI is artificial intelligence.",
            "AI stands for apple inc.",
            "AI is a type of fruit.",
        ],
    },
    {
        "query": "Benefits of exercise",
        "candidates": [
            "Exercise improves health.",
            "Exercise causes fatigue.",
            "Exercise is bad for you.",
        ],
    },
]

# Combine all reranking tasks into a single prompt. Sending the tasks as
# separate user messages would build one conversation with a single reply,
# so we number the tasks and ask for one answer line per task instead.
prompt = ""
for n, item in enumerate(batch_requests, 1):
    prompt += f"Task {n} - rerank these candidates for the query: {item['query']}\nCandidates:\n"
    for i, candidate in enumerate(item["candidates"], 1):
        prompt += f"{i}. {candidate}\n"
    prompt += "\n"
prompt += (
    "For each task, rank the candidates from best to worst by relevance. "
    "Return one line per task containing only the ordered indices separated by commas."
)

# Send the whole batch in one API call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

# Extract and print the reranking results, one answer line per task
lines = response.choices[0].message.content.strip().splitlines()
for item, order in zip(batch_requests, lines):
    print(f"Query: {item['query']}")
    print(f"Reranking order: {order.strip()}")
    print()
```

Example output (exact rankings may vary):

```
Query: What is AI?
Reranking order: 1, 2, 3

Query: Benefits of exercise
Reranking order: 1, 2, 3
```
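If you want the reordered candidate texts rather than raw indices, a small helper can apply the returned order. This is a sketch (`apply_rerank_order` is a name chosen here, not part of the SDK), and it deliberately skips malformed or out-of-range indices since model output can vary:

```python
def apply_rerank_order(candidates, order_line):
    """Reorder candidates using a model reply like '2, 1, 3'.

    Indices the model did not return, repeated, or got wrong are
    skipped rather than raising, since model output can vary.
    """
    order = []
    for token in order_line.split(","):
        token = token.strip().rstrip(".")
        if token.isdigit():
            idx = int(token) - 1  # the prompt uses 1-based indices
            if 0 <= idx < len(candidates) and idx not in order:
                order.append(idx)
    return [candidates[i] for i in order]


candidates = ["AI is artificial intelligence.", "AI stands for apple inc."]
print(apply_rerank_order(candidates, "2, 1"))
# -> ['AI stands for apple inc.', 'AI is artificial intelligence.']
```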
Common variations

- Use `async` calls with `asyncio` for concurrent batch reranking.
- Switch to other models like `gpt-4o-mini` for cost efficiency.
- Use prompt templates to standardize reranking instructions.
```python
import asyncio
import os

from openai import AsyncOpenAI

# AsyncOpenAI (not OpenAI) provides awaitable methods in openai>=1.0;
# there is no chat.completions.acreate in the v1 SDK.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])


def build_prompt(item):
    prompt = f"Rerank these candidates for the query: {item['query']}\nCandidates:\n"
    for i, candidate in enumerate(item["candidates"], 1):
        prompt += f"{i}. {candidate}\n"
    prompt += (
        "\nRank the candidates from best to worst by relevance, "
        "returning only the ordered indices separated by commas."
    )
    return prompt


async def rerank_one(item):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_prompt(item)}],
    )
    return response.choices[0].message.content.strip()


async def batch_rerank_async(batch_requests):
    # Fire one request per item concurrently; results keep input order
    return await asyncio.gather(*(rerank_one(item) for item in batch_requests))


# Example usage
batch_requests = [
    {"query": "Python benefits", "candidates": ["Easy to learn.", "Slow language.", "Popular."]},
    {"query": "Best fruits", "candidates": ["Apple.", "Banana.", "Carrot."]},
]


async def main():
    results = await batch_rerank_async(batch_requests)
    for item, order in zip(batch_requests, results):
        print(f"Query: {item['query']}")
        print(f"Reranking order: {order}\n")


asyncio.run(main())
```

Example output (exact rankings may vary):

```
Query: Python benefits
Reranking order: 1, 3, 2

Query: Best fruits
Reranking order: 2, 1, 3
```
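The prompt-template variation mentioned above can be sketched with a module-level template. `RERANK_TEMPLATE` and `render_rerank_prompt` are illustrative names, not SDK features; the template assumes the same comma-separated-index output format used throughout:

```python
RERANK_TEMPLATE = (
    "Rerank these candidates for the query: {query}\n"
    "Candidates:\n{candidates}\n"
    "Rank the candidates from best to worst by relevance, "
    "returning only the ordered indices separated by commas."
)


def render_rerank_prompt(query, candidates):
    # Number the candidates 1..n so the model can answer with indices
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, 1))
    return RERANK_TEMPLATE.format(query=query, candidates=numbered)


print(render_rerank_prompt("What is AI?", ["AI is artificial intelligence.", "AI is a fruit."]))
```

Keeping the instructions in one template means every batch item is phrased identically, which makes the model's answers easier to parse consistently.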
Troubleshooting
- If you receive rate limit errors, reduce batch size or add retry logic with exponential backoff.
- Ensure your prompts clearly instruct the model to return only the ranking indices to simplify parsing.
- Check that the OPENAI_API_KEY environment variable is set correctly to avoid authentication errors.
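The retry advice above can be sketched as a generic exponential-backoff wrapper. This is a minimal illustration: `with_backoff` and the retry parameters are made up for this example, and a stand-in exception is used so the sketch runs without network access; with the OpenAI v1 SDK you would pass `openai.RateLimitError` as the retryable exception:

```python
import random
import time


def with_backoff(fn, retryable, max_retries=5, base_delay=1.0):
    """Call fn(), retrying on `retryable` exceptions with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))


# Demo with a stand-in error: the call fails twice, then succeeds
class FakeRateLimitError(Exception):
    pass


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimitError()
    return "ok"


print(with_backoff(flaky, FakeRateLimitError, base_delay=0.01))  # -> ok
```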
Key Takeaways
- Batch reranking sends multiple queries with candidates in one API call to improve throughput.
- Format each batch item as a separate message with clear instructions for ranking.
- Use async calls for concurrent batch reranking to maximize efficiency.
- Choose models like `gpt-4o` or `gpt-4o-mini` based on cost and performance needs.
- Handle rate limits by adjusting batch size and implementing retries.