
Reranker not improving results fix

Quick answer
If your reranker is not improving results, use a specialized reranker model or a clearly worded scoring prompt that gives the model both the query and the candidate document. Also verify that your reranking logic extracts a numeric score from each model reply and sorts candidates by that score, rather than by the raw completion text.
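One kind of specialized reranker is a cross-encoder, which reads the query and a document together and outputs a relevance score directly. The sketch below assumes the sentence-transformers package and the public cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint; neither is part of this guide's setup, and `cross_encoder_rerank` and `load_cross_encoder_predict` are illustrative names. The scoring function is passed in, so the sorting logic can be checked without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple


def cross_encoder_rerank(
    query: str,
    candidates: Sequence[str],
    predict: Callable[[List[Tuple[str, str]]], Sequence[float]],
) -> List[Tuple[float, str]]:
    """Score every (query, doc) pair with `predict` and return (score, doc) best-first."""
    pairs = [(query, doc) for doc in candidates]
    scores = predict(pairs)
    ranked = [(float(s), doc) for s, doc in zip(scores, candidates)]
    ranked.sort(key=lambda x: x[0], reverse=True)
    return ranked


def load_cross_encoder_predict():
    """Assumed setup: pip install sentence-transformers (the model downloads on first use)."""
    from sentence_transformers import CrossEncoder  # imported lazily so the sketch runs without it

    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    return model.predict


# Usage (needs the package and network access on first run):
#   ranked = cross_encoder_rerank(query, candidates, load_cross_encoder_predict())
```

A cross-encoder needs one forward pass per candidate but no prompt and no output parsing, which removes two of the failure points discussed in this guide.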

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the requirement so the shell does not treat >= as a redirect)

Setup

Install the openai Python package and set your API key as an environment variable.

  • Run pip install openai to install the SDK.
  • Set your API key in the shell: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; setx only affects terminals opened afterwards, so start a new one).
bash
pip install openai

Step by step

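Use gpt-4o as a pointwise relevance scorer: send one prompt per (query, document) pair asking for a score from 0 to 10, then sort the candidates by that score. Note that text-embedding-3-large is an embedding model, not a reranker; it belongs to the embedding-similarity approach under Common variations.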

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example documents to rerank
candidates = [
    "The capital of France is Paris.",
    "Paris is known for its cuisine and art.",
    "The Eiffel Tower is in Berlin."
]

query = "Where is the Eiffel Tower located?"

# Define a reranking function using GPT-4o to score relevance

def rerank(query, candidates):
    scored = []
    for doc in candidates:
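        # Ask for a bare number so float() can parse the reply
        prompt = (
            f"Question: {query}\nDocument: {doc}\n"
            "Rate the relevance of the document to the question from 0 to 10. "
            "Reply with only the number."
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # reduce run-to-run variation in scores
        )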
        score_text = response.choices[0].message.content.strip()
        try:
            score = float(score_text)
        except ValueError:
            score = 0.0
        scored.append((score, doc))
    # Sort descending by score
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored

ranked = rerank(query, candidates)
for score, doc in ranked:
    print(f"Score: {score:.1f} - {doc}")
output
Score: 9.5 - The Eiffel Tower is in Berlin.
Score: 7.0 - Paris is known for its cuisine and art.
Score: 2.0 - The capital of France is Paris.

Common variations

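Common variations include scoring many candidates concurrently with the async client (AsyncOpenAI), ranking by embedding similarity with text-embedding-3-large for faster, approximate results, and trying other scorer models such as claude-3-5-sonnet-20241022. Claude models are called through Anthropic's SDK, not the OpenAI client.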
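A minimal sketch of the embedding-similarity variation, assuming an openai>=1.0 client: embed the query and all candidates with text-embedding-3-large in a single request, then rank by cosine similarity. The client is passed in so the ranking math can be tested offline; `cosine`, `rank_by_embedding`, and `embedding_rerank` are illustrative names, not SDK functions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_embedding(query_vec, doc_vecs, docs) -> List[Tuple[float, str]]:
    """Pair each doc with its similarity to the query and sort best-first."""
    scored = [(cosine(query_vec, v), doc) for v, doc in zip(doc_vecs, docs)]
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored


def embedding_rerank(client, query: str, docs: List[str], model: str = "text-embedding-3-large"):
    """Embed the query and docs in one call, then rank docs by similarity to the query."""
    response = client.embeddings.create(model=model, input=[query] + docs)
    # Order by index so vectors line up with the input list
    vectors = [item.embedding for item in sorted(response.data, key=lambda d: d.index)]
    return rank_by_embedding(vectors[0], vectors[1:], docs)


# Usage (needs network access and OPENAI_API_KEY):
#   from openai import OpenAI
#   ranked = embedding_rerank(OpenAI(), query, candidates)
```

This costs one API call for the whole candidate list instead of one per document, but similarity measures closeness of meaning, so it is usually less precise than a dedicated reranker.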

python
import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

candidates = [
    "The capital of France is Paris.",
    "Paris is known for its cuisine and art.",
    "The Eiffel Tower is in Berlin."
]
query = "Where is the Eiffel Tower located?"

async def async_rerank(query, candidates):
    tasks = []
    for doc in candidates:
        prompt = (
            f"Question: {query}\nDocument: {doc}\n"
            "Rate the relevance of the document to the question from 0 to 10. "
            "Reply with only the number."
        )
        # AsyncOpenAI's create() returns a coroutine; asyncio.gather() awaits them concurrently
        tasks.append(
            client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
                temperature=0,
            )
        )
    responses = await asyncio.gather(*tasks)
    scored = []
    for response, doc in zip(responses, candidates):
        score_text = response.choices[0].message.content.strip()
        try:
            score = float(score_text)
        except ValueError:
            score = 0.0
        scored.append((score, doc))
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored

ranked = asyncio.run(async_rerank(query, candidates))
for score, doc in ranked:
    print(f"Score: {score:.1f} - {doc}")
output
Score: 9.0 - The Eiffel Tower is in Berlin.
Score: 6.5 - Paris is known for its cuisine and art.
Score: 1.5 - The capital of France is Paris.
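To try claude-3-5-sonnet-20241022 as the scorer, the OpenAI client cannot be used; Claude models are called through Anthropic's SDK (pip install anthropic, with ANTHROPIC_API_KEY set). A minimal pointwise scorer under those assumptions, reusing the same prompt; `claude_score` and `claude_rerank` are illustrative names, and the client is passed in so the logic can be checked without network access.

```python
def claude_score(client, query: str, doc: str, model: str = "claude-3-5-sonnet-20241022") -> float:
    """Ask a Claude model for a 0-10 relevance score; return 0.0 if the reply is not a number."""
    prompt = (
        f"Question: {query}\nDocument: {doc}\n"
        "Rate the relevance of the document to the question from 0 to 10. "
        "Reply with only the number."
    )
    message = client.messages.create(
        model=model,
        max_tokens=10,  # the Messages API requires max_tokens; a score needs very few
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    text = message.content[0].text.strip()
    try:
        return float(text)
    except ValueError:
        return 0.0


def claude_rerank(client, query, candidates):
    """Score each candidate with Claude and sort best-first."""
    scored = [(claude_score(client, query, doc), doc) for doc in candidates]
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored


# Usage (needs network access and ANTHROPIC_API_KEY):
#   import anthropic
#   ranked = claude_rerank(anthropic.Anthropic(), query, candidates)
```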

Troubleshooting

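  • If reranking does not improve results, first check that the candidate list contains at least one relevant document. A reranker only reorders what retrieval returns; in the example above, no candidate says the Eiffel Tower is in Paris, so no ordering can produce a correct answer.
  • Make the scoring prompt ask explicitly for a numeric relevance score and nothing else, and set temperature=0 for more consistent scores.
  • Parse the model's output as a float and fall back to 0 if parsing fails, but log those failures: if every reply fails to parse, every score is 0, and because Python's sort is stable the original order comes back unchanged.
  • A relevance prompt measures topical match, not correctness. In the sample output, "The Eiffel Tower is in Berlin." ranks first, likely because it is the only candidate that mentions the Eiffel Tower; if factual accuracy matters, say so in the prompt.
  • Use a reranker-specialized model or embedding similarity for better performance.
  • Verify that your sorting logic sorts by score in descending order, not ascending or by the raw text.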
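The parsing advice above can be made more forgiving and easier to debug. A minimal sketch, assuming the prompt asks for a 0 to 10 score: take the first number in the reply (so "8/10" or "Score: 7.5" still parse), clamp it to the range, and report which replies failed instead of silently scoring them 0. `parse_score` and `sort_with_report` are illustrative names.

```python
import re
from typing import List, Optional, Sequence, Tuple

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_score(text: str, low: float = 0.0, high: float = 10.0) -> Optional[float]:
    """Return the first number in `text`, clamped to [low, high], or None if there is none."""
    match = NUMBER.search(text)
    if match is None:
        return None
    return min(max(float(match.group()), low), high)


def sort_with_report(
    replies: Sequence[str], docs: Sequence[str]
) -> Tuple[List[Tuple[float, str]], List[str]]:
    """Sort docs by parsed score (descending); unparseable replies score 0 and are reported."""
    scored, failed = [], []
    for reply, doc in zip(replies, docs):
        score = parse_score(reply)
        if score is None:
            failed.append(doc)
            score = 0.0
        scored.append((score, doc))
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored, failed
```

If `failed` contains most of your candidates, the prompt or the parsing is the problem, not the ranking itself.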

Key Takeaways

  • Use explicit numeric scoring prompts to get reliable reranker outputs.
  • Parse and sort candidate scores correctly to improve reranking results.
  • Consider specialized reranker models or embeddings for better accuracy.
Verified 2026-04 · gpt-4o, gpt-4o-mini, text-embedding-3-large, claude-3-5-sonnet-20241022