
Fixing a reranker that does not improve results

Quick answer

If your reranker is not improving results, ensure you use a specialized reranker model and provide clear, relevant context in your prompt. Also, verify your reranking logic correctly scores and sorts candidates based on model outputs rather than raw completions.

PREREQUISITES

Python 3.8+
OpenAI API key (free tier works)
pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)

Setup

Install the openai Python package and set your API key as an environment variable.

Run pip install openai to install the SDK.
Set your API key in the shell: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; open a new terminal afterwards, because setx does not change the current session).

bash

pip install openai

Step by step

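Score each candidate against the query with a dedicated reranking prompt sent to a chat model such as gpt-4o, then sort the candidates by that score. Note that text-embedding-3-large is an embedding model, not a reranker; it can rank by vector similarity, but it does not return relevance scores itself.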

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example documents to rerank
candidates = [
    "The capital of France is Paris.",
    "Paris is known for its cuisine and art.",
    "The Eiffel Tower is in Berlin."
]

query = "Where is the Eiffel Tower located?"

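# Define a reranking function that asks GPT-4o for a numeric relevance score

def rerank(query, candidates):
    scored = []
    for doc in candidates:
        # Ask for a bare number so float() below can parse the reply
        prompt = (
            f"Question: {query}\nDocument: {doc}\n"
            "Rate the relevance from 0 to 10. Reply with only the number."
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0  # reduce run-to-run variation in the scores
        )
        score_text = response.choices[0].message.content.strip()
        try:
            score = float(score_text)
        except ValueError:
            # Unparseable reply: treated as not relevant, which can hide prompt problems
            score = 0.0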
        scored.append((score, doc))
    # Sort descending by score
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored

ranked = rerank(query, candidates)
for score, doc in ranked:
    print(f"Score: {score:.1f} - {doc}")

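output (illustrative; actual scores vary between runs and models)

Score: 9.5 - The Eiffel Tower is in Berlin.
Score: 7.0 - Paris is known for its cuisine and art.
Score: 2.0 - The capital of France is Paris.

Note that the top-ranked document is factually wrong: this prompt scores topical relevance to the query, not correctness.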
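The float(score_text) call in the example falls back to 0.0 whenever the model wraps the number in extra words (for instance "Relevance: 8/10"), which silently pushes good documents to the bottom. A more tolerant parser might look like the sketch below. The name parse_score and the clamping range are assumptions made for illustration, not part of the OpenAI SDK, and the function simply takes the first number it finds, so a reply such as "Document 2 scores 7" would still be misread.

```python
import re

# Matches the first integer or decimal number in a string
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_score(text, low=0.0, high=10.0):
    """Return the first number in `text`, clamped to [low, high].

    Returns None when no number is present, so the caller can tell
    "could not parse" apart from a genuine score of 0.
    """
    match = _NUMBER.search(text)
    if match is None:
        return None
    return max(low, min(high, float(match.group())))


for reply in ["9.5", "Relevance: 8/10", "not relevant"]:
    print(repr(reply), "->", parse_score(reply))
```

Returning None instead of 0.0 lets the caller log or retry unparseable replies rather than ranking them last without notice.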

Common variations

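You can rank by embedding similarity with text-embedding-3-large for faster, approximate ranking, or issue the scoring calls concurrently with the async client to rerank a batch. Other chat models can also act as the scorer; claude-3-5-sonnet-20241022, for example, is an Anthropic model and needs the Anthropic SDK rather than the OpenAI client used here.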

python

import asyncio
import os
from openai import AsyncOpenAI

# The async client has the same methods as OpenAI, but they return awaitables
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

candidates = [
    "The capital of France is Paris.",
    "Paris is known for its cuisine and art.",
    "The Eiffel Tower is in Berlin."
]
query = "Where is the Eiffel Tower located?"

async def async_rerank(query, candidates):
    tasks = []
    for doc in candidates:
        prompt = (
            f"Question: {query}\nDocument: {doc}\n"
            "Rate the relevance from 0 to 10. Reply with only the number."
        )
        # create() returns a coroutine here; gather() below runs them concurrently
        tasks.append(
            client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}]
            )
        )
    responses = await asyncio.gather(*tasks)
    scored = []
    for response, doc in zip(responses, candidates):
        score_text = response.choices[0].message.content.strip()
        try:
            score = float(score_text)
        except ValueError:
            score = 0.0
        scored.append((score, doc))
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored

ranked = asyncio.run(async_rerank(query, candidates))
for score, doc in ranked:
    print(f"Score: {score:.1f} - {doc}")

output (illustrative; actual scores vary between runs and models)

Score: 9.0 - The Eiffel Tower is in Berlin.
Score: 6.5 - Paris is known for its cuisine and art.
Score: 1.5 - The capital of France is Paris.
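The embedding-similarity variation mentioned above can be sketched as follows. This is a minimal example, not a tuned implementation: cosine, rank_by_similarity, and embed are hypothetical helper names, and only client.embeddings.create comes from the OpenAI SDK. The ranking helpers are pure Python, so they can be checked with hand-made vectors before any API call is made.

```python
import math
import os


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, doc_vecs, docs):
    """Return (score, doc) pairs sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def embed(texts, model="text-embedding-3-large"):
    """Embed a list of strings in one request (requires OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the helpers above work offline

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


# Only call the API when a key is configured
if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    docs = [
        "The capital of France is Paris.",
        "Paris is known for its cuisine and art.",
        "The Eiffel Tower is in Berlin.",
    ]
    vectors = embed(["Where is the Eiffel Tower located?"] + docs)
    for score, doc in rank_by_similarity(vectors[0], vectors[1:], docs):
        print(f"Score: {score:.3f} - {doc}")
```

One request embeds the query and all candidates together, which is why this route is usually cheaper than one chat call per document; the trade-off is that similarity reflects topical closeness only.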

Troubleshooting

If reranking does not improve results, check that your scoring prompt is clear and explicitly asks for a numeric relevance score and nothing else.
Ensure you parse the model's output correctly as a float; fall back to zero if parsing fails, and log those cases so a parsing problem is not mistaken for low relevance.
Use a reranker-specialized model or embedding similarity for better performance.
Verify that your sorting logic sorts by score in descending order, not ascending or by raw text.
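To confirm whether the reranker actually helps, it is useful to compare a ranking metric before and after reranking on a few queries with known relevant documents. The sketch below uses mean reciprocal rank (MRR); the metric choice, the function names, and the document IDs are assumptions made for illustration, and any labelled data has to come from your own application.

```python
def reciprocal_rank(ranked_docs, relevant):
    """Return 1/position of the first relevant document, or 0.0 if none appears."""
    for position, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_docs, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(docs, rel) for docs, rel in runs) / len(runs)


# Hypothetical labelled queries: each entry is (ranking, set of relevant doc IDs)
baseline = [(["d3", "d1", "d2"], {"d1"}), (["d5", "d4"], {"d4"})]
reranked = [(["d1", "d3", "d2"], {"d1"}), (["d5", "d4"], {"d4"})]

print("baseline MRR:", mean_reciprocal_rank(baseline))  # 0.5
print("reranked MRR:", mean_reciprocal_rank(reranked))  # 0.75
```

If the reranked value is not higher than the baseline on your labelled queries, the scoring prompt, the parsing, or the sort order is the place to look next.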


Key Takeaways

Use explicit numeric scoring prompts to get reliable reranker outputs.
Parse and sort candidate scores correctly to improve reranking results.
Consider specialized reranker models or embeddings for better accuracy.

Verified 2026-04 · gpt-4o, gpt-4o-mini, text-embedding-3-large, claude-3-5-sonnet-20241022
