How to · Intermediate · 3 min read

How to implement reranking with LlamaIndex

Quick answer
Use LlamaIndex to first retrieve candidate documents with a vector or keyword retriever, then apply a reranker, such as LlamaIndex's built-in LLMRerank node postprocessor or a custom LLM prompt, to reorder results by relevance. This two-step approach improves retrieval precision by rescoring a small set of candidates with a stronger model.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install llama-index openai

Setup

Install llama-index and openai packages, and set your OpenAI API key as an environment variable.

bash
pip install llama-index openai
export OPENAI_API_KEY="your_api_key"

Step by step

This example loads documents, builds a vector index for initial retrieval, then reranks the top candidates with a prompt-based LLM reranking step.

python
import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from openai import OpenAI

# Set your OpenAI API key in an environment variable first:
# export OPENAI_API_KEY="your_api_key"

# Initialize the OpenAI client (used for the reranking call)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# Create a vector store index for initial retrieval
# (uses OpenAI embeddings by default, read from OPENAI_API_KEY)
index = VectorStoreIndex.from_documents(documents)

# Query to retrieve candidates
query = "Explain the benefits of reranking in search"

# Retrieve the top 5 candidate nodes
retriever = index.as_retriever(similarity_top_k=5)
retrieved_nodes = retriever.retrieve(query)

# Reranking step: ask the LLM to rescore and reorder the candidates
rerank_prompt = (
    "Given the following candidate documents, rank them by relevance to the query:\n"
    "Query: {query}\nCandidates:\n{candidates}\n"
    "Return the candidates ordered by relevance."
)

candidates_text = "\n---\n".join(node.get_content() for node in retrieved_nodes)
rerank_input = rerank_prompt.format(query=query, candidates=candidates_text)

rerank_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": rerank_input}],
    temperature=0,
)

print("Reranked results:\n", rerank_response.choices[0].message.content)
output
Reranked results:
 1. Document about benefits of reranking in search...
 2. Document explaining search relevance...
 3. Other related document...
 ...
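The reranked output above is free text, so in practice you need to map it back to the retrieved candidates. One robust option is to ask the model to return candidate numbers and parse them, as in this small helper (hypothetical, shown with mock candidates; unmentioned or invalid indices fall back to the original order):

```python
import re

def parse_ranking(response_text, candidates):
    """Parse 1-based candidate indices from an LLM reply and reorder
    candidates accordingly; anything not mentioned keeps its original
    relative order at the end."""
    indices = [int(m) - 1 for m in re.findall(r"\b(\d+)\b", response_text)]
    seen, order = set(), []
    for i in indices:
        if 0 <= i < len(candidates) and i not in seen:
            seen.add(i)
            order.append(i)
    order += [i for i in range(len(candidates)) if i not in seen]
    return [candidates[i] for i in order]

# Example with mock candidates and a mock LLM reply
candidates = ["doc A", "doc B", "doc C"]
print(parse_ranking("3, 1, 2", candidates))  # ['doc C', 'doc A', 'doc B']
```

Asking for bare indices ("Return only the candidate numbers, best first") makes this parsing far more reliable than free-form prose.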

Common variations

  • Use a query engine (index.as_query_engine(response_mode="tree_summarize") or "compact") when you want a synthesized answer rather than raw nodes.
  • Switch to async calls with asyncio and OpenAI async client.
  • Use other LLMs supported by LlamaIndex, such as Anthropic or Mistral models, for reranking.
  • Customize reranker prompts or implement a dedicated reranker class for advanced scoring.
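The last variation, a dedicated reranker class, can be sketched in a few lines. Here a toy keyword-overlap score stands in for the real scoring call (an LLM or cross-encoder); the class and function names are illustrative, not part of LlamaIndex:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ScoredCandidate:
    text: str
    score: float

class SimpleReranker:
    """Rerank candidate texts with any scoring function, keeping the top_n."""

    def __init__(self, score_fn: Callable[[str, str], float], top_n: int = 3):
        self.score_fn = score_fn
        self.top_n = top_n

    def rerank(self, query: str, texts: List[str]) -> List[ScoredCandidate]:
        scored = [ScoredCandidate(t, self.score_fn(query, t)) for t in texts]
        scored.sort(key=lambda c: c.score, reverse=True)
        return scored[: self.top_n]

def keyword_overlap(query: str, text: str) -> float:
    # Toy stand-in for an LLM/cross-encoder relevance score
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

reranker = SimpleReranker(keyword_overlap, top_n=2)
results = reranker.rerank(
    "benefits of reranking in search",
    ["reranking improves search relevance", "vector databases store embeddings"],
)
print([c.text for c in results])
```

Swapping score_fn for a cross-encoder or an LLM call turns this sketch into a production reranker without changing the surrounding pipeline.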

Troubleshooting

  • If retrieval returns empty results, verify document loading paths and formats.
  • For API errors, check your OpenAI API key and usage limits.
  • Ensure llama-index and openai packages are up to date to avoid compatibility issues.
  • Adjust similarity_top_k to balance retrieval breadth and reranking cost.
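On the last point, the rerank prompt grows roughly linearly with similarity_top_k, so a quick size estimate helps budget cost. This is a rough character count, not a token count (use a tokenizer such as tiktoken for real budgets), and the overhead constant is an assumption:

```python
def estimate_prompt_chars(candidates, overhead=200):
    """Rough prompt size: fixed instruction overhead plus the joined
    candidate texts, including the '---' separators."""
    body = "\n---\n".join(candidates)
    return overhead + len(body)

docs = ["a" * 1000] * 5  # five ~1,000-character candidates
print(estimate_prompt_chars(docs))  # 5220
```

Doubling similarity_top_k roughly doubles the reranking prompt, so retrieve broadly only when the precision gain justifies the extra tokens.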

Key Takeaways

  • Use a vector index to retrieve candidate documents before reranking with an LLM for better relevance.
  • Customize reranker prompts to fit your domain and improve ranking quality.
  • Keep your API keys secure and environment variables configured for smooth integration.
Verified 2026-04 · gpt-4o