How to · Intermediate · 3 min read

How to implement reranking with LlamaIndex

Quick answer
Use LlamaIndex to first retrieve candidate documents with a vector or keyword retriever, then apply a reranker, such as LlamaIndex's built-in LLMRerank node postprocessor or a custom LLM prompt, to reorder results by relevance. This two-step approach improves retrieval precision by rescoring a small set of candidates with a stronger model.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install llama-index openai

Setup

Install llama-index and openai packages, and set your OpenAI API key as an environment variable.

bash
pip install llama-index openai
export OPENAI_API_KEY="your_api_key"

Step by step

This example loads documents, builds a vector index for initial retrieval, then reranks the top candidates with a prompt-based LLM reranking step.

python
import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from openai import OpenAI

# Set your OpenAI API key in an environment variable first:
# export OPENAI_API_KEY="your_api_key"

# Initialize the OpenAI client (used for the reranking call)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# Create a vector store index for initial retrieval
# (uses OpenAI embeddings by default, read from OPENAI_API_KEY)
index = VectorStoreIndex.from_documents(documents)

# Query to retrieve candidates
query = "Explain the benefits of reranking in search"

# Retrieve the top 5 candidate nodes
retriever = index.as_retriever(similarity_top_k=5)
retrieved_nodes = retriever.retrieve(query)

# Reranking step: ask the LLM to rescore and reorder the candidates
rerank_prompt = (
    "Given the following candidate documents, rank them by relevance to the query:\n"
    "Query: {query}\nCandidates:\n{candidates}\n"
    "Return the candidates ordered by relevance."
)

candidates_text = "\n---\n".join(node.get_content() for node in retrieved_nodes)
rerank_input = rerank_prompt.format(query=query, candidates=candidates_text)

rerank_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": rerank_input}],
    temperature=0,
)

print("Reranked results:\n", rerank_response.choices[0].message.content)
output
Reranked results:
 1. Document about benefits of reranking in search...
 2. Document explaining search relevance...
 3. Other related document...
 ...
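The reranked output above is free text, so in practice you need to map it back to the retrieved candidates. One robust option is to ask the model to return candidate numbers and parse them, as in this small helper (hypothetical, shown with mock candidates; unmentioned or invalid indices fall back to the original order):

```python
import re

def parse_ranking(response_text, candidates):
    """Parse 1-based candidate indices from an LLM reply and reorder
    candidates accordingly; anything not mentioned keeps its original
    relative order at the end."""
    indices = [int(m) - 1 for m in re.findall(r"\b(\d+)\b", response_text)]
    seen, order = set(), []
    for i in indices:
        if 0 <= i < len(candidates) and i not in seen:
            seen.add(i)
            order.append(i)
    order += [i for i in range(len(candidates)) if i not in seen]
    return [candidates[i] for i in order]

# Example with mock candidates and a mock LLM reply
candidates = ["doc A", "doc B", "doc C"]
print(parse_ranking("3, 1, 2", candidates))  # ['doc C', 'doc A', 'doc B']
```

Asking for bare indices ("Return only the candidate numbers, best first") makes this parsing far more reliable than free-form prose.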

Common variations

  • Use a query engine (index.as_query_engine(response_mode="tree_summarize") or "compact") when you want a synthesized answer rather than raw nodes.
  • Switch to async calls with asyncio and OpenAI async client.
  • Use other LLMs supported by LlamaIndex, such as Anthropic or Mistral models, for reranking.
  • Customize reranker prompts or implement a dedicated reranker class for advanced scoring.
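The last variation, a dedicated reranker class, can be sketched in a few lines. Here a toy keyword-overlap score stands in for the real scoring call (an LLM or cross-encoder); the class and function names are illustrative, not part of LlamaIndex:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ScoredCandidate:
    text: str
    score: float

class SimpleReranker:
    """Rerank candidate texts with any scoring function, keeping the top_n."""

    def __init__(self, score_fn: Callable[[str, str], float], top_n: int = 3):
        self.score_fn = score_fn
        self.top_n = top_n

    def rerank(self, query: str, texts: List[str]) -> List[ScoredCandidate]:
        scored = [ScoredCandidate(t, self.score_fn(query, t)) for t in texts]
        scored.sort(key=lambda c: c.score, reverse=True)
        return scored[: self.top_n]

def keyword_overlap(query: str, text: str) -> float:
    # Toy stand-in for an LLM/cross-encoder relevance score
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

reranker = SimpleReranker(keyword_overlap, top_n=2)
results = reranker.rerank(
    "benefits of reranking in search",
    ["reranking improves search relevance", "vector databases store embeddings"],
)
print([c.text for c in results])
```

Swapping score_fn for a cross-encoder or an LLM call turns this sketch into a production reranker without changing the surrounding pipeline.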

Troubleshooting

  • If retrieval returns empty results, verify document loading paths and formats.
  • For API errors, check your OpenAI API key and usage limits.
  • Ensure llama-index and openai packages are up to date to avoid compatibility issues.
  • Adjust similarity_top_k to balance retrieval breadth and reranking cost.
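On the last point, the rerank prompt grows roughly linearly with similarity_top_k, so a quick size estimate helps budget cost. This is a rough character count, not a token count (use a tokenizer such as tiktoken for real budgets), and the overhead constant is an assumption:

```python
def estimate_prompt_chars(candidates, overhead=200):
    """Rough prompt size: fixed instruction overhead plus the joined
    candidate texts, including the '---' separators."""
    body = "\n---\n".join(candidates)
    return overhead + len(body)

docs = ["a" * 1000] * 5  # five ~1,000-character candidates
print(estimate_prompt_chars(docs))  # 5220
```

Doubling similarity_top_k roughly doubles the reranking prompt, so retrieve broadly only when the precision gain justifies the extra tokens.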

Key Takeaways

  • Use a vector index to retrieve candidate documents before reranking with an LLM for better relevance.
  • Customize reranker prompts to fit your domain and improve ranking quality.
  • Keep your API keys secure and environment variables configured for smooth integration.
Verified 2026-04 · gpt-4o