How to implement reranking with LlamaIndex
Quick answer
Use LlamaIndex to first retrieve candidate documents with a vector or keyword retriever, then apply a reranker (such as LlamaIndex's built-in LLMRerank node postprocessor, or a custom reranking step) to reorder results by relevance. This two-step approach improves retrieval precision by rescoring candidates with an LLM.

Prerequisites
- Python 3.8+
- An OpenAI API key
- pip install llama-index openai
Setup
Install the llama-index and openai packages, and set your OpenAI API key as an environment variable.

```shell
pip install llama-index openai
```

Step by step
This example shows how to load documents, create a vector index for retrieval, then rerank the top candidates using an LLM reranker with LlamaIndex.
```python
import os

# Requires llama-index >= 0.10 (llama_index.core imports) and openai >= 1.0.
# The older LLMPredictor / GPTVectorStoreIndex APIs are deprecated.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from openai import OpenAI

# Set your OpenAI API key as an environment variable first:
# export OPENAI_API_KEY="your_api_key"
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# Create a vector store index for initial retrieval
# (uses OpenAI embeddings by default, read from OPENAI_API_KEY)
index = VectorStoreIndex.from_documents(documents)

# Query to retrieve candidates
query = "Explain the benefits of reranking in search"

# Retrieve the top 5 candidate nodes
retriever = index.as_retriever(similarity_top_k=5)
retrieved_nodes = retriever.retrieve(query)

# Reranking step: ask the LLM to rescore and reorder the candidates
rerank_prompt = (
    "Given the following candidate documents, rank them by relevance to the query:\n"
    "Query: {query}\nCandidates:\n{candidates}\n"
    "Return the candidates ordered by relevance."
)
candidates_text = "\n---\n".join(node.get_content() for node in retrieved_nodes)
rerank_input = rerank_prompt.format(query=query, candidates=candidates_text)

rerank_response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    messages=[{"role": "user", "content": rerank_input}],
)

print("Reranked results:\n", rerank_response.choices[0].message.content)
```

Output
Reranked results:
1. Document about benefits of reranking in search...
2. Document explaining search relevance...
3. Other related document...
...
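The rerank call returns free-form text, so downstream code usually has to parse it back into an ordered list. A minimal sketch, assuming the model returns a numbered list like the output above (the `raw` string here is a stand-in for `rerank_response.choices[0].message.content`):

```python
import re

# Stand-in for rerank_response.choices[0].message.content
raw = (
    "1. Document about benefits of reranking in search...\n"
    "2. Document explaining search relevance...\n"
    "3. Other related document..."
)

# Pull each "N. text" line back out into an ordered Python list
ranked = [m.group(1).strip() for m in re.finditer(r"^\d+\.\s*(.+)$", raw, re.MULTILINE)]
print(ranked)
```

Instructing the model to return only the numbered list (or JSON) makes this parsing step far more reliable.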
Common variations
- Use response_mode="tree_summarize" or response_mode="compact" with a query engine for different response synthesis styles.
- Switch to async calls with asyncio and the AsyncOpenAI client.
- Use other LLMs supported by LlamaIndex, such as Anthropic or Mistral models, for reranking.
- Customize reranker prompts or implement a dedicated reranker class for advanced scoring.
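For the last variation, a dedicated reranker class can wrap any scoring function. A minimal sketch with a hypothetical SimpleReranker (not part of LlamaIndex), where token overlap stands in for an LLM relevance score:

```python
from typing import List

class SimpleReranker:
    """Toy reranker: scores candidates by query-token overlap.

    Swap score() for an LLM call to turn this into a real LLM reranker.
    """

    def score(self, query: str, candidate: str) -> float:
        # Fraction of query tokens that appear in the candidate text
        q = set(query.lower().split())
        c = set(candidate.lower().split())
        return len(q & c) / max(len(q), 1)

    def rerank(self, query: str, candidates: List[str], top_n: int = 3) -> List[str]:
        # Sort by descending score and keep the best top_n
        return sorted(candidates, key=lambda c: self.score(query, c), reverse=True)[:top_n]

docs = [
    "Vector databases store embeddings for fast retrieval.",
    "Reranking improves search relevance by rescoring candidates.",
    "Benefits of reranking in search include higher precision.",
]
reranker = SimpleReranker()
print(reranker.rerank("benefits of reranking in search", docs, top_n=2))
```

The same shape works for the LLM version: only `score` changes, so the retrieval and ordering logic stays testable without API calls.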
Troubleshooting
- If retrieval returns empty results, verify document loading paths and formats.
- For API errors, check your OpenAI API key and usage limits.
- Ensure the llama-index and openai packages are up to date to avoid compatibility issues.
- Adjust similarity_top_k to balance retrieval breadth and reranking cost.
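To see why similarity_top_k matters for cost: every retrieved chunk is pasted into the rerank prompt, so prompt size (and LLM spend) grows roughly linearly with top_k. A back-of-the-envelope sketch, where the chunk size and the 4-characters-per-token ratio are assumptions rather than measurements:

```python
# Assumed average characters per retrieved chunk; real values depend
# on your chunking settings
chunk_chars = 1200

# Rough 4-characters-per-token heuristic for English text
estimates = {top_k: (top_k * chunk_chars) // 4 for top_k in (3, 5, 10, 20)}

for top_k, tokens in estimates.items():
    print(f"similarity_top_k={top_k}: ~{tokens} prompt tokens per rerank call")
```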
Key Takeaways
- Use a vector index to retrieve candidate documents before reranking with an LLM for better relevance.
- Customize reranker prompts to fit your domain and improve ranking quality.
- Keep your API keys secure and environment variables configured for smooth integration.