How to use the BM25 retriever in LlamaIndex

Quick answer

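Use the BM25Retriever class (in llama-index 0.10 and later, installed through the llama-index-retrievers-bm25 package) to perform keyword-based document retrieval. Create it with BM25Retriever.from_defaults(), passing either parsed nodes or the docstore of an index built from documents loaded with SimpleDirectoryReader, then call retrieve() with your query to get ranked results.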

PREREQUISITES

Python 3.8+
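pip install "llama-index>=0.10" llama-index-retrievers-bm25
pip install "openai>=1.0"
OpenAI API key set in environment variable OPENAI_API_KEY (used to embed documents when building the vector index; BM25 scoring itself makes no API calls)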

Setup

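Install the llama-index package together with the BM25 retriever integration, and set your OpenAI API key in the environment. In llama-index 0.10 and later the BM25 retriever ships as the separate llama-index-retrievers-bm25 package rather than inside the core library.

bash

pip install llama-index llama-index-retrievers-bm25 openai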

Step by step

This example loads documents from a directory, builds an index, and uses the BM25Retriever to retrieve relevant documents for a query.

python

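# Requires: pip install llama-index llama-index-retrievers-bm25
# (import paths below are for llama-index 0.10 and later)
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.retrievers.bm25 import BM25Retriever

# Load documents from a directory
documents = SimpleDirectoryReader("data").load_data()

# Build a vector store index. This embeds the documents (OpenAI by default,
# hence OPENAI_API_KEY) and stores the parsed nodes in index.docstore.
index = VectorStoreIndex.from_documents(documents)

# Initialize the BM25 retriever over the index's nodes.
# BM25 is purely lexical: it only reads the node text, not the embeddings.
bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore,
    similarity_top_k=2,  # number of results to return
)

# Query to retrieve documents
query = "What is the impact of climate change?"

# Retrieve the top-ranked nodes (a list of NodeWithScore objects)
results = bm25_retriever.retrieve(query)

# Print each retrieved node's text
for i, doc in enumerate(results):
    print(f"Document {i+1}:\n{doc.get_text()}\n")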

output

Document 1:
Climate change impacts include rising sea levels, extreme weather, and biodiversity loss.

Document 2:
The effects of climate change on agriculture are significant and require adaptation.
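
To see why keyword overlap drives a ranking like the one above, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is an illustration only, not LlamaIndex's implementation: the function name bm25_scores, the whitespace tokenizer, the sample documents, and the k1 and b values are all assumptions chosen for the example, and the library uses its own tokenizer and BM25 backend, so real scores will differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (illustrative)."""
    # Naive tokenizer (assumption): lowercase and split on whitespace
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by length via b
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "climate change raises sea levels",
    "recipes for sourdough bread",
    "climate policy and climate adaptation",
]
scores = bm25_scores("climate change", docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

The first document ranks highest because it matches both query terms, and the rarer term ("change") carries more weight than the more common one ("climate"). The second document shares no terms with the query and scores zero, which is also why BM25 can return weak results when a query is phrased with synonyms the documents never use.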

Common variations

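BM25Retriever reads node text from a docstore, so it works with any index that exposes index.docstore, or directly from parsed nodes via BM25Retriever.from_defaults(nodes=nodes) with no vector index at all.
Adjust the number of retrieved documents by setting similarity_top_k when you create the retriever; retrieve() takes only the query.
Combine BM25 with other retrievers for hybrid search strategies.

python

# Reuses index and query from the step-by-step example
bm25_retriever = BM25Retriever.from_defaults(docstore=index.docstore, similarity_top_k=5)
results = bm25_retriever.retrieve(query)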
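
For the hybrid search variation listed above, one common way to merge a BM25 ranking with a vector-search ranking is reciprocal rank fusion (RRF). The sketch below is plain Python rather than a LlamaIndex API: the function name reciprocal_rank_fusion, the document IDs, and the constant k=60 are assumptions for illustration. With real retrievers you would feed in the node IDs from each result list.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); a larger k flattens
            # the advantage of the very top positions
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical rankings from a keyword retriever and a vector retriever
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused_ranking = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused_ranking)
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores and embedding similarities are on different scales. LlamaIndex also provides a QueryFusionRetriever for combining retrievers; check its documentation for the exact options before relying on it.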

Troubleshooting

If retrieval returns no results, ensure your documents are properly loaded and indexed.
Check that the data directory contains readable text files.
Verify your environment variable OPENAI_API_KEY is set correctly.
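
The checks above can be scripted as a quick preflight before building the index. This is a sketch under assumptions: check_setup is a hypothetical helper, it looks only at the top level of the directory, and it treats every file as UTF-8 text, so binary formats such as PDF would be flagged even if your loader can parse them.

```python
import os
from pathlib import Path


def check_setup(data_dir="data", env=os.environ):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    path = Path(data_dir)
    if not path.is_dir():
        problems.append(f"Directory not found: {data_dir}")
    else:
        # Top-level files only (no recursion into subdirectories)
        files = [p for p in path.iterdir() if p.is_file()]
        if not files:
            problems.append(f"No files found in {data_dir}")
        for p in files:
            try:
                p.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                problems.append(f"Not readable as UTF-8 text: {p}")
    # The key is needed for embedding calls when building a vector index
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


for problem in check_setup():
    print(problem)
```

Passing env as a parameter keeps the function easy to test without touching the real environment.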

Key Takeaways

Use BM25Retriever from llama_index.retrievers.bm25 (the llama-index-retrievers-bm25 package) for keyword-based document retrieval.
Create the retriever with BM25Retriever.from_defaults(), passing parsed nodes or the docstore of an index built from your documents.
Set similarity_top_k when creating the retriever to control the number of results returned.
Ensure documents are loaded correctly with SimpleDirectoryReader or similar loaders.
Set OPENAI_API_KEY in your environment before building the vector index; BM25 scoring itself makes no API calls.

Verified 2026-04
