How to use BM25 retriever in Haystack

Quick answer

Use InMemoryDocumentStore with InMemoryBM25Retriever from Haystack 2.x (the haystack-ai package) for BM25-based document retrieval. Write your documents to the store, initialize the retriever with that store, and query it to get the top-ranked documents.

PREREQUISITES

Python 3.8+
pip install "haystack-ai>=2.0" (quote the requirement so the shell does not treat >= as a redirect)
Basic knowledge of Python

Setup

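Install Haystack 2.x, which is published as the haystack-ai package and includes InMemoryBM25Retriever. The older farm-haystack package (Haystack 1.x) has a different API and should not be installed in the same environment. Ensure you have Python 3.8 or higher.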

bash

pip install haystack-ai

Step by step

This example shows how to create an InMemoryDocumentStore, add documents, initialize the InMemoryBM25Retriever, and query it for relevant documents.

python

from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Initialize document store
document_store = InMemoryDocumentStore()

# Sample documents (Haystack 2.x expects Document objects, not dicts)
docs = [
    Document(content="Haystack is an open source NLP framework."),
    Document(content="BM25 is a ranking function used by search engines."),
    Document(content="Python is a popular programming language."),
]

# Write documents to the store
document_store.write_documents(docs)

# Initialize BM25 retriever
retriever = InMemoryBM25Retriever(document_store=document_store)

# Create a pipeline with the retriever
pipeline = Pipeline()
pipeline.add_component("retriever", retriever)

# Query the retriever; pipeline inputs are keyed by component name
query = "What is BM25?"
result = pipeline.run({"retriever": {"query": query, "top_k": 2}})

# Print retrieved documents; results are also keyed by component name
for doc in result["retriever"]["documents"]:
    print(f"Score: {doc.score:.4f}, Content: {doc.content}")

output

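Score: <float>, Content: BM25 is a ranking function used by search engines.
Score: <float>, Content: <second-ranked document>

The exact scores depend on the Haystack version and the BM25 settings, so they are shown here as placeholders. BM25 scores are unnormalized by default and are not confined to the 0 to 1 range. The document about BM25 ranks first because it is the only one that contains the query term "BM25".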
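To build intuition for where the scores come from, here is a minimal pure-Python sketch of Okapi BM25 scoring over the same three sample documents. It is not Haystack's implementation: the tokenizer, the k1 and b values, and the helper names tokenize and bm25_scores are assumptions made for illustration, and Haystack may use a different BM25 variant, so the numbers will not match the retriever's output.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of word characters
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # k1 and b are common defaults, assumed here for illustration
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Haystack is an open source NLP framework.",
    "BM25 is a ranking function used by search engines.",
    "Python is a popular programming language.",
]
scores = bm25_scores("What is BM25?", docs)
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"Score: {score:.4f}, Content: {doc}")
```

In this sketch every document gets a small positive score because all three contain "is", while the rare term "BM25" lifts the second document well above the others.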

Common variations

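For hybrid search, combine InMemoryBM25Retriever with an embedding retriever such as InMemoryEmbeddingRetriever and merge the results with DocumentJoiner. FAISSDocumentStore belongs to Haystack 1.x and is not part of haystack-ai.
Adjust top_k to control the number of retrieved documents, either when constructing the retriever or per query.
Combine the retriever with a reader such as ExtractiveReader for extractive QA pipelines.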

Troubleshooting

If no documents are returned, ensure documents are correctly written to the InMemoryDocumentStore.
Check that your query is a non-empty string.
For large datasets, consider using a persistent document store instead of in-memory.

Key Takeaways

Use InMemoryBM25Retriever with InMemoryDocumentStore for fast keyword-based retrieval.
Write your documents to the document store before querying the retriever.
Adjust top_k to control how many documents are returned per query.

Verified 2026-04