High severity · Beginner · Fix: 2-5 min

ModuleNotFoundError

ModuleNotFoundError: No module named 'rank_bm25'

What this error means

LangChain's BM25Retriever requires the rank_bm25 library, which is not installed by default even when you install langchain-community.

Stack trace

traceback

Traceback (most recent call last):
  File "app.py", line 5, in <module>
    from langchain_community.retrievers import BM25Retriever
  File "/usr/local/lib/python3.11/site-packages/langchain_community/retrievers/bm25.py", line 1, in <module>
    from rank_bm25 import BM25Okapi
ModuleNotFoundError: No module named 'rank_bm25'

QUICK FIX

Run `pip install rank_bm25` in the environment that runs your code, then restart your Python interpreter or kernel.

Why it happens

LangChain's BM25Retriever is a thin wrapper around the rank_bm25 library for sparse text retrieval. The rank_bm25 package is not included as a default dependency of langchain-community to keep installation lean. When you import BM25Retriever without rank_bm25 installed, Python cannot find the underlying module and raises ModuleNotFoundError. This is common in hybrid search setups where you add BM25 to complement dense vector retrieval.

Detection

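Try importing BM25Retriever in a Python shell: `from langchain_community.retrievers import BM25Retriever`. If it fails with ModuleNotFoundError for rank_bm25, the dependency is missing. Some langchain-community releases defer the rank_bm25 import until a retriever is built, so if the import succeeds, also try `BM25Retriever.from_texts(["test"])`. Run `pip show rank_bm25` (or check `pip list`) from the same environment to confirm the package is absent.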
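The same check can be done from inside Python, which helps when `pip` on your PATH may belong to a different interpreter. The following is a minimal sketch that uses the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of LangChain.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Checks the interpreter that is running this code, not whatever `pip` is on PATH.
version = installed_version("rank_bm25")
if version is None:
    print("rank_bm25 is NOT installed for this interpreter")
else:
    print(f"rank_bm25 {version} is installed")
```

If this prints that the package is not installed while `pip list` shows it, the two are most likely looking at different environments.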

Causes & fixes

rank_bm25 package not installed in your environment

✓ Fix

Run `pip install rank_bm25` to install the sparse retrieval backend required by BM25Retriever.

Using an incomplete or minimal langchain installation (e.g., installed only langchain without langchain-community)

✓ Fix

Ensure you have both packages: `pip install langchain langchain-community rank_bm25`. BM25Retriever lives in langchain_community, not core langchain.

Virtual environment mismatch (installed in system Python but running in venv, or vice versa)

✓ Fix

Activate the correct virtual environment first: `source venv/bin/activate` (Linux/Mac) or `venv\Scripts\activate` (Windows), then run `python -m pip install rank_bm25` so the package is installed into the interpreter that actually runs your code.

Using an old requirements.txt or environment.yml that doesn't list rank_bm25 as a dependency

✓ Fix

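Add rank_bm25 to your requirements.txt file and reinstall: `echo 'rank_bm25>=0.2.2' >> requirements.txt && pip install -r requirements.txt`. For a conda environment.yml, list it under the `pip:` entry of `dependencies` and update the environment.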
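For the virtual environment mismatch cause above, it can help to confirm which interpreter is running your script. This is a small sketch using only the standard library; `interpreter_info` is a hypothetical helper name, and the `in_virtualenv` check assumes an environment created with the standard `venv` module or a recent virtualenv.

```python
import sys


def interpreter_info():
    """Describe the interpreter that is actually running this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


for key, value in interpreter_info().items():
    print(f"{key}: {value}")
```

Compare the printed `executable` path with the location reported by `pip --version`. If they point at different environments, the package was installed somewhere your script cannot see.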

Code: broken vs fixed

Broken - triggers the error

python

from langchain_community.retrievers import BM25Retriever  # Fails with ModuleNotFoundError when rank_bm25 is missing
from langchain_core.documents import Document

# Trying to build a hybrid search system
docs = [
    Document(page_content="Dense vectors work well for semantic search"),
    Document(page_content="Sparse BM25 catches exact keyword matches")
]

retriever = BM25Retriever.from_documents(docs)
results = retriever.invoke("keyword exact match")
print(results)

Fixed - works correctly

python

from langchain_community.retrievers import BM25Retriever  # Works once rank_bm25 is installed
from langchain_core.documents import Document

# Hybrid sparse-dense search setup
docs = [
    Document(page_content="Dense vectors work well for semantic search"),
    Document(page_content="Sparse BM25 catches exact keyword matches")
]

# BM25Retriever requires rank_bm25 to be installed via: pip install rank_bm25
retriever = BM25Retriever.from_documents(docs)
results = retriever.invoke("keyword exact match")
print(f"Retrieved {len(results)} documents using BM25 sparse retrieval")
for doc in results:
    print(f"  - {doc.page_content}")

The code itself does not need to change: the fix is installing rank_bm25 (run `pip install rank_bm25` in your terminal before running this code). The import then succeeds because BM25Retriever can find its required dependency.

⚠

Workaround

If you cannot install rank_bm25 (e.g., due to environment constraints), implement a simple keyword-matching retriever with Python's built-in string methods: split the query into tokens, score each document by how many query tokens occur in its text, and return the top-k documents sorted by score. This is much cruder than BM25 (no inverse document frequency weighting or document length normalization) but requires no external dependencies. Assuming `query`, `docs`, and `k` are already defined: `hits = sorted([{'doc': d, 'score': sum(1 for q in query.lower().split() if q in d.page_content.lower())} for d in docs], key=lambda x: x['score'], reverse=True)[:k]`.
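The workaround above can also be written as a small function, which is easier to test and reuse. The sketch below is illustrative only: `SimpleDoc` is a hypothetical stand-in for LangChain's `Document` (so the example runs without LangChain), `keyword_retrieve` is a made-up name, and it matches whole whitespace-separated tokens rather than substrings.

```python
from dataclasses import dataclass


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


def keyword_retrieve(query, docs, k=4):
    """Return up to k docs ranked by how many distinct query tokens they contain."""
    query_tokens = set(query.lower().split())
    scored = []
    for doc in docs:
        doc_tokens = set(doc.page_content.lower().split())
        score = len(query_tokens & doc_tokens)
        if score > 0:  # drop documents with no overlap at all
            scored.append((score, doc))
    # list.sort is stable, so documents with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


docs = [
    SimpleDoc("Dense vectors work well for semantic search"),
    SimpleDoc("Sparse BM25 catches exact keyword matches"),
]
hits = keyword_retrieve("keyword exact match", docs)
for doc in hits:
    print(doc.page_content)
```

Because tokens are compared exactly, "match" does not match "matches", and punctuation attached to a word prevents a match. Normalizing tokens (stripping punctuation, stemming) would be a natural next step if you rely on this for more than a stopgap.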

✓

Prevention

List every optional retrieval dependency in your requirements.txt or pyproject.toml at project setup time. For LangChain hybrid search, declare minimum versions: langchain>=0.2.0, langchain-community>=0.1.0, rank_bm25>=0.2.2. Tools such as pip-audit and Dependabot report vulnerabilities and outdated versions but do not detect a package that is missing from your dependency list, so add an import smoke test to CI (for example, `python -c "import rank_bm25"`) to catch the gap before deployment. Document your retrieval architecture so future developers know which packages each search type requires.
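One way to implement such a smoke test is a short script that reports every required module the import system cannot locate. This is a sketch under assumptions: the `REQUIRED_MODULES` list and the `missing_modules` helper are hypothetical names, and the list should be adjusted to your own stack.

```python
from importlib.util import find_spec

# Hypothetical list of top-level modules this project needs at runtime.
REQUIRED_MODULES = ["langchain_community", "rank_bm25"]


def missing_modules(module_names):
    """Return the names that the import system cannot locate."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are available")
```

In a CI job you would typically exit with a non-zero status when `missing` is non-empty (for example, `raise SystemExit(1)`) so the pipeline fails before deployment.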

Python 3.9+ · langchain-community >=0.0.1 · tested on 0.2.x

Verified 2026-04
