Code Intermediate medium · 6 min

MultiQueryRetriever: generating query variations to improve recall

What you will learn

MultiQueryRetriever automatically generates multiple query reformulations to find more relevant documents when a single query might miss results.

Why this matters

Single-query retrieval often misses relevant documents because of lexical mismatch or phrasing differences: users and LLMs rarely word things the way your indexed documents do. MultiQueryRetriever addresses this by having an LLM generate 3-5 variations of the original query and merging the results. This typically improves recall, at the cost of extra latency and sometimes some precision.

Skip if: (1) your knowledge base is very small (<1,000 documents), where lexical variance matters less, (2) you need sub-100ms retrieval latency, since every query adds an LLM call and several retriever calls, (3) your embedding model already handles semantic equivalence well, or (4) you're working with highly domain-specific jargon where reformulation adds noise rather than signal.

Explanation

What it is: MultiQueryRetriever wraps a base retriever and an LLM. When you query it, the LLM generates 3-5 reformulated versions of your question, then each variation is passed to the underlying retriever, and the results are deduplicated and merged.

How it works mechanically: The retriever builds a prompt that instructs an LLM to generate variations, along the lines of 'Rephrase this question in X different ways.' For each variation, it calls your vector store or BM25 retriever independently, collects all results, removes duplicates (identical documents are kept only once), and returns the union. The LLM never sees the documents themselves; it only reformulates the question.
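The fan-out and merge steps described above can be sketched in plain Python without LangChain. This is a minimal illustration, not the library's implementation: generate_variations and keyword_retriever are hypothetical stand-ins for the LLM call and the base retriever, the toy corpus is invented for the example, and the sketch also searches the original question, which is a choice made here for illustration.

```python
from typing import Callable, List

# Toy corpus, invented for illustration only.
CORPUS = [
    "Python is popular for machine learning.",
    "A car needs regular oil changes.",
    "An automobile engine burns fuel.",
    "Vehicle registration must be renewed yearly.",
]


def keyword_retriever(query: str, k: int = 2) -> List[str]:
    """Hypothetical base retriever: rank documents by number of shared words."""
    words = set(query.lower().split())
    scored = [
        (len(words & set(doc.lower().rstrip(".").split())), doc) for doc in CORPUS
    ]
    # sorted() is stable, so ties keep corpus order; drop documents with no overlap.
    ranked = [doc for score, doc in sorted(scored, key=lambda pair: -pair[0]) if score > 0]
    return ranked[:k]


def generate_variations(question: str) -> List[str]:
    """Hypothetical stand-in for the LLM call: returns canned rewrites."""
    return [
        question.replace("vehicle", "car"),
        question.replace("vehicle", "automobile"),
    ]


def multi_query_retrieve(
    question: str,
    retriever: Callable[[str], List[str]],
    rewrite: Callable[[str], List[str]],
) -> List[str]:
    """Run every phrasing through the retriever and keep each document once."""
    merged: List[str] = []
    for phrasing in [question, *rewrite(question)]:
        for doc in retriever(phrasing):
            if doc not in merged:  # deduplicate while preserving first-seen order
                merged.append(doc)
    return merged


question = "vehicle engine maintenance"
single = keyword_retriever(question)
merged = multi_query_retrieve(question, keyword_retriever, generate_variations)
print(len(single), len(merged))  # 2 3
```

Here the single query never matches the 'car' document, while the rewritten phrasing does, so the merged result contains one more unique document than the single-query result.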

When to use it: Use MultiQueryRetriever when your application serves users with natural language queries that don't align with your indexed document phrasing, or when precision isn't critical and you want to maximize recall. It's especially valuable for open-domain QA systems, customer support bots, and search applications where missing a relevant document is worse than returning extra candidates.

Analogy

It's like asking several people the same question in different words to make sure you don't miss what you're looking for. One person responds best to 'vehicle,' another to 'car,' and another to 'automobile,' so you ask with all three phrasings and collect all their answers.

Code

Illustrative only - not runnable without a valid API key

python

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.documents import Document
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.vectorstores import FAISS

docs = [
    Document(page_content="Python is a high-level programming language known for its simplicity and readability.", metadata={"source": "doc1"}),
    Document(page_content="Machine learning models require large amounts of training data to achieve high accuracy.", metadata={"source": "doc2"}),
    Document(page_content="Vector databases store embeddings for fast semantic search capabilities.", metadata={"source": "doc3"}),
    Document(page_content="Deep learning uses neural networks with multiple layers to process data.", metadata={"source": "doc4"}),
    Document(page_content="Natural language processing enables computers to understand human text.", metadata={"source": "doc5"}),
]

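# Embed the documents and index them in an in-memory FAISS store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(docs, embeddings)
# Each generated query returns its top-2 matches
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# LLM used only to rewrite the question; temperature=0 keeps rewrites stable
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Wrap the base retriever; the default prompt generates the variations
retriever = MultiQueryRetriever.from_llm(
    retriever=base_retriever,
    llm=llm
)

query = "What programming languages are good for AI?"
# One LLM call to generate variations, then one search per variation
results = retriever.invoke(query)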

print(f"Query: {query}")
print(f"\nNumber of unique documents retrieved: {len(results)}")
print("\nRetrieved documents:")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content[:70]}... (source: {doc.metadata['source']})")

Output

Query: What programming languages are good for AI?

Number of unique documents retrieved: 4

Retrieved documents:
1. Python is a high-level programming language known for its simp... (source: doc1)
2. Deep learning uses neural networks with multiple layers to pro... (source: doc4)
3. Machine learning models require large amounts of training data... (source: doc2)
4. Natural language processing enables computers to understand hu... (source: doc5)

What just happened?

The code created a FAISS vector store from 5 documents, wrapped it with MultiQueryRetriever using gpt-4o-mini as the reformulation LLM, then queried it with 'What programming languages are good for AI?'. Internally, the LLM generates variations such as 'Which programming languages are used for machine learning?' or 'Best languages for deep learning'. Each variation was passed to the base retriever (k=2), and MultiQueryRetriever merged all results and dropped duplicate documents, returning 4 unique documents instead of 2.

Common gotcha

The biggest mistake is assuming MultiQueryRetriever guarantees better results. It doesn't. If your LLM generates poor reformulations (too similar to the original, or drifting away from its meaning), you may retrieve irrelevant documents and actually hurt precision. Also, query generation runs on every retrieval, so each query costs one extra LLM call plus one base-retriever search (and, for vector stores, one embedding call) per generated variation. Some developers forget this and are shocked by their latency and token bills.
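One way to catch poor reformulations early is to log the generated queries. The LangChain documentation does this by enabling INFO logging on the retriever's module logger. The sketch below assumes the logger name langchain.retrievers.multi_query, which matches the import path used in this lesson and would differ if your version imports the class from another package.

```python
import logging

# Assumption: the retriever logs its generated queries at INFO level on a
# logger named after its module path. Adjust the name to match your import.
LOGGER_NAME = "langchain.retrievers.multi_query"

logging.basicConfig(level=logging.WARNING)  # keep other libraries quiet
logging.getLogger(LOGGER_NAME).setLevel(logging.INFO)

# After this, each retriever.invoke(query) call should emit a log line listing
# the generated queries, which you can review for near-duplicates or drift.
```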

Error recovery

ImportError: cannot import name 'MultiQueryRetriever'

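In langchain 1.0 and later, MultiQueryRetriever and the other legacy retrievers moved out of the main langchain package into langchain-classic. Install it with pip install langchain-classic and import from langchain_classic.retrievers.multi_query. On langchain versions before 1.0, keep the import from langchain.retrievers.multi_query.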

ValueError: 'retriever' is not defined

You must pass a base retriever to MultiQueryRetriever.from_llm(). Create one first with base_retriever = vectorstore.as_retriever(), or use any BaseRetriever subclass.

TypeError: invoke() missing required positional argument 'input'

MultiQueryRetriever is a retriever, not a chain: invoke() takes the query string as its first positional argument. Use retriever.invoke('your query'), not retriever.invoke() or retriever.invoke({'query': 'your query'}).

Experienced dev note

In production, you often want to customize the LLM instructions for generating variations. MultiQueryRetriever uses a default prompt, but you can pass your own via the prompt parameter of from_llm(), which lets you enforce domain-specific reformulation rules. Also, consider wrapping it in a caching layer (Redis, SQLite) for common queries; regenerating variations for identical queries is wasteful. Finally, monitor your base_retriever's search_kwargs: if you set k=2 on the base retriever but MultiQueryRetriever calls it 5 times, you're retrieving up to 10 documents before dedup. This compounds costs faster than you'd expect.
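The caching suggestion can be sketched with an in-process cache. This is a minimal illustration under stated assumptions: expensive_generate is a hypothetical placeholder for the LLM call that produces variations, and functools.lru_cache stands in for an external store such as Redis or SQLite.

```python
from functools import lru_cache
from typing import Tuple

CALLS = {"llm": 0}  # counts how often the (pretend) LLM is actually invoked


def expensive_generate(question: str) -> Tuple[str, ...]:
    """Hypothetical placeholder for the LLM call that writes query variations."""
    CALLS["llm"] += 1
    return (f"{question} (rephrased 1)", f"{question} (rephrased 2)")


def normalize(question: str) -> str:
    """Collapse case and whitespace so trivially different inputs share a key."""
    return " ".join(question.lower().split())


@lru_cache(maxsize=1024)
def _cached(normalized: str) -> Tuple[str, ...]:
    # Only reached on a cache miss, so the LLM stand-in runs once per distinct key.
    return expensive_generate(normalized)


def cached_variations(question: str) -> Tuple[str, ...]:
    return _cached(normalize(question))


cached_variations("What is FAISS?")
cached_variations("what is  faiss?")  # same key after normalization: cache hit
print(CALLS["llm"])  # 1
```

Caching the variations only saves the LLM call; the per-variation searches against the base retriever still run unless you cache the final results as well.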

Check your understanding

If you set base_retriever with k=1 and MultiQueryRetriever generates 4 query variations internally, what is the maximum number of unique documents you could receive in the final results, and why is it not always that maximum?

Answer hint

The maximum is 4 (one unique doc per variation), but you may get fewer if: (1) the same document ranks in the top-1 for multiple variations, or (2) your vectorstore has fewer than 4 total documents. The key insight is that MultiQueryRetriever deduplicates the merged results before returning, so even if a document appears in the results of several variations, it's only returned once.
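The upper bound in this answer generalizes to any k and any number of variations. A small sketch of the arithmetic, where max_unique_documents is a hypothetical helper written for this lesson rather than a LangChain function:

```python
def max_unique_documents(k: int, num_queries: int, corpus_size: int) -> int:
    """Upper bound on unique documents returned after merging and deduplication."""
    return min(k * num_queries, corpus_size)


print(max_unique_documents(k=1, num_queries=4, corpus_size=1000))  # 4
print(max_unique_documents(k=2, num_queries=5, corpus_size=5))     # 5
```

The actual count is usually below this bound, because different phrasings of the same question tend to retrieve overlapping documents.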

VERSION MultiQueryRetriever's import location changed in langchain 1.0.0. Versions before 1.0.0 import it from langchain.retrievers.multi_query, as in the code above. From 1.0.0 on, the legacy retrievers live in the separate langchain-classic package, so the import becomes from langchain_classic.retrievers.multi_query. Verify your installed version: pip show langchain | grep Version.

Once you've improved recall with MultiQueryRetriever, learn how to rank and filter those extra results using Contextual Compression to keep only the highest-relevance documents and reduce noise in your chain's context window.
