MultiQueryRetriever: generating query variations to improve recall
Why this matters
Single-query retrieval often misses relevant documents because of lexical mismatch: users and LLMs phrase things differently from your indexed documents. MultiQueryRetriever addresses this by having an LLM generate several variations of the original query (three with the default prompt) and merging the results. This usually improves recall, although the extra candidates can cost some precision.
Explanation
What it is: MultiQueryRetriever wraps a base retriever and an LLM. When you query it, the LLM generates several reformulated versions of your question (three with the default prompt), each variation is passed to the underlying retriever, and the results are deduplicated and merged.
How it works mechanically: The retriever builds a prompt that instructs an LLM to generate variations, along the lines of 'Rephrase this question in X different ways.' For each variation, it calls your vector store or BM25 retriever independently, collects all results, removes duplicates (documents that compare equal on content and metadata, rather than by a dedicated document ID), and returns the union. The LLM never sees the documents themselves; it only reformulates the question.
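The mechanics described above can be sketched without LangChain at all. This is a minimal illustration, not the library's implementation: `Doc`, `multi_query_retrieve`, `fake_generate_variations`, and `fake_retrieve` are hypothetical names, and the keyword match stands in for a real vector search.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    text: str
    source: str


def multi_query_retrieve(
    question: str,
    generate_variations: Callable[[str], List[str]],
    retrieve: Callable[[str], List[Doc]],
) -> List[Doc]:
    """Fan a question out over reworded queries and merge the hits."""
    # Step 1: one call to the "LLM" yields every rewording at once.
    variations = generate_variations(question)

    merged: List[Doc] = []
    for variation in variations:
        # Step 2: each rewording goes to the base retriever on its own.
        for doc in retrieve(variation):
            # Step 3: keep a document only the first time it appears.
            if doc not in merged:
                merged.append(doc)
    return merged


# Stubs so the sketch runs without an API key (assumptions, not LangChain APIs).
CORPUS = [
    Doc("Python is a high-level programming language.", "doc1"),
    Doc("Deep learning uses neural networks.", "doc4"),
    Doc("Vector databases store embeddings.", "doc3"),
]


def fake_generate_variations(question: str) -> List[str]:
    # A real LLM would produce rewordings of the question here.
    return ["python language", "neural networks", "programming language"]


def fake_retrieve(query: str) -> List[Doc]:
    # Toy keyword match standing in for a vector search.
    words = query.lower().split()
    return [d for d in CORPUS if any(w in d.text.lower() for w in words)]


results = multi_query_retrieve(
    "What programming languages are good for AI?",
    fake_generate_variations,
    fake_retrieve,
)
print([d.source for d in results])
```

Note that the first and third rewordings both hit doc1, yet it appears once in the merged list; that is the deduplication step at work.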
When to use it: Use MultiQueryRetriever when your application serves users with natural language queries that don't align with your indexed document phrasing, or when precision isn't critical and you want to maximize recall. It's especially valuable for open-domain QA systems, customer support bots, and search applications where missing a relevant document is worse than returning extra candidates.
Analogy
It's like asking the same question to five different people in different ways to make sure you don't miss what you're looking for. One person might understand 'vehicle' better, another 'car,' and another 'automobile,' so you ask in all three phrasings and collect all their answers.
Code
# Requires the langchain, langchain-openai, langchain-community and faiss-cpu
# packages, plus an OPENAI_API_KEY environment variable.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.documents import Document
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.vectorstores import FAISS

# A tiny corpus to index.
docs = [
    Document(page_content="Python is a high-level programming language known for its simplicity and readability.", metadata={"source": "doc1"}),
    Document(page_content="Machine learning models require large amounts of training data to achieve high accuracy.", metadata={"source": "doc2"}),
    Document(page_content="Vector databases store embeddings for fast semantic search capabilities.", metadata={"source": "doc3"}),
    Document(page_content="Deep learning uses neural networks with multiple layers to process data.", metadata={"source": "doc4"}),
    Document(page_content="Natural language processing enables computers to understand human text.", metadata={"source": "doc5"}),
]

# Embed the documents and build an in-memory FAISS index.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(docs, embeddings)

# Base retriever: returns the top 2 documents for each query it receives.
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# LLM used only to reword the question; it never sees the documents.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Wrap the base retriever so each call fans out over LLM-generated variations.
retriever = MultiQueryRetriever.from_llm(
    retriever=base_retriever,
    llm=llm,
)

query = "What programming languages are good for AI?"
results = retriever.invoke(query)

print(f"Query: {query}")
print(f"\nNumber of unique documents retrieved: {len(results)}")
print("\nRetrieved documents:")
for i, doc in enumerate(results, 1):
    # Show only the first 62 characters of each document.
    print(f"{i}. {doc.page_content[:62]}... (source: {doc.metadata['source']})")

Example output (the exact documents depend on the variations the LLM generates):

Query: What programming languages are good for AI?

Number of unique documents retrieved: 4

Retrieved documents:
1. Python is a high-level programming language known for its simp... (source: doc1)
2. Deep learning uses neural networks with multiple layers to pro... (source: doc4)
3. Machine learning models require large amounts of training data... (source: doc2)
4. Natural language processing enables computers to understand hu... (source: doc5)
What just happened?
The code created a FAISS vector store from 5 documents, wrapped it with MultiQueryRetriever using gpt-4o-mini as the reformulation LLM, then queried it with 'What programming languages are good for AI?'. Internally, the LLM generated variations along the lines of 'Which programming languages are used for machine learning?' and 'Best languages for deep learning'. Each variation was passed to the base retriever (k=2), and MultiQueryRetriever then merged all the results and dropped duplicate documents, returning 4 unique documents where a single query would have returned 2.
Common gotcha
The biggest mistake is assuming MultiQueryRetriever guarantees better results. It doesn't. If your LLM generates poor reformulations (too similar to the original, or drifting away from its meaning), you may retrieve irrelevant documents and actually hurt precision. The reformulation also happens on every single query: you pay for one extra LLM call per retrieval, and each generated variation then triggers its own call to the base retriever (with a vector store, that means an embedding request per variation). Some developers forget this and are shocked by their token bills and latency.
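One way to judge reformulation quality is to look at the queries the LLM actually produced. A minimal sketch, assuming the retriever is imported from langchain.retrievers.multi_query as in the code above and that the module logs its generated queries at INFO level under its own module name; if your installed version ships the class from a different module, the logger name will differ accordingly.

```python
import logging

# Attach a default handler; without one the INFO records are dropped.
logging.basicConfig()

# Assumption: the logger is named after the module the retriever lives in.
LOGGER_NAME = "langchain.retrievers.multi_query"
logging.getLogger(LOGGER_NAME).setLevel(logging.INFO)

# Later retriever.invoke(...) calls should now log the generated queries.
```

If the logged variations are near-copies of the original question, or wander off topic, that is a signal to adjust the prompt before trusting the merged results.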
Error recovery
ImportError: cannot import name 'MultiQueryRetriever'
ValueError: 'retriever' is not defined
TypeError: invoke() missing required positional argument 'input'
Experienced dev note
In production, you often want to customize the LLM instructions for generating variations. MultiQueryRetriever uses a default prompt, but you can pass your own via the prompt parameter of from_llm, which lets you enforce domain-specific reformulation rules. Keep the question input variable and ask for one variation per line, because the default output parser splits the LLM response on newlines. Also consider wrapping the retriever in a caching layer (Redis, SQLite) for common queries; regenerating variations for identical queries is wasteful. Finally, monitor your base_retriever's search_kwargs: if you set k=2 on the base retriever and MultiQueryRetriever calls it 5 times, you're retrieving up to 10 documents before dedup. This compounds costs faster than you'd expect.
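The caching idea can be sketched with an in-memory dictionary standing in for Redis or SQLite. This is an illustration under stated assumptions: `cached_variations` and `fake_llm_variations` are hypothetical helpers, and how you wire such a cache into your own retriever setup is left open.

```python
from typing import Callable, Dict, List


def cached_variations(
    generate: Callable[[str], List[str]],
) -> Callable[[str], List[str]]:
    """Wrap a variation generator so repeated questions skip the LLM call."""
    cache: Dict[str, List[str]] = {}

    def wrapper(question: str) -> List[str]:
        # Normalize so differences in case or spacing share one cache entry.
        key = " ".join(question.lower().split())
        if key not in cache:
            cache[key] = generate(question)
        # Return a copy so callers cannot mutate the cached list.
        return list(cache[key])

    return wrapper


# Hypothetical generator that counts how often the "LLM" is called.
calls = {"count": 0}


def fake_llm_variations(question: str) -> List[str]:
    calls["count"] += 1
    return [f"{question} (rewording {i})" for i in range(1, 4)]


get_variations = cached_variations(fake_llm_variations)
first = get_variations("What is FAISS?")
second = get_variations("  what is   faiss? ")  # same question after normalization
print(calls["count"])  # the generator ran only once
```

Normalizing the key is a design choice: it raises the hit rate, at the risk of merging questions where case or spacing actually matters. A production cache would also need an expiry policy, which this sketch omits.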
Check your understanding
If you set base_retriever with k=1 and MultiQueryRetriever generates 4 query variations internally, what is the maximum number of unique documents you could receive in the final results, and why is it not always that maximum?
Answer hint
The maximum is 4 (one unique doc per variation), but you may get fewer if: (1) the same document ranks in the top-1 for multiple variations, or (2) your vectorstore has fewer than 4 total documents. The key insight is that MultiQueryRetriever removes duplicate documents before returning, so even if a document appears in the results for several variations, it's only returned once.
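The arithmetic in the hint can be checked with a short sketch. The helper names and the per-variation results below are made up for illustration; plain source strings stand in for documents.

```python
from typing import List


def max_unique_docs(k: int, n_variations: int, corpus_size: int) -> int:
    """Upper bound on merged results: k hits per variation, capped by corpus size."""
    return min(k * n_variations, corpus_size)


def unique_union(result_lists: List[List[str]]) -> List[str]:
    """Merge per-variation results, keeping the first occurrence of each document."""
    merged: List[str] = []
    for results in result_lists:
        for doc in results:
            if doc not in merged:
                merged.append(doc)
    return merged


# Hypothetical top-1 hits for four variations; two of them agree on doc1.
top1_per_variation = [["doc1"], ["doc4"], ["doc1"], ["doc2"]]

upper_bound = max_unique_docs(k=1, n_variations=4, corpus_size=5)
returned = unique_union(top1_per_variation)
print(upper_bound, returned)
```

Here the bound is 4, but only 3 documents come back because two variations retrieved the same top-1 document.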