How to use Haystack Retriever
Quick answer
Use the InMemoryDocumentStore to store documents and the InMemoryBM25Retriever to retrieve the documents most relevant to a query. Initialize the retriever with the document store, then call run(query=...) and read the top matches from the "documents" key of the result.
Prerequisites
- Python 3.8+
- pip install haystack-ai openai
- OpenAI API key (free tier works)
- Set environment variable OPENAI_API_KEY
Setup
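Install the haystack-ai package (version 2+) and set your OpenAI API key as an environment variable. BM25 retrieval itself runs locally; the key is only used once you add an OpenAI-backed component such as a generator.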
- Run: pip install haystack-ai openai
- Export your API key: export OPENAI_API_KEY='your_api_key' on Linux/macOS, or set it in your environment on Windows.
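As a quick sanity check, a short script can confirm that the variable is actually visible to Python before any OpenAI-backed component is used. This is a minimal sketch; the helper name openai_key_is_set is hypothetical and not part of Haystack.

```python
import os


def openai_key_is_set(env=os.environ) -> bool:
    """Return True if OPENAI_API_KEY is present and non-empty in the given mapping.

    Hypothetical helper for illustration; it only inspects the environment
    and never contacts OpenAI.
    """
    return bool(env.get("OPENAI_API_KEY", "").strip())


# Report whether the current process can see the key
print("OPENAI_API_KEY set:", openai_key_is_set())
```

If this prints False in a new terminal, the export was most likely done in a different shell session.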
Step by step
This example shows how to create an in-memory document store, add documents, initialize the InMemoryBM25Retriever, and retrieve documents relevant to a query.
from haystack import Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Initialize document store
document_store = InMemoryDocumentStore()

# Add sample documents (Haystack 2.x expects Document objects, not plain dicts)
docs = [
    Document(content="Haystack is an open source NLP framework for building search systems."),
    Document(content="OpenAI provides powerful language models like GPT-4o-mini."),
    Document(content="Retrievers help find relevant documents quickly."),
]
document_store.write_documents(docs)

# Initialize BM25 retriever
retriever = InMemoryBM25Retriever(document_store=document_store)

# Retrieve documents for a query; run() returns a dict with a "documents" list
query = "What is Haystack?"
retrieved_docs = retriever.run(query=query)["documents"]

# Print retrieved documents
for i, doc in enumerate(retrieved_docs, 1):
    print(f"Document {i}: {doc.content}")
Example output (the first result is the strongest BM25 match; the remaining documents and their order may differ by version)
Document 1: Haystack is an open source NLP framework for building search systems.
Document 2: Retrievers help find relevant documents quickly.
Document 3: OpenAI provides powerful language models like GPT-4o-mini.
Common variations
You can swap in other retriever types, such as the InMemoryEmbeddingRetriever for embedding-based search (DensePassageRetriever belongs to Haystack 1.x and is not part of haystack-ai 2.x), or integrate with external document stores like Elasticsearch. For async usage, Haystack supports async pipelines. You can also combine retrievers with generators for question answering.
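from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

# Example: initialize an embedding retriever on the same store.
# It searches by vector, so documents must be written with embeddings
# (for example via a document embedder) and queries passed as query_embedding.
embedding_retriever = InMemoryEmbeddingRetriever(document_store=document_store)

# Async example (simplified): run the synchronous retriever in a worker thread.
# asyncio.to_thread requires Python 3.9+.
import asyncio

async def async_retrieve():
    result = await asyncio.to_thread(retriever.run, query="What is Haystack?")
    for doc in result["documents"]:
        print(doc.content)

asyncio.run(async_retrieve())
Troubleshooting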
- If you see ModuleNotFoundError, ensure you installed haystack-ai version 2 or higher.
- If retrieval returns empty, verify documents are written to the document store before querying.
- For API key errors, confirm OPENAI_API_KEY is set correctly in your environment.
Key Takeaways
- Use InMemoryDocumentStore and InMemoryBM25Retriever for simple local document retrieval.
- Add documents to the store before querying to get relevant results.
- Haystack supports multiple retriever types and async usage for flexible search pipelines.