How to use Haystack Retriever

Quick answer

Use the InMemoryDocumentStore to store documents and the InMemoryBM25Retriever to retrieve relevant documents based on queries. Initialize the retriever with the document store, then call run(query=...). The top matching documents are returned under the "documents" key of the result.

Prerequisites

Python 3.8+
pip install haystack-ai openai
OpenAI API key (needed only for OpenAI components such as embedders and generators; BM25 retrieval runs locally without one)
Set environment variable OPENAI_API_KEY

Setup

Install the haystack-ai package (version 2+) and set your OpenAI API key as an environment variable.

Run pip install haystack-ai openai
Export your API key with export OPENAI_API_KEY='your_api_key' on Linux/macOS. On Windows, set it in PowerShell with $env:OPENAI_API_KEY='your_api_key'.

bash

pip install haystack-ai openai

Step by step

This example shows how to create an in-memory document store, add documents, initialize the InMemoryBM25Retriever, and retrieve documents relevant to a query.

python

from haystack import Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Initialize an in-memory document store
document_store = InMemoryDocumentStore()

# Add sample documents: write_documents() expects Document objects, not dicts
docs = [
    Document(content="Haystack is an open source NLP framework for building search systems."),
    Document(content="OpenAI provides powerful language models like GPT-4o-mini."),
    Document(content="Retrievers help find relevant documents quickly."),
]
document_store.write_documents(docs)

# Initialize BM25 retriever on top of the store
retriever = InMemoryBM25Retriever(document_store=document_store)

# Haystack 2.x components are called with run(); results come back under "documents"
query = "What is Haystack?"
result = retriever.run(query=query)
retrieved_docs = result["documents"]

# Print retrieved documents
for i, doc in enumerate(retrieved_docs, 1):
    print(f"Document {i}: {doc.content}")

output

Document 1: Haystack is an open source NLP framework for building search systems.
Document 2: Retrievers help find relevant documents quickly.
Document 3: OpenAI provides powerful language models like GPT-4o-mini.
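BM25 ranks a document by how often it contains the query's terms. Each term is weighted by how rare it is across the collection, and scores are normalized by document length. The sketch below is a self-contained, plain-Python Okapi BM25 scorer that shows this ranking logic. It is not Haystack's implementation. It assumes k1=1.5, b=0.75, and a lowercase tokenizer that keeps tokens of two or more word characters. Haystack's InMemoryDocumentStore has its own BM25 code, with the variant selected through its bm25_algorithm parameter, so its absolute scores will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep tokens of 2+ word characters
    return re.findall(r"(?u)\b\w\w+\b", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Classic Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # in Okapi BM25, absent terms contribute nothing
            # IDF with +1 inside the log so it stays positive (Lucene-style smoothing)
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


docs = [
    "Haystack is an open source NLP framework for building search systems.",
    "OpenAI provides powerful language models like GPT-4o-mini.",
    "Retrievers help find relevant documents quickly.",
]
scores = bm25_scores("What is Haystack?", docs)
# Rank document indices by descending score (ties keep their original order)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
for i in ranking:
    print(f"{scores[i]:.3f}  {docs[i]}")
```

Only the Haystack document shares terms ("is", "haystack") with the query, so it scores highest. The other two score zero in this sketch, which means their relative order in a result list carries no meaning.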

Common variations

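In Haystack 2.x, embedding-based search uses InMemoryEmbeddingRetriever together with an embedder component. DensePassageRetriever belongs to Haystack 1.x and is not available in haystack-ai. External document stores such as Elasticsearch are available through separate integration packages; for example, elasticsearch-haystack provides ElasticsearchBM25Retriever. For async usage, Haystack provides AsyncPipeline, and recent releases add run_async() to the in-memory retrievers. You can also combine retrievers with generators such as OpenAIGenerator for question answering.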

python

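import asyncio

from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Embedding-based retrieval in Haystack 2.x (DensePassageRetriever was Haystack 1.x).
# The OpenAI embedders need OPENAI_API_KEY; documents must be embedded before writing.
embedding_store = InMemoryDocumentStore()
embedded = OpenAIDocumentEmbedder().run(
    documents=[Document(content="Haystack is an open source NLP framework for building search systems.")]
)
embedding_store.write_documents(embedded["documents"])
embedding_retriever = InMemoryEmbeddingRetriever(document_store=embedding_store)
query_embedding = OpenAITextEmbedder().run(text="What is Haystack?")["embedding"]
for doc in embedding_retriever.run(query_embedding=query_embedding, top_k=1)["documents"]:
    print(doc.content)

# Async example: recent haystack-ai 2.x releases add run_async() to the in-memory retrievers.
# `retriever` is the InMemoryBM25Retriever from the step-by-step example.
async def async_retrieve():
    result = await retriever.run_async(query="What is Haystack?")
    for doc in result["documents"]:
        print(doc.content)

asyncio.run(async_retrieve())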
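Embedding-based retrievers rank documents by vector similarity rather than by shared terms. The sketch below shows only that ranking step, in plain Python. The 3-dimensional vectors are hand-made stand-ins for real embeddings, which is an assumption for illustration; real embedding models output hundreds or thousands of dimensions. It uses cosine similarity. Haystack's InMemoryDocumentStore lets you choose dot product or cosine through its embedding_similarity_function parameter.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_retrieve(query_embedding, doc_embeddings, top_k=2):
    """Return (index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine(query_embedding, emb)) for i, emb in enumerate(doc_embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings, one per sample document
doc_embeddings = [
    [0.9, 0.1, 0.0],  # "Haystack is an open source NLP framework..."
    [0.1, 0.9, 0.1],  # "OpenAI provides powerful language models..."
    [0.5, 0.1, 0.8],  # "Retrievers help find relevant documents quickly."
]
query_embedding = [0.8, 0.2, 0.1]  # hypothetical embedding of "What is Haystack?"

results = embedding_retrieve(query_embedding, doc_embeddings, top_k=2)
for i, score in results:
    print(f"doc {i}: {score:.3f}")
```

Unlike BM25, a document can rank well here without sharing any words with the query, as long as its vector points in a similar direction.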

Troubleshooting

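If you see ModuleNotFoundError, make sure haystack-ai (version 2 or higher) is installed. Do not install it alongside farm-haystack (Haystack 1.x), because both provide the haystack module and conflict.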
If retrieval returns empty, verify documents are written to the document store before querying.
For API key errors, confirm OPENAI_API_KEY is set correctly in your environment.
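A small pre-flight check can catch the last two problems before a query runs. The helper below is hypothetical and not part of Haystack. It relies only on count_documents(), which Haystack 2.x document stores provide, and on the OPENAI_API_KEY environment variable. The StubStore class stands in for a real InMemoryDocumentStore so the snippet runs on its own.

```python
import os


def preflight(document_store, needs_openai=False, env=None):
    """Hypothetical helper: return a list of setup problems (empty list means OK)."""
    env = os.environ if env is None else env
    problems = []
    # Haystack 2.x document stores expose count_documents()
    if document_store.count_documents() == 0:
        problems.append("document store is empty: call write_documents() before querying")
    # Only OpenAI-backed components (embedders, generators) need the key
    if needs_openai and not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


class StubStore:
    """Stand-in for InMemoryDocumentStore so this snippet runs without Haystack."""

    def __init__(self, n_docs):
        self.n_docs = n_docs

    def count_documents(self):
        return self.n_docs


print(preflight(StubStore(0), needs_openai=True, env={}))
print(preflight(StubStore(3), needs_openai=True, env={"OPENAI_API_KEY": "sk-test"}))
```

With Haystack installed, you would pass the real document store instead, for example preflight(document_store).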

Key Takeaways

Use InMemoryDocumentStore and InMemoryBM25Retriever for simple local document retrieval.
Add documents to the store before querying to get relevant results.
Haystack supports multiple retriever types and async usage for flexible search pipelines.

Verified 2026-04 · gpt-4o-mini
