Haystack pipeline explained

Quick answer
A Haystack pipeline is a modular workflow that connects components output-to-input to perform tasks such as question answering. In a typical retrieval-augmented setup, an InMemoryBM25Retriever fetches relevant documents from an InMemoryDocumentStore, a PromptBuilder inserts them into a prompt, and an OpenAIGenerator generates the answer from that prompt.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install haystack-ai openai

Setup

Install the haystack-ai package and set your OpenAI API key as an environment variable.

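  • Install Haystack and the OpenAI SDK:
```bash
pip install haystack-ai openai
```
  • Set your OpenAI API key as an environment variable:
```bash
export OPENAI_API_KEY="your_api_key"
```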
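By default, OpenAIGenerator reads the key from the OPENAI_API_KEY environment variable and raises an error if it is missing. A small preflight check fails earlier and gives a clearer message. This is a minimal sketch; `require_env` is a hypothetical helper for illustration, not part of Haystack.

```python
import os


def require_env(name: str = "OPENAI_API_KEY") -> str:
    """Return a non-empty environment variable or raise a clear error (hypothetical helper)."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the pipeline.")
    return value
```

For example, call `require_env()` at the top of your script, before creating the generator.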

Step by step

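This example creates an in-memory document store and writes two documents to it. It then builds a pipeline in which a BM25 retriever finds relevant documents, a prompt builder inserts them into a prompt, and the OpenAI generator answers the query from that prompt.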

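```python
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret

# Set your OpenAI API key in an environment variable before running:
# export OPENAI_API_KEY="your_api_key"

# Initialize the document store
document_store = InMemoryDocumentStore()

# Write sample documents (Haystack 2.x expects Document objects, not dicts)
docs = [
    Document(content="Haystack is an open-source NLP framework for building search systems."),
    Document(content="It supports retrievers and generators for question answering."),
]
document_store.write_documents(docs)

# Initialize the BM25 retriever over the document store
retriever = InMemoryBM25Retriever(document_store=document_store)

# Prompt template (Jinja2): retrieved documents followed by the question
template = """Answer the question using only the context below.
Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""
prompt_builder = PromptBuilder(template=template)

# Initialize the generator with gpt-4o-mini; the key is read from OPENAI_API_KEY
generator = OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"), model="gpt-4o-mini")

# Build the pipeline: add components, then connect outputs to inputs
pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", generator)
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.prompt")

# Run the pipeline; the query goes to both the retriever and the prompt builder
query = "What is Haystack?"
result = pipeline.run({"retriever": {"query": query}, "prompt_builder": {"question": query}})

# OpenAIGenerator returns a list of reply strings under "replies"
print("Answer:", result["llm"]["replies"][0])
```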

Example output (the model's wording varies between runs):
```text
Answer: Haystack is an open-source NLP framework for building search systems that supports retrievers and generators for question answering.
```

Common variations

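  • Use an embedding retriever such as InMemoryEmbeddingRetriever, together with document and text embedders, for semantic search instead of keyword-based BM25 search.
  • Replace OpenAIGenerator with other generators, such as OpenAIChatGenerator for chat-style messages or HuggingFaceLocalGenerator for local Hugging Face models.
  • Use external document stores such as ElasticsearchDocumentStore or QdrantDocumentStore (installed as separate integration packages) for larger collections.
  • Stream tokens as they are generated by passing a streaming_callback to the generator, or use AsyncPipeline to run pipelines asynchronously.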

Troubleshooting

  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
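  • If the retriever returns no useful documents, confirm they were written (document_store.count_documents()) and that the query shares words with them, since BM25 ranks by keyword overlap.
  • For slow responses, consider a smaller model, a lower retriever top_k (which shortens the prompt sent to the generator), or caching retriever results.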

Key Takeaways

  • Haystack pipelines connect retrievers, prompt builders, and generators to build retrieval-augmented QA systems.
  • Use InMemoryDocumentStore and InMemoryBM25Retriever for simple setups.
  • OpenAI models like gpt-4o-mini can be used as generators in Haystack.
  • Switch components easily for semantic search or scalable document storage.
  • Keep API keys in environment variables (read with Secret.from_env_var) rather than hard-coding them in source.
Verified 2026-04 · gpt-4o-mini