How-to · Beginner · 3 min read

How to build a QA system with Haystack

Quick answer
Use Haystack 2.x with OpenAIGenerator and InMemoryBM25Retriever to build a QA system. Load documents into an InMemoryDocumentStore, then create a Pipeline that connects the retriever, a prompt builder, and the generator to answer questions.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with available credit
  • pip install haystack-ai openai

Setup

Install the haystack-ai package and set your OpenAI API key as an environment variable.

  • Run pip install haystack-ai openai
  • Set OPENAI_API_KEY in your environment
bash
pip install haystack-ai openai
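Setting the key looks like this on Linux or macOS (replace the placeholder with your own key; add the line to your shell profile to persist it across sessions):

```shell
# Make the key available to the current shell session
export OPENAI_API_KEY="your_api_key"
```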

Step by step

This example loads sample documents into an in-memory document store, wires up a BM25 retriever, a prompt builder, and an OpenAI GPT-4o generator, then connects them in a Pipeline to answer questions.

python
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# OpenAIGenerator reads the API key from the environment:
# export OPENAI_API_KEY="your_api_key"

# Sample documents
docs = [
    Document(content="Python is a programming language."),
    Document(content="Haystack is an open-source NLP framework."),
    Document(content="OpenAI provides powerful language models like GPT-4o."),
]

# Initialize document store and write docs
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

# Prompt template: ground the answer in the retrieved documents
template = """Given these documents, answer the question.
Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""

# Build the pipeline: retriever -> prompt builder -> generator
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(model="gpt-4o"))
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")

# Ask a question
query = "What is Haystack?"
result = pipe.run({"retriever": {"query": query}, "prompt_builder": {"question": query}})

print("Answer:", result["llm"]["replies"][0])
output
Answer: Haystack is an open-source NLP framework.
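Under the hood, the BM25 retriever ranks documents by keyword overlap, weighting rare terms more heavily and normalizing for document length. This toy sketch in plain Python (a simplified illustration, not Haystack's actual implementation) shows why the Haystack document wins for this query:

```python
import math

# The three sample documents from the pipeline above
docs = [
    "Python is a programming language.",
    "Haystack is an open-source NLP framework.",
    "OpenAI provides powerful language models like GPT-4o.",
]

def tokenize(text):
    return [t.strip(".,?!").lower() for t in text.split()]

def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Simplified BM25: term frequency damped by k1, length-normalized
    by b, and weighted by inverse document frequency (rare terms count more)."""
    doc_tokens = tokenize(doc)
    avg_len = sum(len(tokenize(d)) for d in corpus) / len(corpus)
    n = len(corpus)
    score = 0.0
    for term in tokenize(query):
        df = sum(term in tokenize(d) for d in corpus)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = doc_tokens.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_tokens) / avg_len))
    return score

query = "What is Haystack?"
ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
print(ranked[0])  # the Haystack document scores highest
```

The rare term "haystack" appears only in one document, so its high inverse document frequency dominates the ranking, while the common word "is" contributes little.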

Common variations

  • Use an embedding retriever such as InMemoryEmbeddingRetriever, paired with a text embedder, for semantic search.
  • Switch generator model to gpt-4o-mini for faster, cheaper responses.
  • Use async pipelines for high throughput.
  • Integrate with other document stores, such as the Elasticsearch or Qdrant integrations, for large-scale data.

Troubleshooting

  • If you get authentication errors, verify OPENAI_API_KEY is correctly set.
  • If no answers are returned, ensure documents are loaded properly into the document store.
  • For slow responses, try smaller models like gpt-4o-mini.
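For the first troubleshooting case, a small stdlib helper (hypothetical, not part of Haystack) can sanity-check the key before you debug deeper:

```python
import os

def check_api_key(env=None):
    """Return a short status string for the OPENAI_API_KEY setting.

    Pass a dict to check values other than the live environment.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "missing: set OPENAI_API_KEY before running the pipeline"
    if not key.startswith("sk-"):
        return "suspicious: OpenAI API keys normally start with 'sk-'"
    return "ok"

print(check_api_key())
```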

Key Takeaways

  • Use InMemoryDocumentStore and InMemoryBM25Retriever for simple QA setups.
  • Connect retriever, prompt builder, and generator in a Pipeline for end-to-end question answering.
  • Set OPENAI_API_KEY in environment to authenticate with OpenAI models.
  • Switch models or retrievers to optimize cost, speed, or accuracy.
  • Check document loading and API key if answers are missing or errors occur.
Verified 2026-04 · gpt-4o, gpt-4o-mini