How to build a QA system with Haystack
Quick answer
Use Haystack 2.x with `OpenAIGenerator` and `InMemoryBM25Retriever` to build a QA system. Load documents into an `InMemoryDocumentStore`, then create a `Pipeline` that combines the retriever and generator to answer questions.

Prerequisites

- Python 3.8+
- An OpenAI API key
- `pip install haystack-ai openai`
Setup
Install the haystack-ai package and set your OpenAI API key as an environment variable.
- Run `pip install haystack-ai openai`
- Set `OPENAI_API_KEY` in your environment
Step by step
This example loads sample documents into an in-memory document store and sets up a BM25 retriever, a prompt builder, and an OpenAI GPT-4o generator. It then connects them in a `Pipeline` to answer questions.
```python
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Ensure your OpenAI API key is set in the environment;
# OpenAIGenerator reads OPENAI_API_KEY by default:
# export OPENAI_API_KEY="your_api_key"

# Sample documents
docs = [
    Document(content="Python is a programming language."),
    Document(content="Haystack is an open-source NLP framework."),
    Document(content="OpenAI provides powerful language models like GPT-4o."),
]

# Initialize the document store and write the docs
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

# Prompt template: the generator answers from the retrieved documents
template = """Answer the question using the context below.

Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:"""

# Build the pipeline: retriever -> prompt builder -> generator
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("generator", OpenAIGenerator(model="gpt-4o"))
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "generator.prompt")

# Ask a question
query = "What is Haystack?"
result = pipe.run({"retriever": {"query": query}, "prompt_builder": {"question": query}})
print("Answer:", result["generator"]["replies"][0])
```

Output
Answer: Haystack is an open-source NLP framework.
Common variations
- Use an embedding-based retriever such as `InMemoryEmbeddingRetriever` (paired with document and text embedders) for semantic search.
- Switch the generator model to `gpt-4o-mini` for faster, cheaper responses.
- Use async pipelines for high throughput.
- Integrate with other document stores, such as Elasticsearch or a vector database, for large-scale data.
Troubleshooting
- If you get authentication errors, verify that `OPENAI_API_KEY` is set correctly.
- If no answers are returned, ensure the documents were written to the document store.
- For slow responses, try a smaller model like `gpt-4o-mini`.
Key Takeaways
- Use `InMemoryDocumentStore` and `InMemoryBM25Retriever` for simple QA setups.
- Combine the retriever and generator in a `Pipeline` for end-to-end question answering.
- Set `OPENAI_API_KEY` in your environment to authenticate with OpenAI models.
- Switch models or retrievers to optimize for cost, speed, or accuracy.
- Check document loading and the API key if answers are missing or errors occur.