How to build a QA system with Haystack
Quick answer
Use Haystack 2.x with `OpenAIGenerator` and `InMemoryBM25Retriever` to build a QA system. Load documents into an `InMemoryDocumentStore`, then create a `Pipeline` that combines the retriever and generator to answer questions.

Prerequisites

- Python 3.8+
- An OpenAI API key
- `pip install haystack-ai openai`
Setup
Install the haystack-ai package and set your OpenAI API key as an environment variable.
- Run `pip install haystack-ai openai`
- Set `OPENAI_API_KEY` in your environment
Step by step
This example loads sample documents into an in-memory document store and sets up a BM25 retriever, a prompt builder, and an OpenAI GPT-4o generator. It then connects them in a `Pipeline` to answer questions.
```python
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Ensure your OpenAI API key is set in the environment;
# OpenAIGenerator reads OPENAI_API_KEY by default:
# export OPENAI_API_KEY="your_api_key"

# Sample documents
docs = [
    Document(content="Python is a programming language."),
    Document(content="Haystack is an open-source NLP framework."),
    Document(content="OpenAI provides powerful language models like GPT-4o."),
]

# Initialize the document store and write the docs
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

# Prompt template: the generator answers from the retrieved documents
template = """Answer the question using the context below.

Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:"""

# Build the pipeline: retriever -> prompt builder -> generator
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("generator", OpenAIGenerator(model="gpt-4o"))
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "generator.prompt")

# Ask a question
query = "What is Haystack?"
result = pipe.run({"retriever": {"query": query}, "prompt_builder": {"question": query}})
print("Answer:", result["generator"]["replies"][0])
```

Output
Answer: Haystack is an open-source NLP framework.
Common variations
- Use an embedding-based retriever such as `InMemoryEmbeddingRetriever` (paired with document and text embedders) for semantic search.
- Switch the generator model to `gpt-4o-mini` for faster, cheaper responses.
- Use async pipelines for high throughput.
- Integrate with other document stores, such as Elasticsearch or a vector database, for large-scale data.
Troubleshooting
- If you get authentication errors, verify that `OPENAI_API_KEY` is set correctly.
- If no answers are returned, ensure the documents were written to the document store.
- For slow responses, try a smaller model like `gpt-4o-mini`.
Key Takeaways
- Use `InMemoryDocumentStore` and `InMemoryBM25Retriever` for simple QA setups.
- Combine the retriever and generator in a `Pipeline` for end-to-end question answering.
- Set `OPENAI_API_KEY` in your environment to authenticate with OpenAI models.
- Switch models or retrievers to optimize for cost, speed, or accuracy.
- Check document loading and the API key if answers are missing or errors occur.