How to build a RAG pipeline with Haystack
Quick answer
Use Haystack to build a RAG pipeline by combining a retriever (keyword-based, or backed by a vector index such as FAISS) with a generator such as OpenAIGenerator. Index your documents in a DocumentStore, fetch the most relevant ones for each query with the retriever, and pass them to the generator inside a Pipeline to produce the answer.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install haystack-ai openai (add faiss-cpu only if you plan to use a FAISS index)
Setup
Install the required packages and set your OpenAI API key as an environment variable.
pip install haystack-ai openai
Step by step
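This example shows how to create a RAG pipeline with Haystack 2.x (the haystack-ai package) using InMemoryDocumentStore, InMemoryBM25Retriever, PromptBuilder, and OpenAIGenerator. It indexes sample documents, then queries the pipeline to get a generated answer. The in-memory BM25 retriever is used here in place of a FAISS-backed one so that the example runs without any extra index setup.
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# OpenAIGenerator reads the key from the OPENAI_API_KEY environment variable:
# export OPENAI_API_KEY="your_api_key"

# Initialize the document store
document_store = InMemoryDocumentStore()

# Sample documents to index
docs = [
    Document(content="Python is a programming language."),
    Document(content="Haystack is an open-source NLP framework."),
    Document(content="RAG stands for Retrieval-Augmented Generation."),
]

# Write documents to the store
document_store.write_documents(docs)

# Retriever: keyword (BM25) search over the in-memory store
retriever = InMemoryBM25Retriever(document_store=document_store, top_k=2)

# Prompt template that injects the retrieved documents and the question
template = """Answer the question using the context below.
Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""
prompt_builder = PromptBuilder(template=template)

# Generator backed by an OpenAI model
generator = OpenAIGenerator(model="gpt-4o-mini")

# Build the pipeline: retriever -> prompt builder -> generator
pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("generator", generator)
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "generator.prompt")

# Query the pipeline: the retriever and the prompt builder both need the question
query = "What does RAG mean?"
result = pipeline.run({
    "retriever": {"query": query},
    "prompt_builder": {"question": query},
})
print("Generated answer:", result["generator"]["replies"][0])
Output (exact wording will vary between runs)
Generated answer: RAG stands for Retrieval-Augmented Generation, a technique that combines document retrieval with language generation.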
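To make the data flow concrete, here is a dependency-free sketch of the two stages that run before the generator: ranking documents against the query, then assembling the prompt. It is only an illustration under simplifying assumptions. The word-overlap scoring is a toy stand-in for real retrieval (it is not BM25 and not Haystack's implementation), and the helper names tokenize, retrieve, and build_prompt are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"\w+", text.lower())


def retrieve(query, documents, top_k=2):
    """Rank documents by the number of distinct words shared with the query.

    Toy scoring for illustration only; real retrievers use BM25 or embeddings.
    """
    query_terms = set(tokenize(query))
    scored = [(len(query_terms & set(tokenize(doc))), doc) for doc in documents]
    # Drop documents with no overlap, then sort best-first (stable for ties).
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches[:top_k]]


def build_prompt(question, context_docs):
    """Render the retrieved documents and the question into one prompt string."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


documents = [
    "Python is a programming language.",
    "Haystack is an open-source NLP framework.",
    "RAG stands for Retrieval-Augmented Generation.",
]
question = "What does RAG mean?"

context_docs = retrieve(question, documents)
prompt = build_prompt(question, context_docs)
print(prompt)
```

Only the third document shares a word ("rag") with the question, so it is the only one placed in the prompt. In a real pipeline the generator receives a prompt of this shape and produces the final answer.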
Common variations
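- Use a persistent document store instead of InMemoryDocumentStore when the index must survive restarts. FAISSDocumentStore comes from Haystack 1.x (farm-haystack) and is not included in the haystack-ai core package.
- Switch retrievers based on your use case, for example InMemoryBM25Retriever for keyword search or InMemoryEmbeddingRetriever for embedding similarity.
- Use different generators, such as OpenAIGenerator with other OpenAI models or HuggingFaceLocalGenerator for local models.
- Implement async querying with Haystack's async pipeline support, if your installed version provides it.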
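If your Haystack version has no async pipeline, one library-agnostic way to serve several queries concurrently is to run the blocking call in worker threads. The sketch below assumes a hypothetical run_rag function standing in for your own pipeline call; it is a stub here so the example runs without an API key.

```python
import asyncio


def run_rag(query):
    """Hypothetical stand-in for a blocking pipeline call; replace with your own."""
    return f"answer to: {query}"


async def answer_all(queries):
    """Run one blocking call per query in the default thread pool."""
    loop = asyncio.get_running_loop()
    tasks = [loop.run_in_executor(None, run_rag, query) for query in queries]
    # gather returns results in the same order as the input queries.
    return await asyncio.gather(*tasks)


answers = asyncio.run(answer_all(["What does RAG mean?", "What is Haystack?"]))
print(answers)
```

Threads help here because the time is spent waiting on network calls to the model provider rather than on local computation.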
Troubleshooting
- If you see ImportError for faiss, ensure faiss-cpu is installed correctly.
- If OpenAI API calls fail, verify your OPENAI_API_KEY environment variable is set and valid.
- For slow retrieval, consider using a persistent FAISS index or a more efficient retriever.
- If no answers are returned, check that the documents are indexed properly and the retriever is configured correctly.
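A quick way to rule out the most common authentication problem is to check the environment variable before building the pipeline. The helper below is a minimal sketch; the name check_openai_key is hypothetical, and it only confirms that the variable is present and non-blank, not that the key is valid.

```python
import os


def check_openai_key(env=None):
    """Return "set" if OPENAI_API_KEY is present and non-blank, else "missing"."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    return "set" if key.strip() else "missing"


# Report the status without ever printing the key itself.
print("OPENAI_API_KEY is", check_openai_key())
```

If this prints "missing", export the variable in the same shell session that launches your script and run it again.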
Key Takeaways
- Use Haystack's Pipeline to combine a retriever and a generator for RAG.
- Index documents in a DocumentStore and retrieve them with a retriever that matches the store.
- Generate answers with OpenAIGenerator using your OpenAI API key.
- Customize retrievers and generators based on your data and latency needs.
- Set environment variables correctly to avoid authentication and import errors.