How-to · Beginner · 3 min read

How to use Haystack with OpenAI

Quick answer
Use the haystack-ai library's OpenAIGenerator component to integrate OpenAI models for generating answers in your pipelines. Initialize OpenAIGenerator with a model like gpt-4o-mini (the API key is read from the OPENAI_API_KEY environment variable), then build a Pipeline that combines a retriever, a prompt builder, and the generator for end-to-end AI search and QA.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install haystack-ai

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install Haystack 2.x (the OpenAI Python SDK is pulled in as a dependency): pip install haystack-ai
  • Export your OpenAI API key in your shell environment: export OPENAI_API_KEY='your_api_key_here'
bash
pip install haystack-ai
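
Before wiring up a pipeline, it can help to fail fast when the key is missing. A minimal sketch (check_openai_key is a hypothetical helper for this article, not part of Haystack):

```python
import os

def check_openai_key(env) -> bool:
    """Return True if a non-empty OPENAI_API_KEY is present in the mapping."""
    return bool(env.get("OPENAI_API_KEY", "").strip())

# Check the real environment before building any pipeline
if not check_openai_key(os.environ):
    print("OPENAI_API_KEY is not set; export it before running the pipeline.")
```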

Step by step

This example shows how to create a simple Haystack 2.x RAG pipeline: documents are stored in an in-memory document store, a BM25 retriever fetches the relevant ones, a prompt builder injects them into a prompt, and OpenAIGenerator with the gpt-4o-mini model produces the answer.

python
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret

# Initialize document store and index sample documents
document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="OpenAI provides powerful AI models for natural language processing."),
    Document(content="Haystack is a framework to build search and QA systems."),
])

# Initialize retriever
retriever = InMemoryBM25Retriever(document_store=document_store)

# Prompt template that injects the retrieved documents and the query
template = """Given these documents, answer the question.
Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{ query }}
Answer:"""
prompt_builder = PromptBuilder(template=template)

# Initialize OpenAI generator with the API key from the environment
generator = OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"), model="gpt-4o-mini")

# Build pipeline
pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("generator", generator)
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "generator.prompt")

# Query the pipeline
query = "What does OpenAI provide?"
result = pipeline.run({"retriever": {"query": query}, "prompt_builder": {"query": query}})

print("Answer:", result["generator"]["replies"][0])
output
Answer: OpenAI provides powerful AI models for natural language processing.
(Generated answers may vary slightly between runs.)

Common variations

You can customize the pipeline by swapping in a different retriever, such as InMemoryEmbeddingRetriever paired with an OpenAI embedder, or a different OpenAI model such as gpt-4o. For async usage, Haystack ships an AsyncPipeline, which requires async-compatible components. You can also stream responses by passing a streaming_callback to the generator.

python
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

# Larger model; the API key is still read from the environment
generator = OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"), model="gpt-4o")
# Use a different retriever or add filters as needed

# Streaming: print tokens as they arrive
streaming_generator = OpenAIGenerator(
    model="gpt-4o-mini",
    streaming_callback=lambda chunk: print(chunk.content, end=""),
)

# Async example (simplified, using AsyncPipeline)
# async def async_query(pipeline, query):
#     result = await pipeline.run_async({"retriever": {"query": query}, "prompt_builder": {"query": query}})
#     print(result)

Troubleshooting

  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • If no answers are returned, ensure your documents are indexed in the document store (document_store.count_documents() should be non-zero).
  • For rate limits, consider using smaller models like gpt-4o-mini or batching queries.
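
One simple way to soften rate limits on the client side is retry with exponential backoff. A sketch under stated assumptions (with_backoff is a hypothetical helper; in production you would likely catch only the OpenAI rate-limit exception rather than all exceptions):

```python
import time

def with_backoff(call, retries=3, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff between attempts."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Usage sketch: wrap the pipeline call so transient rate-limit errors are retried
# answer = with_backoff(lambda: pipeline.run({"retriever": {"query": query}}))
```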

Key Takeaways

  • Use OpenAIGenerator in Haystack to leverage OpenAI models for answer generation.
  • Combine retrievers like BM25Retriever with generators for effective QA pipelines.
  • Set your OpenAI API key via the OPENAI_API_KEY environment variable to authenticate requests.
  • Customize models and retrievers to balance cost, speed, and accuracy.
  • Check document indexing and API key setup if you encounter errors.
Verified 2026-04 · gpt-4o-mini, gpt-4o