How-to · Beginner · 3 min read

How to use InMemoryDocumentStore in Haystack

Quick answer
Use InMemoryDocumentStore from haystack.document_stores.in_memory to store and retrieve documents entirely in RAM for fast prototyping and testing. It supports writing, filtering, and BM25 retrieval out of the box, and plugs into Haystack pipelines alongside retrievers and generators. Because nothing is persisted to disk, all documents are lost when the process exits.

PREREQUISITES

  • Python 3.8+
  • pip install "haystack-ai>=2.0" (quotes prevent the shell from interpreting >=)
  • Basic knowledge of Python and document retrieval concepts

Setup

Install the haystack-ai package (version 2 or higher), which includes InMemoryDocumentStore. Ensure you have Python 3.8 or newer.

bash
pip install "haystack-ai>=2.0"

Step by step

This example builds a small question-answering pipeline with the Haystack 2.x API: create an InMemoryDocumentStore, write documents to it, then answer a query with a BM25 retriever, a prompt builder, and an OpenAI generator.

python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
import os

# Initialize the in-memory document store
document_store = InMemoryDocumentStore()

# Write documents to the store (Haystack 2.x expects Document objects)
docs = [
    Document(content="Haystack is an open source NLP framework.", meta={"source": "wiki"}),
    Document(content="InMemoryDocumentStore stores documents in RAM.", meta={"source": "docs"}),
]
document_store.write_documents(docs)

# Initialize a BM25 retriever bound to the store
retriever = InMemoryBM25Retriever(document_store=document_store)

# Build a prompt that injects the retrieved documents
template = """Answer the question using the context below.

Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:"""
prompt_builder = PromptBuilder(template=template)

# OpenAIGenerator reads OPENAI_API_KEY from the environment by default
if not os.environ.get("OPENAI_API_KEY"):
    raise ValueError("Set the OPENAI_API_KEY environment variable")
generator = OpenAIGenerator(model="gpt-4o")

# Build a pipeline: retriever -> prompt builder -> generator
pipe = Pipeline()
pipe.add_component("retriever", retriever)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("generator", generator)
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "generator.prompt")

# Query the pipeline; include_outputs_from exposes the retrieved documents
query = "What is Haystack?"
result = pipe.run(
    {"retriever": {"query": query}, "prompt_builder": {"question": query}},
    include_outputs_from={"retriever"},
)

# Print retrieved documents and generated answer
print("Retrieved documents:")
for doc in result["retriever"]["documents"]:
    print(f"- {doc.content} (source: {doc.meta.get('source')})")

print("\nGenerated answer:")
print(result["generator"]["replies"][0])
output
Retrieved documents:
- Haystack is an open source NLP framework. (source: wiki)
- InMemoryDocumentStore stores documents in RAM. (source: docs)

Generated answer:
Haystack is an open source NLP framework that enables building search systems and question answering pipelines.

Common variations

  • Use InMemoryDocumentStore for fast prototyping without external dependencies.
  • Switch to a persistent document store integration such as ElasticsearchDocumentStore, ChromaDocumentStore, or PgvectorDocumentStore for production workloads.
  • Use InMemoryEmbeddingRetriever together with a document embedder for semantic (dense) retrieval instead of BM25.
  • Replace OpenAIGenerator with OpenAIChatGenerator, or with a Hugging Face generator such as HuggingFaceAPIGenerator or HuggingFaceLocalGenerator.

Troubleshooting

  • If you get ValueError about missing OPENAI_API_KEY, set it in your environment before running.
  • If documents are not found, ensure you called write_documents() before querying.
  • For large datasets, InMemoryDocumentStore may consume too much RAM; consider persistent stores.

Key Takeaways

  • Use InMemoryDocumentStore for fast, ephemeral document storage in Haystack.
  • Combine it with retrievers like InMemoryBM25Retriever and generators like OpenAIGenerator for full QA pipelines.
  • Always write documents to the store before querying to avoid empty results.
  • For production, prefer persistent document stores over in-memory to avoid data loss.
  • Set environment variables like OPENAI_API_KEY to enable generator components.
Verified 2026-04 · gpt-4o, InMemoryBM25Retriever, InMemoryDocumentStore, OpenAIGenerator