How-to · Beginner · 3 min read

How to use InMemoryDocumentStore in Haystack

Quick answer
Use InMemoryDocumentStore from haystack.document_stores.in_memory to store and retrieve documents entirely in RAM for fast prototyping and testing. It supports writing, filtering, and BM25 retrieval out of the box, and plugs into Haystack pipelines alongside retrievers and generators. Because nothing is persisted to disk, all documents are lost when the process exits.

PREREQUISITES

  • Python 3.8+
  • pip install "haystack-ai>=2.0" (quotes prevent the shell from interpreting >=)
  • Basic knowledge of Python and document retrieval concepts

Setup

Install the haystack-ai package (version 2 or higher), which includes InMemoryDocumentStore. Ensure you have Python 3.8 or newer.

bash
pip install "haystack-ai>=2.0"

Step by step

This example builds a small question-answering pipeline with the Haystack 2.x API: create an InMemoryDocumentStore, write documents to it, then answer a query with a BM25 retriever, a prompt builder, and an OpenAI generator.

python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
import os

# Initialize the in-memory document store
document_store = InMemoryDocumentStore()

# Write documents to the store (Haystack 2.x expects Document objects)
docs = [
    Document(content="Haystack is an open source NLP framework.", meta={"source": "wiki"}),
    Document(content="InMemoryDocumentStore stores documents in RAM.", meta={"source": "docs"}),
]
document_store.write_documents(docs)

# Initialize a BM25 retriever bound to the store
retriever = InMemoryBM25Retriever(document_store=document_store)

# Build a prompt that injects the retrieved documents
template = """Answer the question using the context below.

Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:"""
prompt_builder = PromptBuilder(template=template)

# OpenAIGenerator reads OPENAI_API_KEY from the environment by default
if not os.environ.get("OPENAI_API_KEY"):
    raise ValueError("Set the OPENAI_API_KEY environment variable")
generator = OpenAIGenerator(model="gpt-4o")

# Build a pipeline: retriever -> prompt builder -> generator
pipe = Pipeline()
pipe.add_component("retriever", retriever)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("generator", generator)
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "generator.prompt")

# Query the pipeline; include_outputs_from exposes the retrieved documents
query = "What is Haystack?"
result = pipe.run(
    {"retriever": {"query": query}, "prompt_builder": {"question": query}},
    include_outputs_from={"retriever"},
)

# Print retrieved documents and generated answer
print("Retrieved documents:")
for doc in result["retriever"]["documents"]:
    print(f"- {doc.content} (source: {doc.meta.get('source')})")

print("\nGenerated answer:")
print(result["generator"]["replies"][0])
output
Retrieved documents:
- Haystack is an open source NLP framework. (source: wiki)
- InMemoryDocumentStore stores documents in RAM. (source: docs)

Generated answer:
Haystack is an open source NLP framework that enables building search systems and question answering pipelines.

Common variations

  • Use InMemoryDocumentStore for fast prototyping without external dependencies.
  • Switch to a persistent document store integration such as ElasticsearchDocumentStore, ChromaDocumentStore, or PgvectorDocumentStore for production workloads.
  • Use InMemoryEmbeddingRetriever together with a document embedder for semantic (dense) retrieval instead of BM25.
  • Replace OpenAIGenerator with OpenAIChatGenerator, or with a Hugging Face generator such as HuggingFaceAPIGenerator or HuggingFaceLocalGenerator.

Troubleshooting

  • If you get ValueError about missing OPENAI_API_KEY, set it in your environment before running.
  • If documents are not found, ensure you called write_documents() before querying.
  • For large datasets, InMemoryDocumentStore may consume too much RAM; consider persistent stores.

Key Takeaways

  • Use InMemoryDocumentStore for fast, ephemeral document storage in Haystack.
  • Combine it with retrievers like InMemoryBM25Retriever and generators like OpenAIGenerator for full QA pipelines.
  • Always write documents to the store before querying to avoid empty results.
  • For production, prefer persistent document stores over in-memory to avoid data loss.
  • Set environment variables like OPENAI_API_KEY to enable generator components.
Verified 2026-04 · gpt-4o, InMemoryBM25Retriever, InMemoryDocumentStore, OpenAIGenerator