How-to · Intermediate · 4 min read

How to use ChromaDB with Haystack

Quick answer
Use the official chroma-haystack integration: write documents to a ChromaDocumentStore, embed them with Haystack's OpenAI embedders, and query through a ChromaEmbeddingRetriever in a Haystack pipeline for semantic search.

PREREQUISITES

  • Python 3.9+
  • OpenAI API key
  • pip install haystack-ai chroma-haystack

Setup

Install the required packages and export your OpenAI API key as an environment variable. The chroma-haystack package pulls in chromadb, and haystack-ai pulls in the openai client.

bash
pip install haystack-ai chroma-haystack
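As a quick sanity check that the key is actually visible to Python before you run the pipeline (a minimal sketch; check_api_key is a hypothetical helper, not part of Haystack):

```python
import os

def check_api_key(env=None):
    """Return True if OPENAI_API_KEY is set and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY"))

if __name__ == "__main__":
    if check_api_key():
        print("OPENAI_API_KEY found")
    else:
        print("OPENAI_API_KEY is not set; export it before running the pipeline")
```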

Step by step

This example shows how to load documents, create embeddings with OpenAI's text-embedding-3-small model, store them in ChromaDB via the official chroma-haystack integration, and query them through a Haystack pipeline.

python
from haystack import Document, Pipeline
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack_integrations.components.retrievers.chroma import ChromaEmbeddingRetriever
from haystack_integrations.document_stores.chroma import ChromaDocumentStore

# The OpenAI embedders read the key from the OPENAI_API_KEY environment variable:
# export OPENAI_API_KEY="your_api_key"

# Sample documents
documents = [
    Document(content="ChromaDB is a fast vector database for embeddings."),
    Document(content="Haystack is a framework for building search systems."),
    Document(content="OpenAI provides powerful embedding models."),
]

# Initialize a Chroma-backed document store (in-memory unless persist_path is set)
document_store = ChromaDocumentStore(collection_name="haystack_chroma_collection")

# Embed the documents with OpenAI and write them to Chroma
doc_embedder = OpenAIDocumentEmbedder(model="text-embedding-3-small")
document_store.write_documents(doc_embedder.run(documents=documents)["documents"])

# Build a query pipeline: embed the query text, then retrieve from Chroma
pipeline = Pipeline()
pipeline.add_component("text_embedder", OpenAITextEmbedder(model="text-embedding-3-small"))
pipeline.add_component("retriever", ChromaEmbeddingRetriever(document_store=document_store, top_k=2))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# Query the pipeline
query = "What is ChromaDB?"
result = pipeline.run({"text_embedder": {"text": query}})

print("Top documents:")
for doc in result["retriever"]["documents"]:
    print(f"- {doc.content}")
output
Top documents:
- ChromaDB is a fast vector database for embeddings.
- OpenAI provides powerful embedding models.
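Under the hood, the retriever ranks stored documents by similarity between the query embedding and each document embedding. A minimal pure-Python sketch of that ranking step, with made-up 3-dimensional vectors standing in for real 1,536-dimensional text-embedding-3-small embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (hypothetical values, 3 dimensions for readability)
doc_embeddings = {
    "ChromaDB is a fast vector database for embeddings.": [0.9, 0.1, 0.2],
    "Haystack is a framework for building search systems.": [0.2, 0.8, 0.3],
    "OpenAI provides powerful embedding models.": [0.7, 0.2, 0.6],
}
query_embedding = [0.85, 0.15, 0.25]  # stands in for embed("What is ChromaDB?")

# Rank documents by cosine similarity and keep the top 2
ranked = sorted(
    doc_embeddings,
    key=lambda text: cosine_similarity(query_embedding, doc_embeddings[text]),
    reverse=True,
)
for text in ranked[:2]:
    print("-", text)
```

The pipeline above does the same thing with real embeddings and Chroma's optimized index instead of a linear scan.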

Common variations

  • Use ChromaQueryTextRetriever instead of ChromaEmbeddingRetriever to let Chroma embed queries with its built-in default embedding function, removing the OpenAI embedder from the query path.
  • Switch to async calls by using async-compatible Haystack components and async OpenAI clients.
  • Use different embedding models like text-embedding-3-large or custom embeddings.

Troubleshooting

  • If you get ModuleNotFoundError for chromadb or haystack_integrations, make sure the chroma-haystack package is installed.
  • If embedding calls fail or retrieval returns no results, verify that the OPENAI_API_KEY environment variable is set correctly.
  • For large document sets, pass persist_path to ChromaDocumentStore so the collection is stored on disk and not re-indexed on every run.

Key Takeaways

  • Use ChromaDocumentStore from the chroma-haystack integration as the vector store backend in Haystack pipelines.
  • Generate embeddings with Haystack's OpenAI embedders and store them in Chroma for fast semantic search.
  • Connect ChromaEmbeddingRetriever into Haystack's Pipeline for flexible query handling.
  • Ensure environment variables for API keys are set to avoid authentication errors.
  • Persist Chroma collections for scalability and faster startup in production.
Verified 2026-04 · text-embedding-3-small, gpt-4o