How-to · Beginner · 3 min read

Dense retrieval in Haystack explained

Quick answer
Dense retrieval in Haystack uses vector embeddings to match queries with documents by meaning, which is often more accurate than keyword matching. In Haystack 2.x (the haystack-ai package), you embed documents and queries with embedder components such as OpenAIDocumentEmbedder and OpenAITextEmbedder, index the documents in a document store, and retrieve with an embedding retriever that performs fast similarity search.
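The intuition fits in a few lines of plain Python: texts become vectors, and retrieval ranks documents by cosine similarity to the query vector. The toy 3-d vectors below are stand-ins for real embedding-model output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-d "embeddings" standing in for real model output
docs = {
    "dense retrieval intro": [0.9, 0.1, 0.2],
    "cooking pasta":         [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.25]  # embedding of "what is dense retrieval?"

# Rank documents by similarity to the query vector
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # the semantically closest document
```

A real system does exactly this, only with high-dimensional vectors from a trained model and an index (e.g. approximate nearest-neighbor search) instead of a full sort.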

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install haystack-ai openai

Setup

Install haystack-ai (Haystack 2.x) along with the openai client. Set your OpenAI API key as an environment variable; Haystack's OpenAI embedder components read OPENAI_API_KEY by default. (The FAISS-backed document store was a Haystack 1.x feature; this guide uses the built-in InMemoryDocumentStore instead, so faiss-cpu is not required.)

  • Install packages: pip install haystack-ai openai
  • Export your API key: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or set it in your environment variables on Windows.

```bash
pip install haystack-ai openai
export OPENAI_API_KEY='your_api_key'
```

Step by step

This example builds a dense retrieval pipeline with Haystack 2.x components: OpenAIDocumentEmbedder embeds the documents, InMemoryDocumentStore indexes them, and at query time OpenAITextEmbedder feeds an InMemoryEmbeddingRetriever that returns the most similar documents.

```python
from haystack import Document, Pipeline
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Documents to index (inline for a self-contained example; use a converter
# such as TextFileToDocument to load them from files instead)
docs = [
    Document(content="Dense retrieval is a technique that uses dense vector "
                     "embeddings to represent documents and queries, enabling "
                     "semantic search"),
    Document(content="Unlike sparse retrieval, dense retrieval captures semantic "
                     "similarity by embedding text into continuous vector spaces"),
]

# Embed the documents and write them to an in-memory store
document_store = InMemoryDocumentStore()
doc_embedder = OpenAIDocumentEmbedder()  # reads OPENAI_API_KEY from the environment
document_store.write_documents(doc_embedder.run(docs)["documents"])

# Build the retrieval pipeline: query text -> query embedding -> nearest documents
pipeline = Pipeline()
pipeline.add_component("text_embedder", OpenAITextEmbedder())
pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# Query the pipeline
query = "What is dense retrieval?"
result = pipeline.run({"text_embedder": {"text": query}})

print("Top documents:")
for doc in result["retriever"]["documents"]:
    print(f"- {doc.content[:200]}...")
```
```text
Top documents:
- Dense retrieval is a technique that uses dense vector embeddings to represent documents and queries, enabling semantic search...
- Unlike sparse retrieval, dense retrieval captures semantic similarity by embedding text into continuous vector spaces...
```

Common variations

You can customize dense retrieval in Haystack by:

  • Using different embedding models, for example the SentenceTransformers embedders instead of the OpenAI ones.
  • Switching document stores to Chroma, Weaviate, or Pinecone (available as Haystack integrations) for persistence and scale.
  • Implementing asynchronous queries or streaming results in advanced pipelines.
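Swapping stores is mostly an install-and-import change: Haystack 2.x publishes each store as a separate integration package. The package names below follow the integration naming pattern; check the Haystack integrations index for the exact names and the document store classes they provide.

```bash
pip install chroma-haystack      # ChromaDocumentStore
pip install weaviate-haystack    # WeaviateDocumentStore
pip install pinecone-haystack    # PineconeDocumentStore
```

After installing, replace InMemoryDocumentStore and InMemoryEmbeddingRetriever with the store and retriever from the integration; the rest of the pipeline stays the same.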

Troubleshooting

If you see empty search results, make sure your documents were run through the document embedder before being written to the document store; documents without embeddings cannot be matched by the embedding retriever. Check that the OPENAI_API_KEY environment variable is set correctly, and that documents and queries are embedded with the same model, since vectors from different models are not comparable.
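One pragmatic check is to scan your documents for missing embeddings before querying. A minimal sketch, using plain dicts as stand-ins for document objects (`find_unembedded` is a hypothetical helper, not a Haystack API; real Haystack documents expose an `embedding` attribute instead of a key):

```python
def find_unembedded(documents):
    """Return indices of documents whose embedding is missing or empty."""
    return [i for i, doc in enumerate(documents) if not doc.get("embedding")]

docs = [
    {"content": "embedded doc", "embedding": [0.1, 0.2, 0.3]},
    {"content": "this one was never embedded", "embedding": None},
    {"content": "missing the key entirely"},
]

print(find_unembedded(docs))  # → [1, 2]
```

Running a check like this right after indexing turns a silent empty-results failure into an explicit, debuggable one.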

Key Takeaways

  • Dense retrieval uses vector embeddings for semantic search, often outperforming keyword matching on meaning-based queries.
  • Haystack 2.x combines OpenAI embedder components with a document store and an embedding retriever to form the dense retrieval pipeline.
  • You can swap embedding models and document stores to fit your accuracy and scalability needs.
Verified 2026-04 · haystack-ai (Haystack 2.x)