How to use Weaviate with Haystack
Quick answer
Use WeaviateDocumentStore from haystack.document_stores to connect Haystack to a Weaviate instance. Index your documents, compute their embeddings, and then use Haystack retrievers and generators to run semantic search over the vectors stored in Weaviate.
Prerequisites
- Python 3.8+
- A running Weaviate instance (local or cloud)
- pip install farm-haystack[weaviate] weaviate-client
- An OpenAI API key, or an API key for another embedding model provider
Setup
Install the necessary Python packages and ensure you have a running Weaviate instance. You also need an embedding model API key (e.g., OpenAI) for vectorizing documents.
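To confirm that a Weaviate instance is reachable and an API key is set before going further, a small standard-library check along these lines can help. It is a sketch, not part of Haystack: the function names are hypothetical, the default address is the local one used in this article, and it assumes the instance exposes Weaviate's /v1/.well-known/ready readiness endpoint without authentication.

```python
import http.client
import os
import urllib.request


def weaviate_is_ready(base_url: str = "http://localhost:8080", timeout: float = 3.0) -> bool:
    """Return True if the Weaviate readiness endpoint answers with HTTP 200.

    Hypothetical helper; assumes an unauthenticated readiness endpoint.
    """
    ready_url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(ready_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def has_api_key(env_var: str = "OPENAI_API_KEY") -> bool:
    """Return True if the environment variable holds a non-empty value."""
    return bool(os.environ.get(env_var, "").strip())
```

Calling weaviate_is_ready() and has_api_key() at the top of a script gives an early, readable failure instead of an error deep inside indexing.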
pip install farm-haystack[weaviate] weaviate-client openai
Step by step
This example demonstrates connecting Haystack to Weaviate, indexing documents, and performing a semantic search query.
import os

from haystack.document_stores import WeaviateDocumentStore
from haystack.nodes import EmbeddingRetriever

# Configure the Weaviate connection (change both values if using a cloud instance)
weaviate_host = "http://localhost"
weaviate_port = 8080

# Initialize WeaviateDocumentStore.
# Assumes the Haystack 1.x signature, which takes host and port rather than a single URL.
document_store = WeaviateDocumentStore(
    host=weaviate_host,
    port=weaviate_port,
    index="HaystackWeaviateIndex",  # Weaviate class names start with a capital letter and contain no hyphens
    embedding_dim=1536,  # dimension of OpenAI text-embedding-3-small vectors
    similarity="cosine",
)
# Initialize retriever with OpenAI embeddings
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="text-embedding-3-small",
    api_key=os.environ["OPENAI_API_KEY"],
)
# Sample documents to index
docs = [
    {"content": "Haystack is an open source NLP framework.", "meta": {"source": "wiki"}},
    {"content": "Weaviate is a vector search engine.", "meta": {"source": "wiki"}},
]
# Write documents to Weaviate
document_store.write_documents(docs)
# Update embeddings in Weaviate
document_store.update_embeddings(retriever)
# Perform a semantic search
query = "What is Haystack?"
retrieved_docs = retriever.retrieve(query)
print("Top document:", retrieved_docs[0].content)
Output
Top document: Haystack is an open source NLP framework.
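To make the retrieval step less opaque, the sketch below ranks documents by cosine similarity in plain Python, which is the measure selected by similarity="cosine". It is an illustration only: the three-dimensional vectors are made up, the function names are hypothetical, and Weaviate itself uses an approximate nearest-neighbor index rather than this brute-force loop.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: List[float], doc_vecs: Dict[str, List[float]], top_k: int = 10
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the best top_k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up 3-dimensional vectors standing in for real 1536-dimensional embeddings.
toy_embeddings = {
    "haystack_doc": [0.9, 0.1, 0.0],
    "weaviate_doc": [0.1, 0.9, 0.2],
}
toy_query = [0.8, 0.2, 0.1]  # pretend embedding of "What is Haystack?"

ranking = rank_documents(toy_query, toy_embeddings, top_k=2)
print(ranking[0][0])  # prints "haystack_doc"
```

The query vector points in nearly the same direction as the first document's vector, so that document scores highest, mirroring the result printed by the Haystack example.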
Common variations
- Use a different embedding model by changing embedding_model in EmbeddingRetriever, and set embedding_dim to match.
- Use FARMReader or another reader to build an extractive QA pipeline.
- Connect to a cloud-hosted Weaviate instance by changing the connection URL (host and port) and adding authentication parameters.
- Use async versions of Haystack components if needed.
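When switching to a cloud instance you usually start from a single endpoint URL. The Haystack 1.x WeaviateDocumentStore is understood to take the host (including the scheme) and the port as separate arguments; treat that as an assumption to verify against your installed version. Under that assumption, a small hypothetical helper like the one below can split a URL into the two values. The cluster hostname in the usage note is a placeholder.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Ports implied by the scheme when the URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_weaviate_url(url: str) -> Tuple[str, int]:
    """Split 'scheme://host[:port]' into ('scheme://host', port).

    Hypothetical helper, not part of Haystack or the Weaviate client.
    """
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return f"{parts.scheme}://{parts.hostname}", port
```

For example, split_weaviate_url("https://my-cluster.example.com") returns ("https://my-cluster.example.com", 443), and the two values can then be passed as host and port. Authentication parameters still have to be supplied separately, following the document store's own documentation.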
Troubleshooting
- If documents do not appear in Weaviate, verify the connection URL and network connectivity.
- Ensure the embedding dimension (embedding_dim) matches the output size of the embedding model.
- If update_embeddings fails, check your API key and internet connection.
- For schema conflicts, delete the existing Weaviate index or use a new index name.
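The dimension mismatch in particular is easy to catch before any documents are written. The sketch below compares the configured embedding_dim against the default output size of a few OpenAI embedding models. The lookup table and function name are hypothetical and cover only these three models; extend the table for any other model you use.

```python
# Default output sizes of some OpenAI embedding models.
KNOWN_EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_embedding_dim(model_name: str, store_dim: int) -> None:
    """Raise if store_dim differs from the model's known output size.

    Hypothetical helper; unknown models raise KeyError instead of passing silently.
    """
    if model_name not in KNOWN_EMBEDDING_DIMS:
        raise KeyError(f"no known dimension for model {model_name!r}")
    expected = KNOWN_EMBEDDING_DIMS[model_name]
    if expected != store_dim:
        raise ValueError(
            f"{model_name} produces {expected}-dimensional vectors, "
            f"but the document store is configured for {store_dim}"
        )


# Matches the configuration used in this article, so no exception is raised.
check_embedding_dim("text-embedding-3-small", 1536)
```

Running the check next to the document store configuration turns a confusing indexing error into an explicit message about which value to change.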
Key Takeaways
- Use WeaviateDocumentStore to integrate Weaviate with Haystack for vector search.
- Index documents and run update_embeddings before querying, since the retriever searches over the stored embeddings.
- Adjust the embedding model and Weaviate connection settings for your environment.