How to use Haystack with local models
Quick answer
Use Haystack 2.x (the haystack-ai package) with a local document store and local model components: a sentence-transformers embedder for retrieval and a local Hugging Face transformers model for generation. Load your local models, embed and index your documents, then build a Pipeline to answer queries without cloud dependencies.

Prerequisites
- Python 3.8+
- pip install haystack-ai>=2.0
- pip install transformers sentence-transformers torch
- Local model files, or access to the Hugging Face model hub for the first download
Setup
Install the necessary packages for Haystack v2 and local models. You need haystack-ai for the pipeline, transformers for local model inference, and sentence-transformers for embeddings.
Set up your environment with:
pip install haystack-ai transformers sentence-transformers torch

Step by step
This example creates an in-memory document store, embeds documents with a local sentence-transformers model, and answers questions with a local Hugging Face generation model, using the Haystack 2.x component API.
```python
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.generators import HuggingFaceLocalGenerator
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Initialize local document store
document_store = InMemoryDocumentStore()

# Embed sample documents with a local sentence-transformers model,
# then write them (with embeddings attached) to the store
docs = [
    Document(content="Haystack is an open-source NLP framework."),
    Document(content="Local models can be used with Haystack for offline processing."),
]
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
doc_embedder.warm_up()
document_store.write_documents(doc_embedder.run(docs)["documents"])

# Jinja prompt template that stuffs retrieved documents around the question
template = """Answer the question using the context.
Context:
{% for doc in documents %}{{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""

# Build a query pipeline: embed query -> retrieve -> build prompt -> generate.
# flan-t5-base is a small local model that follows QA-style prompts better than gpt2.
pipe = Pipeline()
pipe.add_component("text_embedder", SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"))
pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("generator", HuggingFaceLocalGenerator(
    model="google/flan-t5-base", task="text2text-generation"))
pipe.connect("text_embedder.embedding", "retriever.query_embedding")
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "generator.prompt")

# Query the pipeline
query = "What is Haystack?"
result = pipe.run({
    "text_embedder": {"text": query},
    "retriever": {"top_k": 2},
    "prompt_builder": {"question": query},
})
print("Answer:", result["generator"]["replies"][0])
```

Output (exact wording varies with the generation model):
Answer: Haystack is an open-source NLP framework.
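Under the hood, the embedding retriever ranks stored documents by vector similarity between the query embedding and each document embedding (the in-memory store supports functions such as dot product and cosine). A minimal dependency-free sketch of that scoring step, using toy 3-dimensional vectors in place of real model embeddings:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" standing in for real sentence-transformers output
doc_embeddings = {
    "Haystack is an open-source NLP framework.": [0.9, 0.1, 0.0],
    "Local models can be used with Haystack for offline processing.": [0.4, 0.8, 0.1],
}
query_embedding = [0.8, 0.2, 0.1]

# Rank documents by similarity to the query, highest first (top_k = 2)
ranked = sorted(doc_embeddings.items(),
                key=lambda item: cosine(query_embedding, item[1]),
                reverse=True)
for content, _ in ranked[:2]:
    print(content)
```

The real retriever does exactly this over the store's documents, just with high-dimensional model embeddings and an optimized implementation.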
Common variations
- Use another local embedding model by changing the model argument of the embedder components (keep the document embedder and text embedder models in sync so query and document vectors are comparable).
- Switch to a larger or fine-tuned local generation model by changing the model argument of HuggingFaceLocalGenerator.
- Use ExtractiveReader (the Haystack 2.x replacement for v1's FARMReader and TransformersReader) for extractive QA instead of generation.
- Run asynchronously by integrating with async frameworks, though Haystack v2 primarily uses synchronous calls.
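Whichever generator you swap in, retrieval-augmented generation comes down to assembling the retrieved contexts and the question into one prompt for the model. A minimal stdlib sketch of that assembly step (the template wording here is illustrative, not a fixed Haystack format):

```python
def build_prompt(question, documents):
    """Concatenate retrieved document texts and the question into a QA prompt."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return (
        "Answer the question using the context.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    "What is Haystack?",
    ["Haystack is an open-source NLP framework."],
)
print(prompt)
```

Haystack's PromptBuilder does the same job with Jinja templates, which is why swapping generators usually only requires adjusting the template to the new model's preferred prompt style.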
Troubleshooting
- If you see CUDA out of memory, reduce batch sizes or switch to CPU via each component's device parameter.
- For slow performance, make sure models are cached locally (Hugging Face caches downloads under ~/.cache/huggingface by default) or use smaller models.
- If retrieval returns no results, confirm the documents were embedded before being written to the store: the document embedder must run before write_documents().
- Check that all dependencies are compatible with Haystack 2.x (the haystack-ai package, not the v1 farm-haystack package) to avoid import errors.
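To check which models are already cached locally (and so will load without network access), you can inspect the Hugging Face hub cache directory, which defaults to ~/.cache/huggingface unless HF_HOME points elsewhere. A small stdlib sketch (the directory layout assumed here is the hub cache's models--<org>--<name> convention):

```python
import os
from pathlib import Path

def cached_models(cache_dir=None):
    """List model repos found in the local Hugging Face hub cache."""
    root = Path(
        cache_dir or os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface")
    ) / "hub"
    if not root.is_dir():
        return []
    # Cached repos are stored as directories named models--<org>--<name>
    return sorted(p.name for p in root.iterdir() if p.name.startswith("models--"))

print(cached_models())
```

If the models you need do not appear here, the first pipeline run will try to download them, which explains both slow startup and failures on fully offline machines.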
Key Takeaways
- Use Haystack v2 with local document stores and local model wrappers for offline AI pipelines.
- Embed documents with sentence-transformers and generate answers with transformers models locally.
- Embed documents before writing them to the store; retrieval only works over documents that carry embeddings.
- Adjust model sizes and device settings to optimize performance and resource usage.