How to create custom Haystack components
Quick answer
Create a custom Haystack component by subclassing BaseComponent, or a specialized base class such as BaseRetriever or BaseGenerator. Implement the required methods, run() and run_batch(), to define the component's behavior, then add the component to a Pipeline.
Prerequisites
- Python 3.8+
- pip install farm-haystack (Haystack 1.x, which provides BaseComponent and haystack.nodes; the haystack-ai 2.x package does not include them)
- Basic knowledge of Python classes and inheritance
Setup
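Install Haystack 1.x, which is published on PyPI as farm-haystack and ships the BaseComponent node API used in this guide. The haystack-ai package (Haystack 2.x) replaces this API with the @component decorator and has no haystack.nodes module, so the examples below will not import under it.
    pip install farm-haystack
Step by step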
Subclass BaseComponent, declare outgoing_edges, and implement run() and run_batch(). Recent 1.x releases may treat both methods as abstract, so define both even if you only call run(). Then add the component to a Pipeline to use it.

    from haystack import Pipeline
    from haystack.nodes import BaseComponent


    class CustomComponent(BaseComponent):
        # Number of output edges this node exposes to the pipeline
        outgoing_edges = 1

        def run(self, query, **kwargs):
            # Custom logic here
            result = f"Processed query: {query}"
            return {"result": result}, "output_1"

        def run_batch(self, queries, **kwargs):
            # Same logic for a list of queries; still a single edge name
            results = [f"Processed query: {q}" for q in queries]
            return {"results": results}, "output_1"


    # Create a pipeline and attach the component to the Query root node
    pipeline = Pipeline()
    pipeline.add_node(component=CustomComponent(), name="CustomComponent", inputs=["Query"])

    # Run the pipeline
    output = pipeline.run(query="Hello Haystack")
    print(output["result"])

Output:
    Processed query: Hello Haystack
Common variations
- Implement run_batch() for batch processing.
- Subclass specialized base classes like BaseRetriever or BaseGenerator for retrievers or generators.
- Use Pipeline.add_node() to chain multiple custom components.
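In Haystack 1.x, BaseRetriever supplies run() itself and expects subclasses to implement retrieve() and retrieve_batch(), so a custom retriever defines those two methods and returns Document objects:

    from haystack import Document, Pipeline
    from haystack.nodes import BaseRetriever


    class CustomRetriever(BaseRetriever):
        def retrieve(self, query, filters=None, top_k=None, **kwargs):
            # Return a dummy document for the query
            return [Document(content=f"Doc for {query}", id="1")]

        def retrieve_batch(self, queries, filters=None, top_k=None, **kwargs):
            # One list of documents per query
            return [self.retrieve(q) for q in queries]


    pipeline = Pipeline()
    pipeline.add_node(component=CustomRetriever(), name="Retriever", inputs=["Query"])
    output = pipeline.run(query="Find docs")
    print(output["documents"])

Output: a list containing one Document with content "Doc for Find docs" and id "1".
Troubleshooting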
- If your component is not called, check that you added it to the pipeline with the correct inputs.
- Check that run() returns a (dict, str) tuple whose second element is the output edge name, such as "output_1".
- For batch processing, check that run_batch() also returns a (dict, str) tuple, with the per-query results held inside the dict.
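A quick way to catch a malformed return value is to call run() directly, outside any pipeline, and check its shape. The helper below is a minimal sketch in plain Python; check_run_output is a hypothetical name, not part of Haystack, and the "output_" prefix check assumes the edge naming used in the examples above.

```python
from typing import Any


def check_run_output(result: Any) -> None:
    """Raise a descriptive error unless `result` looks like (dict, "output_N")."""
    # Hypothetical helper for local debugging; not a Haystack API.
    if not isinstance(result, tuple) or len(result) != 2:
        raise TypeError(
            f"expected a 2-tuple (dict, str), got {type(result).__name__}"
        )
    output, edge = result
    if not isinstance(output, dict):
        raise TypeError(
            f"first element must be a dict, got {type(output).__name__}"
        )
    # Assumption: edge names follow the "output_1", "output_2", ... pattern.
    if not isinstance(edge, str) or not edge.startswith("output_"):
        raise ValueError(
            f"second element must be an edge name like 'output_1', got {edge!r}"
        )


# Usage: validate the value a component's run() would return.
check_run_output(({"result": "Processed query: Hello"}, "output_1"))
```

In practice you would pass in the value returned by your own component, for example check_run_output(CustomComponent().run(query="Hello")), before wiring the component into a pipeline.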
Key Takeaways
- Subclass Haystack's BaseComponent and implement run() to create custom components.
- Use Pipeline.add_node() to integrate your custom component into Haystack pipelines.
- Implement run_batch() for efficient batch processing support.
- Specialized base classes like BaseRetriever simplify building retrievers or generators.
- Always return a tuple of (dict, output_name) from run() to comply with Haystack's pipeline protocol.