How to create custom Haystack components

Quick answer
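Create a custom Haystack 2.x component by decorating a plain Python class with @component and implementing a run() method whose outputs are declared with @component.output_types. Then add it to a Pipeline with add_component() and wire it to other components with connect(). The BaseComponent, BaseRetriever, and run_batch() APIs belong to Haystack 1.x (the farm-haystack package) and are not part of haystack-ai.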

PREREQUISITES

  • Python 3.8+
  • pip install "haystack-ai>=2.0" (quoted so the shell does not treat >= as a redirect)
  • Basic knowledge of Python classes and inheritance

Setup

Install the latest Haystack AI package (v2 or higher) to access the new component system.

bash
pip install haystack-ai

Step by step

Decorate a class with @component, implement run(), and declare its outputs with @component.output_types. Register the component in a Pipeline with add_component() to use it.

python
from haystack import Pipeline, component


@component
class CustomComponent:
    # Declare output names and types so the pipeline can validate connections
    @component.output_types(result=str)
    def run(self, query: str):
        # Custom logic here
        result = f"Processed query: {query}"
        # Return a dict whose keys match the declared outputs
        return {"result": result}


# Create pipeline and add custom component
pipeline = Pipeline()
pipeline.add_component("custom_component", CustomComponent())

# Run pipeline: inputs are keyed by component name, then by run() parameter
output = pipeline.run({"custom_component": {"query": "Hello Haystack"}})
print(output["custom_component"]["result"])
output
Processed query: Hello Haystack

Common variations

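  • For batch processing, accept a list input (for example, queries: List[str]) in run(); Haystack 2.x has no separate run_batch() method.
  • Build retrievers or generators as ordinary @component classes that return the conventional outputs, such as documents: List[Document] for a retriever.
  • Use Pipeline.add_component() and Pipeline.connect() to chain multiple custom components.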
python
from typing import List

from haystack import Document, Pipeline, component


@component
class CustomRetriever:
    @component.output_types(documents=List[Document])
    def run(self, query: str):
        # Return dummy documents
        docs = [Document(id="1", content=f"Doc for {query}")]
        return {"documents": docs}


pipeline = Pipeline()
pipeline.add_component("retriever", CustomRetriever())

output = pipeline.run({"retriever": {"query": "Find docs"}})
# Print selected fields; a Document's full repr includes many other attributes
print([(doc.id, doc.content) for doc in output["retriever"]["documents"]])
output
[('1', 'Doc for Find docs')]

Troubleshooting

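  • If your component is not called, make sure it was added with add_component() and that each of its inputs is either connected with connect() or passed to pipeline.run() under the component's name.
  • Check that run() returns a dict whose keys match the names declared in @component.output_types.
  • For batch processing, verify that each list output contains one entry per input item.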
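
A return-value check like this can be automated in a unit test. The sketch below is plain Python and does not depend on Haystack; check_run_output is a hypothetical helper, and it assumes the Haystack 2.x convention that run() returns a dict keyed by the declared output names.

```python
def check_run_output(output, declared_outputs):
    """Hypothetical helper: raise if `output` is not a dict with exactly the declared output names."""
    if not isinstance(output, dict):
        raise TypeError(f"run() should return a dict, got {type(output).__name__}")
    missing = set(declared_outputs) - set(output)
    unexpected = set(output) - set(declared_outputs)
    if missing or unexpected:
        raise ValueError(
            f"missing outputs: {sorted(missing)}, unexpected outputs: {sorted(unexpected)}"
        )


# A dict with the declared key passes silently
check_run_output({"result": "Processed query: Hello"}, ["result"])

# A Haystack 1.x-style (dict, edge_name) tuple is rejected
try:
    check_run_output(({"result": "x"}, "output_1"), ["result"])
except TypeError as err:
    print(err)
```

Calling the helper on the value returned by your component's run() in a test catches a mismatched output name before the component is wired into a pipeline.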

Key Takeaways

  • Decorate a class with @component and implement run() to create a custom component in Haystack 2.x.
  • Use Pipeline.add_component() and Pipeline.connect() to integrate your custom component into Haystack pipelines.
  • Handle batches by accepting list inputs in run(); there is no run_batch() in Haystack 2.x.
  • Retrievers and generators need no special base class; they are @component classes that return the conventional outputs, such as documents.
  • Always return a dict from run() whose keys match the outputs declared with @component.output_types.
Verified 2026-04