Code Advanced hard · 8 min

Testing custom runnables in isolation

What you will learn

Use RunnableConfig and plain synchronous test functions to unit-test custom Runnable implementations without calling real LLMs, databases, or other external services.

Why this matters

Custom runnables often contain business logic that must be tested in isolation before integration. Without proper testing patterns, you can't verify that your runnable behaves correctly under different input conditions, with custom dependencies injected, or when streaming is involved, so bugs surface only after the runnable is wired into a larger chain.

Skip if: you're validating end-to-end behavior with real LLM calls, real databases, or real vector stores. That belongs in integration tests. Isolation testing verifies the runnable's transformation logic, error handling, and configuration behavior, not whether the entire chain works with production services.

Explanation

Custom runnables in LangChain are reusable, composable units of logic that transform input to output. They inherit from Runnable and can be chained with other runnables using the | operator. Testing them in isolation means verifying the runnable's behavior independently of any downstream runnables or external services it might call.

Mechanically, isolation testing works by: (1) creating an instance of your custom runnable with test-friendly configuration; (2) invoking it with controlled input using invoke() or stream(); (3) asserting on the output without relying on LLM responses or database calls. You inject test doubles for any external dependencies through the runnable's constructor, and use RunnableConfig to pass runtime values such as tags, metadata, callbacks, or entries under its configurable key.

When to use this: Test your custom runnable before composing it into larger chains. Test error handling, input validation, and state transformation. Test different execution modes (sync vs. async). Once isolated tests pass, compose it into integration tests with real dependencies.

Analogy

Think of it like testing a single microservice in isolation with dependency injection and mock databases, before deploying it to production and connecting it to real services. You want to prove the microservice logic is correct before you trust it in the mesh.

Code

python

from typing import Any, Iterator

from langchain_core.runnables import Runnable, RunnableConfig


class CustomTransformRunnable(Runnable[dict, dict]):
    """A custom runnable that validates input and transforms it."""

    def __init__(self, required_keys: list[str], multiplier: int = 1):
        super().__init__()
        self.required_keys = required_keys
        self.multiplier = multiplier

    def invoke(
        self,
        input: dict,
        config: RunnableConfig | None = None,
        **kwargs: Any,
    ) -> dict:
        """Validate required keys and transform the input."""
        missing = [k for k in self.required_keys if k not in input]
        if missing:
            raise ValueError(f"Missing required keys: {missing}")

        # config defaults to None; normalize it to an empty RunnableConfig
        # (a plain dict at runtime) so .get() is always safe.
        config = config or RunnableConfig()
        metadata = config.get("metadata", {})

        result = {
            "original": input.copy(),
            "transformed": {k: v * self.multiplier for k, v in input.items()},
            "metadata": metadata,
        }
        return result

    async def ainvoke(
        self,
        input: dict,
        config: RunnableConfig | None = None,
        **kwargs: Any,
    ) -> dict:
        """Async version of invoke; the logic is cheap and CPU-only, so it runs inline."""
        return self.invoke(input, config, **kwargs)

    def stream(
        self,
        input: dict,
        config: RunnableConfig | None = None,
        **kwargs: Any,
    ) -> Iterator[dict]:
        """Yield one chunk per input key.

        Overrides the public stream(): the base Runnable.stream() only yields
        invoke()'s result once and never calls a private _stream() method.
        """
        missing = [k for k in self.required_keys if k not in input]
        if missing:
            raise ValueError(f"Missing required keys: {missing}")

        for key, value in input.items():
            transformed_value = value * self.multiplier
            yield {"key": key, "value": transformed_value}


def test_custom_runnable_basic_invoke():
    """Test that the runnable correctly transforms valid input."""
    runnable = CustomTransformRunnable(
        required_keys=["count", "score"],
        multiplier=2,
    )

    result = runnable.invoke({"count": 5, "score": 10})

    assert result["original"] == {"count": 5, "score": 10}
    assert result["transformed"] == {"count": 10, "score": 20}
    print("✓ Basic invoke test passed")


def test_custom_runnable_missing_keys():
    """Test that the runnable raises ValueError for missing required keys."""
    runnable = CustomTransformRunnable(
        required_keys=["count", "score"],
        multiplier=2,
    )

    try:
        runnable.invoke({"count": 5})
        assert False, "Should have raised ValueError"
    except ValueError as e:
        assert "Missing required keys" in str(e)
        assert "score" in str(e)
        print(f"✓ Missing keys validation test passed: {e}")


def test_custom_runnable_with_config():
    """Test that RunnableConfig metadata is passed through."""
    runnable = CustomTransformRunnable(
        required_keys=["x"],
        multiplier=3,
    )

    config = RunnableConfig(
        metadata={"user_id": "user_123", "request_id": "req_456"},
    )

    result = runnable.invoke({"x": 7}, config=config)

    assert result["metadata"] == {"user_id": "user_123", "request_id": "req_456"}
    assert result["transformed"]["x"] == 21
    print("✓ RunnableConfig metadata test passed")


def test_custom_runnable_streaming():
    """Test that streaming output yields correct chunks."""
    runnable = CustomTransformRunnable(
        required_keys=["a", "b"],
        multiplier=2,
    )

    chunks = list(runnable.stream({"a": 3, "b": 5}))

    assert len(chunks) == 2
    assert {"key": "a", "value": 6} in chunks
    assert {"key": "b", "value": 10} in chunks
    print(f"✓ Streaming test passed. Chunks: {chunks}")


def test_custom_runnable_default_multiplier():
    """Test that default multiplier of 1 works."""
    runnable = CustomTransformRunnable(required_keys=["num"])

    result = runnable.invoke({"num": 42})

    assert result["transformed"]["num"] == 42
    print("✓ Default multiplier test passed")


if __name__ == "__main__":
    test_custom_runnable_basic_invoke()
    test_custom_runnable_missing_keys()
    test_custom_runnable_with_config()
    test_custom_runnable_streaming()
    test_custom_runnable_default_multiplier()
    print("\n✓ All tests passed")

Output

✓ Basic invoke test passed
✓ Missing keys validation test passed: Missing required keys: ['score']
✓ RunnableConfig metadata test passed
✓ Streaming test passed. Chunks: [{'key': 'a', 'value': 6}, {'key': 'b', 'value': 10}]
✓ Default multiplier test passed

✓ All tests passed

What just happened?

The code defined a custom Runnable subclass that validates required input keys and multiplies numeric values. Five test functions then instantiated this runnable with different configurations and invoked it with controlled inputs, validating that: (1) normal transformations work; (2) missing keys raise ValueError with the correct message; (3) RunnableConfig metadata is threaded through without loss; (4) streaming produces the expected per-key chunks; and (5) default constructor arguments work. All tests passed, which confirms that the runnable's core logic behaves as expected for these inputs, independent of any chain or external service.

Common gotcha

Developers often forget that RunnableConfig is a TypedDict, so at runtime it is a plain dict: read values with config.get("metadata") or config["metadata"], not attribute access like config.metadata (which raises AttributeError). Because config defaults to None, indexing it directly with config["metadata"] raises TypeError, and calling config.get() on None raises AttributeError. Normalize with config = config or RunnableConfig() at the start of invoke(), then use .get() with a default value. Also note that ainvoke() is optional: if you implement only invoke(), the inherited ainvoke() runs it in a thread-pool executor. That is fine for pure logic like this lesson's example, but if invoke() touches thread-unsafe shared state, concurrent async calls can race, so provide a native ainvoke() in that case.

Error recovery

TypeError: 'NoneType' object is not subscriptable

You indexed config['key'] while config was still None (its default value). Fix: normalize with `config = config or RunnableConfig()` at the start of invoke(), then read values with `config.get('key', default_value)`. RunnableConfig is a TypedDict, so it supports every normal dict method.

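stream() yields one chunk containing the full invoke() result

Your streaming test sees a single chunk (for example, assert len(chunks) == 2 fails) because the base Runnable.stream() simply yields the result of self.invoke() once. It does not call a private _stream() method, so defining only _stream() on a Runnable subclass has no effect. Fix: override the public stream() method and yield your chunks from it. Override astream() as well if async callers need incremental output, since the default astream() yields the result of ainvoke() once. If your runnable doesn't naturally stream, the default behavior of yielding the final result once is already correct, and you don't need to override anything.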

AssertionError: Should have raised ValueError

Your validation logic didn't actually raise an exception when it should have. This usually means your validation condition is backwards or missing entirely. Fix: Double-check the if statement and make sure you're actually raising ValueError (not returning an error dict) before the transformation logic runs.

Experienced dev note

The real power of isolation testing custom runnables is that you catch logic bugs before they compose into larger chains where they're hard to debug. Write tests for your custom runnable's invoke(), stream(), and ainvoke() methods with different input shapes and config states. A useful pattern is a test factory (for example, a pytest fixture that returns a builder function) that creates runnable instances with different configurations, so you can quickly cover both happy paths and error cases. Also, RunnableConfig is often overlooked: use it to thread request-scoped data (user ID, request ID, tracing context) through your runnable without polluting the input dict. When you invoke a chain with a config, the chain forwards that config, including its metadata, to each step, which makes debugging and observability much easier.

Check your understanding

If your custom runnable's validation logic should raise an error when a required key is missing, but the error should only be raised when streaming (not when invoking synchronously), how would you implement this differently, and what testing challenge does this create?

Answer hint

A correct answer would explain that you'd move the validation into stream() and remove it from invoke(), or add a config flag to toggle validation mode. The testing challenge is that you now need two separate test paths, one that calls invoke() and expects success and one that consumes stream() and expects an error, to prove both code paths work correctly. Because stream() is a generator, the ValueError is raised only when the test iterates over it (for example with list()), not at the moment stream() is called.

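VERSION Runnable and RunnableConfig are imported from langchain_core.runnables. Runnable declares only invoke() as abstract; the other entry points (ainvoke(), batch(), stream(), astream()) have defaults built on invoke() or ainvoke(). The default stream() yields the invoke() result once and never calls a private _stream() method, so override stream() itself when you need incremental chunks. Runnable and RunnableConfig are part of LCEL and predate LangChain 1.0; if you are upgrading older code, check that these classes are imported from langchain_core.runnables. The X | None annotations in this lesson require Python 3.10 or later.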

Next, learn how to compose your tested custom runnables with built-in runnables using LCEL (LangChain Expression Language), where the | operator chains runnables into a RunnableSequence whose input and output types are taken from its first and last steps.
