Testing custom runnables in isolation
Why this matters
Custom runnables often contain business logic that must be tested in isolation before integration. Without proper testing patterns, you can't verify that your runnable behaves correctly under different input conditions, with custom dependencies injected, or when streaming is involved, and the gaps show up later as integration surprises.
Explanation
Custom runnables in LangChain are reusable, composable units of logic that transform input to output. They inherit from Runnable and can be chained with other runnables using the | operator. Testing them in isolation means verifying the runnable's behavior independently of any downstream runnables or external services it might call.
Mechanically, isolation testing works by: (1) Creating an instance of your custom runnable with test-friendly configuration; (2) Invoking it with controlled input using invoke() or stream(); (3) Asserting on the output without relying on LLM responses or database calls. You inject mock dependencies through the runnable's constructor and use RunnableConfig to pass runtime values such as tags, metadata, or callbacks that are invoked during execution.
When to use this: Test your custom runnable before composing it into larger chains. Test error handling, input validation, and state transformation. Test different execution modes (sync vs. async). Once isolated tests pass, compose it into integration tests with real dependencies.
Analogy
Think of it like testing a single microservice in isolation with dependency injection and mock databases, before deploying it to production and connecting it to real services. You want to prove the microservice logic is correct before you trust it in the mesh.
Code
from typing import Any, Iterator

from langchain_core.runnables import Runnable, RunnableConfig


class CustomTransformRunnable(Runnable[dict, dict]):
    """A custom runnable that validates input and transforms it."""

    def __init__(self, required_keys: list[str], multiplier: int = 1):
        super().__init__()
        self.required_keys = required_keys
        self.multiplier = multiplier

    def invoke(
        self,
        input: dict,
        config: RunnableConfig | None = None,
        **kwargs: Any,
    ) -> dict:
        """Validate required keys and transform the input."""
        missing = [k for k in self.required_keys if k not in input]
        if missing:
            raise ValueError(f"Missing required keys: {missing}")
        # config is None when the runnable is invoked without one.
        config = config or RunnableConfig()
        metadata = config.get("metadata", {})
        result = {
            "original": input.copy(),
            "transformed": {k: v * self.multiplier for k, v in input.items()},
            "metadata": metadata,
        }
        return result

    async def ainvoke(
        self,
        input: dict,
        config: RunnableConfig | None = None,
        **kwargs: Any,
    ) -> dict:
        """Async version of invoke; the work is cheap, so it calls invoke() directly."""
        return self.invoke(input, config)

    def stream(
        self,
        input: dict,
        config: RunnableConfig | None = None,
        **kwargs: Any,
    ) -> Iterator[dict]:
        """Yield one output chunk per input key.

        This overrides the public stream() method. The base Runnable.stream()
        yields the invoke() result as a single chunk and does not call a
        _stream() hook, so a method named _stream would never run.
        """
        missing = [k for k in self.required_keys if k not in input]
        if missing:
            raise ValueError(f"Missing required keys: {missing}")
        for key, value in input.items():
            transformed_value = value * self.multiplier
            yield {"key": key, "value": transformed_value}


def test_custom_runnable_basic_invoke():
    """Test that the runnable correctly transforms valid input."""
    runnable = CustomTransformRunnable(
        required_keys=["count", "score"],
        multiplier=2,
    )
    result = runnable.invoke({"count": 5, "score": 10})
    assert result["original"] == {"count": 5, "score": 10}
    assert result["transformed"] == {"count": 10, "score": 20}
    print("✓ Basic invoke test passed")


def test_custom_runnable_missing_keys():
    """Test that the runnable raises ValueError for missing required keys."""
    runnable = CustomTransformRunnable(
        required_keys=["count", "score"],
        multiplier=2,
    )
    try:
        runnable.invoke({"count": 5})
        assert False, "Should have raised ValueError"
    except ValueError as e:
        assert "Missing required keys" in str(e)
        assert "score" in str(e)
        print(f"✓ Missing keys validation test passed: {e}")


def test_custom_runnable_with_config():
    """Test that RunnableConfig metadata is passed through."""
    runnable = CustomTransformRunnable(
        required_keys=["x"],
        multiplier=3,
    )
    config = RunnableConfig(
        metadata={"user_id": "user_123", "request_id": "req_456"},
    )
    result = runnable.invoke({"x": 7}, config=config)
    assert result["metadata"] == {"user_id": "user_123", "request_id": "req_456"}
    assert result["transformed"]["x"] == 21
    print("✓ RunnableConfig metadata test passed")


def test_custom_runnable_streaming():
    """Test that streaming output yields correct chunks."""
    runnable = CustomTransformRunnable(
        required_keys=["a", "b"],
        multiplier=2,
    )
    chunks = list(runnable.stream({"a": 3, "b": 5}))
    assert len(chunks) == 2
    assert {"key": "a", "value": 6} in chunks
    assert {"key": "b", "value": 10} in chunks
    print(f"✓ Streaming test passed. Chunks: {chunks}")


def test_custom_runnable_default_multiplier():
    """Test that default multiplier of 1 works."""
    runnable = CustomTransformRunnable(required_keys=["num"])
    result = runnable.invoke({"num": 42})
    assert result["transformed"]["num"] == 42
    print("✓ Default multiplier test passed")


if __name__ == "__main__":
    test_custom_runnable_basic_invoke()
    test_custom_runnable_missing_keys()
    test_custom_runnable_with_config()
    test_custom_runnable_streaming()
    test_custom_runnable_default_multiplier()
    print("\n✓ All tests passed")
Output
✓ Basic invoke test passed
✓ Missing keys validation test passed: Missing required keys: ['score']
✓ RunnableConfig metadata test passed
✓ Streaming test passed. Chunks: [{'key': 'a', 'value': 6}, {'key': 'b', 'value': 10}]
✓ Default multiplier test passed

✓ All tests passed
What just happened?
The code defined a custom Runnable subclass that validates required input keys and multiplies numeric values. Five test functions then instantiated this runnable with different configurations and invoked it with controlled inputs, validating that: (1) normal transformations work; (2) missing keys raise ValueError with the correct message; (3) RunnableConfig metadata is threaded through without loss; (4) streaming produces expected chunks; and (5) default constructor arguments work. All tests passed, showing that the runnable's core logic behaves as intended for these cases with no chain or external service involved.
Common gotcha
Developers often forget that RunnableConfig is a TypedDict, so at runtime it is a plain dict: you read values with config.get("metadata", {}), not with attribute access like config.metadata. Two failure modes follow from this. If config is None, config['metadata'] raises a TypeError; if config is present but has no metadata key, it raises a KeyError. Start your invoke method with config = config or RunnableConfig() and read optional keys with .get() and a default value. Separately, a custom runnable does not have to implement ainvoke() to work in async chains: the default ainvoke() on the base class runs your sync invoke() in a thread pool. That fallback is silent, so override ainvoke() when you have a natively async implementation, and take care if invoke() mutates shared state that is not thread-safe, because concurrent async calls will then run it on several threads.
Error recovery
TypeError: 'NoneType' object is not subscriptable
NotImplementedError: _stream not implemented
AssertionError: Should have raised ValueError
Experienced dev note
The real power of isolation testing custom runnables is that you can catch logic bugs before they compose into larger chains where they're hard to debug. Write tests for your custom runnable's invoke/stream/ainvoke methods with different input shapes and config states. One pattern senior teams use: create a test fixture factory that builds runnable instances with different configurations, so you can quickly test both happy paths and error cases. Also, RunnableConfig is often overlooked: use it to thread request-scoped data (user ID, request ID, tracing context) through your runnable without polluting the input dict. When you compose custom runnables into chains, this metadata flows through automatically, making debugging and observability much easier.
Check your understanding
If your custom runnable's validation logic should raise an error when a required key is missing, but the error should only be raised when streaming (not when invoking synchronously), how would you implement this differently, and what testing challenge does this create?
Answer hint
A correct answer would explain that you'd put the validation in the streaming path (stream()) and remove it from invoke(), or add a config flag to toggle validation mode. The testing challenge is that you now need two separate test paths, one that calls invoke() and expects success and one that calls stream() and expects an error, to show that both code paths work correctly.