How-to · Intermediate · 3 min read

How to stream agent output in LangChain

Quick answer
Enable streaming by setting streaming=True on your LangChain chat model and passing a streaming callback handler such as StreamingStdOutCallbackHandler in its callbacks parameter. The handler's on_llm_new_token hook then receives each token as it is generated, so agent output appears incrementally instead of all at once.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install langchain openai

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install LangChain and OpenAI SDK: pip install langchain openai
  • Set your API key in your shell: export OPENAI_API_KEY="your-api-key" (Linux/macOS) or setx OPENAI_API_KEY "your-api-key" (Windows)
bash
pip install langchain openai

Step by step

This example demonstrates streaming output from a LangChain agent using the StreamingStdOutCallbackHandler. The agent streams tokens as they are generated, printing them to stdout in real time.

python
from langchain_openai import ChatOpenAI
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Define a simple tool for demonstration
def echo_tool(text: str) -> str:
    return f"Echo: {text}"

tools = [Tool(name="Echo", func=echo_tool, description="Echoes input text")]

# Initialize the chat model with streaming enabled
chat = ChatOpenAI(
    temperature=0,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    model="gpt-4o"
)

# Initialize the agent with the chat model and tools
agent = initialize_agent(
    tools,
    chat,
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

# Run the agent with streaming output
print("Agent output streaming:")
agent.run("Say hello and echo 'streaming test'.")
output
Agent output streaming:
Hello! Echo: streaming test.
(abridged; with verbose=True the agent also prints its intermediate reasoning)

Common variations

  • Async streaming: Use AsyncCallbackHandler and async agent methods for asynchronous streaming.
  • Custom callbacks: Implement your own callback handler subclassing BaseCallbackHandler to process tokens differently (e.g., update UI).
  • Different models: Swap model_name to gpt-4o-mini or other supported models for cost or speed tradeoffs.
  • Streaming with Anthropic or other providers: Use their respective streaming callback handlers and client initialization.
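To make the "custom callbacks" variation concrete: in real use you would subclass BaseCallbackHandler from langchain_core.callbacks and override on_llm_new_token. The standalone sketch below (BufferingTokenHandler is a hypothetical name, not a LangChain class) shows the same token hook, driven manually so it runs without the library or an API key:

```python
# Sketch of a custom streaming callback. In LangChain you would subclass
# BaseCallbackHandler; this standalone class shows the same
# on_llm_new_token hook, exercised manually below.

class BufferingTokenHandler:
    """Collects streamed tokens instead of printing them to stdout."""

    def __init__(self):
        self.buffer = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once per generated token when streaming=True.
        self.buffer.append(token)

    def text(self) -> str:
        return "".join(self.buffer)

# Simulate a model emitting tokens one at a time.
handler = BufferingTokenHandler()
for tok in ["Stream", "ing ", "works"]:
    handler.on_llm_new_token(tok)

print(handler.text())  # prints: Streaming works
```

Buffering instead of printing is what you want when the tokens feed a UI update or a websocket rather than a terminal.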

Troubleshooting

  • If streaming output does not appear, ensure streaming=True is set on the chat model.
  • Verify your callback handlers are correctly passed in the callbacks parameter.
  • Check your environment variable OPENAI_API_KEY is set and accessible.
  • For Windows users, restart your terminal after setting environment variables.
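A quick sanity check for the last two points: this small helper (openai_key_present is a hypothetical name, not part of LangChain) confirms the key is actually visible to the Python process before you start debugging callbacks.

```python
import os

def openai_key_present() -> bool:
    """True only if OPENAI_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("OPENAI_API_KEY", "").strip())

if not openai_key_present():
    print("OPENAI_API_KEY is missing -- API calls will fail at auth.")
```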

Key Takeaways

  • Enable streaming by setting streaming=True on your LangChain chat model.
  • Use StreamingStdOutCallbackHandler or custom callbacks to handle streamed tokens.
  • Pass the streaming callbacks to the chat model; an agent built on that model then emits tokens in real time.
  • Async streaming requires async callbacks and async agent methods.
  • Always verify your API key and environment variables to avoid silent failures.
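The async takeaway follows the same token hook as the sync example. As a runnable illustration that needs neither LangChain nor an API key, the sketch below uses AsyncTokenPrinter as a hypothetical stand-in for an AsyncCallbackHandler subclass and fake_llm_stream as a stand-in for the model's token stream:

```python
import asyncio

# Sketch of the async streaming pattern: an async callback receives tokens
# as they arrive. In LangChain you would subclass AsyncCallbackHandler and
# override on_llm_new_token; fake_llm_stream stands in for the model here.

class AsyncTokenPrinter:
    def __init__(self):
        self.tokens = []

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.tokens.append(token)

async def fake_llm_stream(handler, tokens):
    # The real model awaits the handler once per generated token.
    for tok in tokens:
        await handler.on_llm_new_token(tok)

async def main():
    handler = AsyncTokenPrinter()
    await fake_llm_stream(handler, ["Hel", "lo", "!"])
    return "".join(handler.tokens)

print(asyncio.run(main()))  # prints: Hello!
```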
Verified 2026-04 · gpt-4o, gpt-4o-mini