How-to · Intermediate · 3 min read

How to stream agent output in LangChain

Quick answer
Enable streaming by setting streaming=True on your LangChain chat model and passing a streaming callback handler such as StreamingStdOutCallbackHandler in its callbacks parameter. The handler's on_llm_new_token hook then receives each token as it is generated, so agent output appears incrementally instead of all at once.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install langchain openai

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install LangChain and OpenAI SDK: pip install langchain openai
  • Set your API key in your shell: export OPENAI_API_KEY="your-api-key" (Linux/macOS) or setx OPENAI_API_KEY "your-api-key" (Windows)
bash
pip install langchain openai

Step by step

This example demonstrates streaming output from a LangChain agent using the StreamingStdOutCallbackHandler. The agent streams tokens as they are generated, printing them to stdout in real time.

python
from langchain_openai import ChatOpenAI
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Define a simple tool for demonstration
def echo_tool(text: str) -> str:
    return f"Echo: {text}"

tools = [Tool(name="Echo", func=echo_tool, description="Echoes input text")]

# Initialize the chat model with streaming enabled
chat = ChatOpenAI(
    temperature=0,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    model="gpt-4o"
)

# Initialize the agent with the chat model and tools
agent = initialize_agent(
    tools,
    chat,
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

# Run the agent with streaming output
print("Agent output streaming:")
agent.run("Say hello and echo 'streaming test'.")
output
Agent output streaming:
Hello! Echo: streaming test.
(abridged; with verbose=True the agent also prints its intermediate reasoning)

Common variations

  • Async streaming: Use AsyncCallbackHandler and async agent methods for asynchronous streaming.
  • Custom callbacks: Implement your own callback handler subclassing BaseCallbackHandler to process tokens differently (e.g., update UI).
  • Different models: Swap model_name to gpt-4o-mini or other supported models for cost or speed tradeoffs.
  • Streaming with Anthropic or other providers: Use their respective streaming callback handlers and client initialization.
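To make the "custom callbacks" variation concrete: in real use you would subclass BaseCallbackHandler from langchain_core.callbacks and override on_llm_new_token. The standalone sketch below (BufferingTokenHandler is a hypothetical name, not a LangChain class) shows the same token hook, driven manually so it runs without the library or an API key:

```python
# Sketch of a custom streaming callback. In LangChain you would subclass
# BaseCallbackHandler; this standalone class shows the same
# on_llm_new_token hook, exercised manually below.

class BufferingTokenHandler:
    """Collects streamed tokens instead of printing them to stdout."""

    def __init__(self):
        self.buffer = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once per generated token when streaming=True.
        self.buffer.append(token)

    def text(self) -> str:
        return "".join(self.buffer)

# Simulate a model emitting tokens one at a time.
handler = BufferingTokenHandler()
for tok in ["Stream", "ing ", "works"]:
    handler.on_llm_new_token(tok)

print(handler.text())  # prints: Streaming works
```

Buffering instead of printing is what you want when the tokens feed a UI update or a websocket rather than a terminal.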

Troubleshooting

  • If streaming output does not appear, ensure streaming=True is set on the chat model.
  • Verify your callback handlers are correctly passed in the callbacks parameter.
  • Check your environment variable OPENAI_API_KEY is set and accessible.
  • For Windows users, restart your terminal after setting environment variables.
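A quick sanity check for the last two points: this small helper (openai_key_present is a hypothetical name, not part of LangChain) confirms the key is actually visible to the Python process before you start debugging callbacks.

```python
import os

def openai_key_present() -> bool:
    """True only if OPENAI_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("OPENAI_API_KEY", "").strip())

if not openai_key_present():
    print("OPENAI_API_KEY is missing -- API calls will fail at auth.")
```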

Key Takeaways

  • Enable streaming by setting streaming=True on your LangChain chat model.
  • Use StreamingStdOutCallbackHandler or custom callbacks to handle streamed tokens.
  • Pass the streaming callbacks to the chat model; an agent built on that model then emits tokens in real time.
  • Async streaming requires async callbacks and async agent methods.
  • Always verify your API key and environment variables to avoid silent failures.
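The async takeaway follows the same token hook as the sync example. As a runnable illustration that needs neither LangChain nor an API key, the sketch below uses AsyncTokenPrinter as a hypothetical stand-in for an AsyncCallbackHandler subclass and fake_llm_stream as a stand-in for the model's token stream:

```python
import asyncio

# Sketch of the async streaming pattern: an async callback receives tokens
# as they arrive. In LangChain you would subclass AsyncCallbackHandler and
# override on_llm_new_token; fake_llm_stream stands in for the model here.

class AsyncTokenPrinter:
    def __init__(self):
        self.tokens = []

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.tokens.append(token)

async def fake_llm_stream(handler, tokens):
    # The real model awaits the handler once per generated token.
    for tok in tokens:
        await handler.on_llm_new_token(tok)

async def main():
    handler = AsyncTokenPrinter()
    await fake_llm_stream(handler, ["Hel", "lo", "!"])
    return "".join(handler.tokens)

print(asyncio.run(main()))  # prints: Hello!
```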
Verified 2026-04 · gpt-4o, gpt-4o-mini