How to stream LangChain agent output
Quick answer
To stream LangChain agent output, enable the `streaming` parameter on the underlying LLM client and handle the streamed tokens via callbacks or async iterators. Use LangChain's CallbackManager or AsyncCallbackManager to capture and process output chunks as they arrive.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install langchain openai>=1.0
Setup
Install the required packages and set your environment variable for the OpenAI API key.
- Install LangChain and the OpenAI SDK: `pip install langchain openai`
- Set your API key in the environment: `export OPENAI_API_KEY='your_api_key'` (Linux/macOS) or `setx OPENAI_API_KEY "your_api_key"` (Windows)
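Before running the examples, a quick sanity check (a sketch; the variable name is illustrative) confirms the key is visible to Python:

```python
import os

# Check whether OPENAI_API_KEY is visible to this Python process.
key_present = "OPENAI_API_KEY" in os.environ
print("API key set:", key_present)
```

If this prints `False`, re-export the variable in the same shell you use to run the scripts.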
Step by step
This example creates a LangChain agent with streaming enabled using the OpenAI gpt-4o model. A custom callback handler captures streamed tokens and prints them in real time.
```python
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from langchain.callbacks.base import BaseCallbackHandler

class StreamHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end='', flush=True)

# Initialize the streaming LLM client with the callback handler attached
llm = ChatOpenAI(
    model="gpt-4o",
    streaming=True,
    callbacks=[StreamHandler()],
)

# Initialize a simple agent with the streaming LLM
agent = initialize_agent(
    tools=[],
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    llm=llm,
    verbose=True,
)

# Run the agent with streaming output
print("Agent output:")
agent.run("Write a short poem about AI streaming output.")
```

Output
```text
Agent output:
AI streams in flowing code,
Tokens dance, ideas explode.
Real-time thoughts on screen displayed,
LangChain guides the streaming parade.
```
Common variations
You can also stream asynchronously by using an AsyncCallbackHandler with async LLM calls. Other models, such as gpt-4o-mini or Anthropic's claude-3-5-sonnet-20241022, support streaming in the same way. Adjust the callback handlers to process tokens differently, e.g., buffering them or sending them to a UI.
```python
import asyncio
from langchain_openai import ChatOpenAI
from langchain.callbacks.base import AsyncCallbackHandler

class AsyncStreamHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end='', flush=True)

async def main():
    llm = ChatOpenAI(
        model="gpt-4o-mini",
        streaming=True,
        callbacks=[AsyncStreamHandler()],
    )
    response = await llm.ainvoke("Explain streaming in LangChain.")
    print("\nFull response:", response.content)

asyncio.run(main())
```

Output
```text
Streaming in LangChain means tokens arrive incrementally, allowing real-time output display.
Full response: Streaming in LangChain means tokens arrive incrementally, allowing real-time output display.
```
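A handler does not have to print tokens: it can buffer them for batching or for pushing to a UI. A minimal sketch of that idea (the class name is hypothetical, and the langchain import is omitted so the snippet runs standalone; in real use the class would subclass BaseCallbackHandler and be attached via callbacks=[...]):

```python
class BufferingStreamHandler:
    """Collects streamed tokens instead of printing them."""
    def __init__(self):
        self._tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Buffer tokens, e.g. to push batches to a UI instead of the console.
        self._tokens.append(token)

    def text(self) -> str:
        return "".join(self._tokens)

# Simulate the callback firing as tokens arrive:
handler = BufferingStreamHandler()
for tok in ["Stream", "ing ", "works"]:
    handler.on_llm_new_token(tok)
print(handler.text())  # Streaming works
```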
Troubleshooting
- If streaming output does not appear, ensure `streaming=True` is set on the LLM client.
- Check that your callback handlers are properly attached (e.g. via the `callbacks` parameter).
- For async streaming, confirm you are running inside an async event loop.
- If output is buffered and delayed, verify your print or UI update calls flush output immediately.
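The event-loop requirement above can be sketched offline (hypothetical class name, no API call made; in real use the class would subclass langchain's AsyncCallbackHandler):

```python
import asyncio

class AsyncTokenCollector:
    """Async handlers must be coroutines and must run inside an event loop."""
    def __init__(self):
        self.tokens = []

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.tokens.append(token)

async def demo():
    handler = AsyncTokenCollector()
    # Simulate tokens arriving from a streaming LLM call:
    for tok in ["async ", "streaming ", "demo"]:
        await handler.on_llm_new_token(tok)
    return "".join(handler.tokens)

# asyncio.run provides the event loop; calling demo() bare would only
# create a coroutine object and never fire the handler.
result = asyncio.run(demo())
print(result)  # async streaming demo
```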
Key Takeaways
- Enable streaming on the LangChain LLM client with `streaming=True` to get incremental output.
- Use `CallbackManager` or `AsyncCallbackManager` with custom handlers to process streamed tokens.
- Streaming works with multiple models and can be integrated into agents for real-time interaction.
- Flush output immediately in handlers to avoid buffering delays in the console or UI.
- Async streaming requires running inside an async event loop and using async callback handlers.