How-to · Beginner · 3 min read

LangChain streaming callbacks

Quick answer
Use LangChain's CallbackManager and StreamingStdOutCallbackHandler to handle streaming responses in real time. Instantiate your chat model with streaming=True and pass the callback manager to receive tokens as they arrive.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "langchain-openai>=0.2" "openai>=1.0" (quote the specifiers so the shell doesn't treat >= as redirection)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install LangChain OpenAI bindings and OpenAI SDK:
bash
pip install langchain-openai openai
output
Collecting langchain_openai
Collecting openai
Successfully installed langchain_openai openai
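Before running the examples, it helps to fail fast when the key is missing. A minimal stdlib-only check (assumes the standard OPENAI_API_KEY variable; the helper name require_api_key is ours):

```python
import os

def require_api_key() -> str:
    """Return the OpenAI API key, raising a helpful error if it is unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            'OPENAI_API_KEY is not set. Run: export OPENAI_API_KEY="your_api_key"'
        )
    return key
```

Call require_api_key() at the top of your script so a missing key surfaces immediately instead of as an authentication error mid-request.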

Step by step

This example shows how to use LangChain's streaming callbacks to print tokens as they stream from the OpenAI gpt-4o model.

python
import os
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler

# Ensure your API key is set in the environment
# export OPENAI_API_KEY="your_api_key"

# Create a callback manager with the streaming stdout handler
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Instantiate the chat model with streaming enabled and callback manager
chat = ChatOpenAI(
    model="gpt-4o",
    streaming=True,
    callback_manager=callback_manager,
    temperature=0.7
)

# Run a chat completion with streaming output
response = chat.invoke([{"role": "user", "content": "Explain streaming callbacks in LangChain."}])

print("\nFinal response:", response.content)
output
They allow you to receive tokens as they are generated, enabling real-time output.
Final response: They allow you to receive tokens as they are generated, enabling real-time output.

Common variations

You can create custom callback handlers by subclassing BaseCallbackHandler to process tokens differently, for example logging them or updating a UI. Async streaming is supported through the models' async methods together with async callback handlers. You can also switch models by changing the model parameter.

python
from langchain_core.callbacks import BaseCallbackHandler, CallbackManager

class CustomPrintCallbackHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"Token received: {token}", end="", flush=True)

callback_manager = CallbackManager([CustomPrintCallbackHandler()])

chat = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    callback_manager=callback_manager
)

response = chat.invoke([{"role": "user", "content": "Say hello with streaming."}])
print("\nResponse content:", response.content)
output
Token received: HelloToken received: !
Response content: Hello!

Troubleshooting

  • If streaming output does not appear, ensure that streaming=True is set and that a callback manager containing a streaming handler is passed.
  • Check that your OpenAI API key is set in os.environ["OPENAI_API_KEY"].
  • For async usage, confirm you are using async-compatible LangChain models and await calls properly.

Key Takeaways

  • Use LangChain's CallbackManager with streaming handlers to receive tokens in real time.
  • Enable streaming by setting streaming=True when instantiating ChatOpenAI.
  • Custom callback handlers allow flexible processing of streamed tokens for logging or UI updates.
Verified 2026-04 · gpt-4o, gpt-4o-mini