How-to · Beginner · 3 min read

LangChain streaming callbacks

Quick answer
Use LangChain's CallbackManager and StreamingStdOutCallbackHandler to handle streaming responses in real time. Instantiate your chat model with streaming=True and pass the callback manager to receive tokens as they arrive.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "langchain-openai>=0.2" "openai>=1.0" (quote the specifiers so the shell doesn't treat >= as redirection)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install LangChain OpenAI bindings and OpenAI SDK:
bash
pip install langchain-openai openai
output
Collecting langchain_openai
Collecting openai
Successfully installed langchain_openai openai
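Before running the examples, it helps to fail fast when the key is missing. A minimal stdlib-only check (assumes the standard OPENAI_API_KEY variable; the helper name require_api_key is ours):

```python
import os

def require_api_key() -> str:
    """Return the OpenAI API key, raising a helpful error if it is unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            'OPENAI_API_KEY is not set. Run: export OPENAI_API_KEY="your_api_key"'
        )
    return key
```

Call require_api_key() at the top of your script so a missing key surfaces immediately instead of as an authentication error mid-request.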

Step by step

This example shows how to use LangChain's streaming callbacks to print tokens as they stream from the OpenAI gpt-4o model.

python
import os
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler

# Ensure your API key is set in the environment
# export OPENAI_API_KEY="your_api_key"

# Create a callback manager with the streaming stdout handler
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Instantiate the chat model with streaming enabled and callback manager
chat = ChatOpenAI(
    model="gpt-4o",
    streaming=True,
    callback_manager=callback_manager,
    temperature=0.7
)

# Run a chat completion with streaming output
response = chat.invoke([{"role": "user", "content": "Explain streaming callbacks in LangChain."}])

print("\nFinal response:", response.content)
output
They allow you to receive tokens as they are generated, enabling real-time output.
Final response: They allow you to receive tokens as they are generated, enabling real-time output.

Common variations

You can create custom callback handlers by subclassing BaseCallbackHandler to process tokens differently, for example logging them or updating a UI. Async streaming is supported through the models' async methods together with async callback handlers. You can also switch models by changing the model parameter.

python
from langchain_core.callbacks import BaseCallbackHandler, CallbackManager

class CustomPrintCallbackHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"Token received: {token}", end="", flush=True)

callback_manager = CallbackManager([CustomPrintCallbackHandler()])

chat = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    callback_manager=callback_manager
)

response = chat.invoke([{"role": "user", "content": "Say hello with streaming."}])
print("\nResponse content:", response.content)
output
Token received: HelloToken received: !
Response content: Hello!

Troubleshooting

  • If streaming output does not appear, ensure that streaming=True is set and that a callback manager containing a streaming handler is passed.
  • Check that your OpenAI API key is set in os.environ["OPENAI_API_KEY"].
  • For async usage, confirm you are using async-compatible LangChain models and await calls properly.

Key Takeaways

  • Use LangChain's CallbackManager with streaming handlers to receive tokens in real time.
  • Enable streaming by setting streaming=True when instantiating ChatOpenAI.
  • Custom callback handlers allow flexible processing of streamed tokens for logging or UI updates.
Verified 2026-04 · gpt-4o, gpt-4o-mini