How to stream LangChain chain output
Quick answer
Use LangChain's ChatOpenAI with streaming=True and a callback handler such as StreamingStdOutCallbackHandler to stream chain output in real time. Pass the streaming client to your chain and run it to receive tokens incrementally as they are generated.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install langchain_openai>=0.2 openai>=1.0
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install the LangChain OpenAI integration and the OpenAI SDK:

```bash
pip install langchain_openai openai
```

Step by step
This example demonstrates streaming LangChain chain output using ChatOpenAI with streaming=True and the built-in StreamingStdOutCallbackHandler. The chain will print tokens as they arrive.
```python
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Set your OpenAI API key before running:
#   export OPENAI_API_KEY="sk-..."

# Create a streaming ChatOpenAI client
llm = ChatOpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    model="gpt-4o",
    temperature=0.7,
    openai_api_key=os.environ["OPENAI_API_KEY"],
)

# Define a simple prompt template
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}.")

# Create the chain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain; tokens print to stdout as they arrive
chain.run({"topic": "computers"})
```

Output
Why did the computer show up at work late? Because it had a hard drive!
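Under the hood, LangChain's callback manager invokes each handler's on_llm_new_token once per token, in arrival order. A minimal pure-Python sketch of that contract (FakeStreamingLLM and print_token are illustrative names, not part of LangChain):

```python
import sys
from typing import Callable, List

class FakeStreamingLLM:
    """Illustrative stand-in for a streaming LLM: emits tokens one by
    one and notifies every registered callback, mirroring how LangChain
    fires on_llm_new_token as chunks arrive from the API."""

    def __init__(self, callbacks: List[Callable[[str], None]]):
        self.callbacks = callbacks

    def generate(self, tokens: List[str]) -> str:
        for token in tokens:
            for cb in self.callbacks:
                cb(token)  # fired once per token, before the next arrives
        return "".join(tokens)

def print_token(token: str) -> None:
    # Same behavior as StreamingStdOutCallbackHandler: write and flush
    sys.stdout.write(token)
    sys.stdout.flush()

llm = FakeStreamingLLM(callbacks=[print_token])
result = llm.generate(["Why ", "did ", "the ", "computer..."])
```

Because each callback runs before the next token is requested, a slow handler can stall the stream; keep per-token work small.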
Common variations
You can customize streaming behavior by implementing your own callback handler instead of StreamingStdOutCallbackHandler. Also, streaming works with other LangChain chains like ConversationChain. For async streaming, use ChatOpenAI with async methods and async callbacks.
```python
import asyncio
import os

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

class AsyncPrintHandler(AsyncCallbackHandler):
    """Prints each token as soon as it arrives."""

    async def on_llm_new_token(self, token: str, **kwargs):
        print(token, end="", flush=True)

async def main():
    llm = ChatOpenAI(
        streaming=True,
        callbacks=[AsyncPrintHandler()],
        model="gpt-4o",
        temperature=0.7,
        openai_api_key=os.environ["OPENAI_API_KEY"],
    )
    prompt = ChatPromptTemplate.from_template("Explain {topic} in simple terms.")
    chain = LLMChain(llm=llm, prompt=prompt)
    await chain.arun({"topic": "quantum computing"})

asyncio.run(main())
```

Output
Quantum computing is a type of computing that uses quantum bits, or qubits, which can be in multiple states at once...
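A custom handler does not have to print: it can buffer tokens, e.g. to forward them over a websocket. The sketch below uses a plain class so the token logic can be exercised directly; in a real chain it would subclass langchain_core.callbacks.BaseCallbackHandler and be passed via callbacks=[...] (the CollectingHandler name is ours):

```python
class CollectingHandler:
    """Buffers streamed tokens. For real use, subclass LangChain's
    BaseCallbackHandler and override on_llm_new_token with this
    same signature."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once per token; append instead of printing
        self.tokens.append(token)

    @property
    def text(self) -> str:
        return "".join(self.tokens)

# Simulate the per-token calls LangChain's callback manager would make:
handler = CollectingHandler()
for tok in ["Hello", ", ", "world", "!"]:
    handler.on_llm_new_token(tok)
```

After the run, handler.text holds the full completion, so the same handler can serve both incremental display and final-result logging.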
Troubleshooting
- If streaming output does not appear, ensure streaming=True is set on ChatOpenAI and callbacks are provided.
- Check that your API key is correctly set in os.environ["OPENAI_API_KEY"].
- For slow or missing output, verify network connectivity and model availability.
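The API-key check from the second bullet can be automated with a small guard before building the chain (check_openai_key is our helper name, not a LangChain function):

```python
import os

def check_openai_key() -> bool:
    """Return True when an OpenAI API key is visible to this process."""
    return bool(os.environ.get("OPENAI_API_KEY"))

if not check_openai_key():
    print('OPENAI_API_KEY is not set; run `export OPENAI_API_KEY="sk-..."` first.')
```

Failing fast here gives a clear message instead of an authentication error from deep inside the chain.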
Key Takeaways
- Enable streaming by setting streaming=True on ChatOpenAI and providing a callback handler.
- Use StreamingStdOutCallbackHandler for simple console streaming, or implement custom handlers for advanced use cases.
- Streaming works with both synchronous and asynchronous LangChain chains.
- Always set your OpenAI API key in os.environ to avoid authentication errors.