How to stream LangChain chain output
Quick answer
Use LangChain's ChatOpenAI with streaming=True and a callback handler such as StreamingStdOutCallbackHandler to stream chain output in real time. Pass the streaming client to your chain and run it to receive tokens incrementally as they are generated.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install langchain_openai>=0.2 openai>=1.0
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install the LangChain OpenAI integration and the OpenAI SDK:

```bash
pip install langchain_openai openai
```

Step by step
This example demonstrates streaming LangChain chain output using ChatOpenAI with streaming=True and the built-in StreamingStdOutCallbackHandler. The chain will print tokens as they arrive.
```python
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Set your OpenAI API key before running:
#   export OPENAI_API_KEY="sk-..."

# Create a streaming ChatOpenAI client
llm = ChatOpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    model="gpt-4o",
    temperature=0.7,
    openai_api_key=os.environ["OPENAI_API_KEY"],
)

# Define a simple prompt template
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}.")

# Create the chain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain; tokens print to stdout as they arrive
chain.run({"topic": "computers"})
```

Output
Why did the computer show up at work late? Because it had a hard drive!
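Under the hood, LangChain's callback manager invokes each handler's on_llm_new_token once per token, in arrival order. A minimal pure-Python sketch of that contract (FakeStreamingLLM and print_token are illustrative names, not part of LangChain):

```python
import sys
from typing import Callable, List

class FakeStreamingLLM:
    """Illustrative stand-in for a streaming LLM: emits tokens one by
    one and notifies every registered callback, mirroring how LangChain
    fires on_llm_new_token as chunks arrive from the API."""

    def __init__(self, callbacks: List[Callable[[str], None]]):
        self.callbacks = callbacks

    def generate(self, tokens: List[str]) -> str:
        for token in tokens:
            for cb in self.callbacks:
                cb(token)  # fired once per token, before the next arrives
        return "".join(tokens)

def print_token(token: str) -> None:
    # Same behavior as StreamingStdOutCallbackHandler: write and flush
    sys.stdout.write(token)
    sys.stdout.flush()

llm = FakeStreamingLLM(callbacks=[print_token])
result = llm.generate(["Why ", "did ", "the ", "computer..."])
```

Because each callback runs before the next token is requested, a slow handler can stall the stream; keep per-token work small.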
Common variations
You can customize streaming behavior by implementing your own callback handler instead of StreamingStdOutCallbackHandler. Also, streaming works with other LangChain chains like ConversationChain. For async streaming, use ChatOpenAI with async methods and async callbacks.
```python
import asyncio
import os

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

class AsyncPrintHandler(AsyncCallbackHandler):
    """Prints each token as soon as it arrives."""

    async def on_llm_new_token(self, token: str, **kwargs):
        print(token, end="", flush=True)

async def main():
    llm = ChatOpenAI(
        streaming=True,
        callbacks=[AsyncPrintHandler()],
        model="gpt-4o",
        temperature=0.7,
        openai_api_key=os.environ["OPENAI_API_KEY"],
    )
    prompt = ChatPromptTemplate.from_template("Explain {topic} in simple terms.")
    chain = LLMChain(llm=llm, prompt=prompt)
    await chain.arun({"topic": "quantum computing"})

asyncio.run(main())
```

Output
Quantum computing is a type of computing that uses quantum bits, or qubits, which can be in multiple states at once...
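A custom handler does not have to print: it can buffer tokens, e.g. to forward them over a websocket. The sketch below uses a plain class so the token logic can be exercised directly; in a real chain it would subclass langchain_core.callbacks.BaseCallbackHandler and be passed via callbacks=[...] (the CollectingHandler name is ours):

```python
class CollectingHandler:
    """Buffers streamed tokens. For real use, subclass LangChain's
    BaseCallbackHandler and override on_llm_new_token with this
    same signature."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once per token; append instead of printing
        self.tokens.append(token)

    @property
    def text(self) -> str:
        return "".join(self.tokens)

# Simulate the per-token calls LangChain's callback manager would make:
handler = CollectingHandler()
for tok in ["Hello", ", ", "world", "!"]:
    handler.on_llm_new_token(tok)
```

After the run, handler.text holds the full completion, so the same handler can serve both incremental display and final-result logging.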
Troubleshooting
- If streaming output does not appear, ensure streaming=True is set on ChatOpenAI and callbacks are provided.
- Check that your API key is correctly set in os.environ["OPENAI_API_KEY"].
- For slow or missing output, verify network connectivity and model availability.
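The API-key check from the second bullet can be automated with a small guard before building the chain (check_openai_key is our helper name, not a LangChain function):

```python
import os

def check_openai_key() -> bool:
    """Return True when an OpenAI API key is visible to this process."""
    return bool(os.environ.get("OPENAI_API_KEY"))

if not check_openai_key():
    print('OPENAI_API_KEY is not set; run `export OPENAI_API_KEY="sk-..."` first.')
```

Failing fast here gives a clear message instead of an authentication error from deep inside the chain.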
Key Takeaways
- Enable streaming by setting streaming=True on ChatOpenAI and providing a callback handler.
- Use StreamingStdOutCallbackHandler for simple console streaming, or implement custom handlers for advanced use cases.
- Streaming works with both synchronous and asynchronous LangChain chains.
- Always set your OpenAI API key in os.environ to avoid authentication errors.