How to stream LangChain chain output
Quick answer
Use ChatOpenAI from langchain_openai with streaming=True to enable streaming output, then iterate over the chain's stream generator to process tokens in real time as the chain runs.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install langchain_openai "openai>=1.0"
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install LangChain OpenAI integration and OpenAI SDK:
pip install langchain_openai "openai>=1.0"
output
Collecting langchain_openai
Collecting openai
Successfully installed langchain_openai openai
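To confirm the packages actually installed before running the examples, you can query their versions from the standard library. This is a small sketch; the helper name is just illustrative:

```python
from importlib import metadata

# Illustrative helper: returns the installed version string, or None if the
# package is not installed in the current environment.
def installed_version(package: str):
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for pkg in ("langchain_openai", "openai"):
    print(pkg, "->", installed_version(pkg))
```

If either line prints None, rerun the pip install command above in the same environment.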
Step by step
This example shows how to create a ChatOpenAI instance with streaming enabled and run a simple LangChain chain that streams output tokens as they arrive.
import os

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Set your OpenAI API key in the environment variable OPENAI_API_KEY

# Initialize ChatOpenAI with streaming enabled
chat = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    temperature=0.7,
    api_key=os.environ["OPENAI_API_KEY"],
)

# Define a simple prompt template
prompt = ChatPromptTemplate.from_messages([
    ("human", "Tell me a joke about {topic}."),
])

# Compose the chain (LLMChain is deprecated; prompt | model is the current idiom,
# and its stream() yields message chunks token by token)
chain = prompt | chat

# Run the chain and print tokens as they arrive
print("Streaming output:")
for chunk in chain.stream({"topic": "computers"}):
    print(chunk.content, end="", flush=True)
print()
output
Streaming output: Why did the computer go to the doctor? Because it had a virus!
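If you need the complete response after displaying it live, accumulate the chunks as they arrive. A minimal sketch of that pattern, with the stream stubbed by a plain generator so it runs without an API key (in the real chain, each piece would be a token from the chain's stream):

```python
from typing import Iterable

def consume_stream(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the full text at the end."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)

# Stub generator standing in for the real token stream
full_text = consume_stream(
    iter(["Why did the computer ", "go to the doctor? ", "Because it had a virus!"])
)
```

The same function works unchanged whether the chunks come from a stub or from a live chain, which makes the display logic easy to test.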
Common variations
You can use asynchronous streaming with async for if your environment supports it. Also, you can switch models by changing the model parameter in ChatOpenAI. For example, use gpt-4o for higher quality or gpt-4o-mini for faster responses.
import asyncio
import os

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

async def async_stream():
    chat = ChatOpenAI(
        model="gpt-4o",
        streaming=True,
        temperature=0.7,
        api_key=os.environ["OPENAI_API_KEY"],
    )
    prompt = ChatPromptTemplate.from_messages([
        ("human", "Explain quantum computing in simple terms."),
    ])
    chain = prompt | chat

    print("Async streaming output:")
    # Use astream() (not stream()) inside async code
    async for chunk in chain.astream({}):
        print(chunk.content, end="", flush=True)
    print()

if __name__ == "__main__":
    asyncio.run(async_stream())
output
Async streaming output: Quantum computing uses quantum bits, or qubits, which can be in multiple states at once, allowing computers to solve certain problems much faster than classical computers.
Troubleshooting
- If streaming does not start, ensure streaming=True is set in ChatOpenAI.
- If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
- For slow or no output, check your internet connection and API usage limits.
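For the authentication case, a quick sanity check before constructing the model can save a confusing stack trace. This is a sketch; the helper name and the "sk-" prefix check are illustrative, since OpenAI key formats can vary:

```python
import os

# Illustrative check: fail fast if the key looks missing or malformed.
def openai_key_looks_set() -> bool:
    key = os.environ.get("OPENAI_API_KEY", "")
    return key.startswith("sk-") and len(key) > 20

if not openai_key_looks_set():
    print("OPENAI_API_KEY is missing or does not look like an OpenAI key.")
```

Run this once at startup so a missing key is reported in plain language instead of surfacing as an authentication error deep inside the chain.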
Key Takeaways
- Enable streaming in LangChain by setting streaming=True in ChatOpenAI.
- Use chain.stream() to iterate over tokens as they are generated for real-time output.
- Async streaming is supported with async for and chain.astream() in async environments.
- Always set your API key via environment variables for secure and reliable authentication.