How to stream LangChain chain output
Quick answer
Use ChatOpenAI from langchain_openai with streaming=True to enable streaming output, then iterate over the chain's stream generator to process tokens in real time as the chain runs.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install langchain_openai "openai>=1.0"
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install LangChain OpenAI integration and OpenAI SDK:
pip install langchain_openai "openai>=1.0"
output
Collecting langchain_openai
Collecting openai
Successfully installed langchain_openai openai
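To confirm the packages actually installed before running the examples, you can query their versions from the standard library. This is a small sketch; the helper name is just illustrative:

```python
from importlib import metadata

# Illustrative helper: returns the installed version string, or None if the
# package is not installed in the current environment.
def installed_version(package: str):
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for pkg in ("langchain_openai", "openai"):
    print(pkg, "->", installed_version(pkg))
```

If either line prints None, rerun the pip install command above in the same environment.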
Step by step
This example shows how to create a ChatOpenAI instance with streaming enabled and run a simple LangChain chain that streams output tokens as they arrive.
import os

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Set your OpenAI API key in the environment variable OPENAI_API_KEY

# Initialize ChatOpenAI with streaming enabled
chat = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    temperature=0.7,
    api_key=os.environ["OPENAI_API_KEY"],
)

# Define a simple prompt template
prompt = ChatPromptTemplate.from_messages([
    ("human", "Tell me a joke about {topic}."),
])

# Compose the chain (LLMChain is deprecated; prompt | model is the current idiom,
# and its stream() yields message chunks token by token)
chain = prompt | chat

# Run the chain and print tokens as they arrive
print("Streaming output:")
for chunk in chain.stream({"topic": "computers"}):
    print(chunk.content, end="", flush=True)
print()
output
Streaming output: Why did the computer go to the doctor? Because it had a virus!
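If you need the complete response after displaying it live, accumulate the chunks as they arrive. A minimal sketch of that pattern, with the stream stubbed by a plain generator so it runs without an API key (in the real chain, each piece would be a token from the chain's stream):

```python
from typing import Iterable

def consume_stream(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the full text at the end."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)

# Stub generator standing in for the real token stream
full_text = consume_stream(
    iter(["Why did the computer ", "go to the doctor? ", "Because it had a virus!"])
)
```

The same function works unchanged whether the chunks come from a stub or from a live chain, which makes the display logic easy to test.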
Common variations
You can use asynchronous streaming with async for if your environment supports it. Also, you can switch models by changing the model parameter in ChatOpenAI. For example, use gpt-4o for higher quality or gpt-4o-mini for faster responses.
import asyncio
import os

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

async def async_stream():
    chat = ChatOpenAI(
        model="gpt-4o",
        streaming=True,
        temperature=0.7,
        api_key=os.environ["OPENAI_API_KEY"],
    )
    prompt = ChatPromptTemplate.from_messages([
        ("human", "Explain quantum computing in simple terms."),
    ])
    chain = prompt | chat

    print("Async streaming output:")
    # Use astream() (not stream()) inside async code
    async for chunk in chain.astream({}):
        print(chunk.content, end="", flush=True)
    print()

if __name__ == "__main__":
    asyncio.run(async_stream())
output
Async streaming output: Quantum computing uses quantum bits, or qubits, which can be in multiple states at once, allowing computers to solve certain problems much faster than classical computers.
Troubleshooting
- If streaming does not start, ensure streaming=True is set in ChatOpenAI.
- If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
- For slow or no output, check your internet connection and API usage limits.
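For the authentication case, a quick sanity check before constructing the model can save a confusing stack trace. This is a sketch; the helper name and the "sk-" prefix check are illustrative, since OpenAI key formats can vary:

```python
import os

# Illustrative check: fail fast if the key looks missing or malformed.
def openai_key_looks_set() -> bool:
    key = os.environ.get("OPENAI_API_KEY", "")
    return key.startswith("sk-") and len(key) > 20

if not openai_key_looks_set():
    print("OPENAI_API_KEY is missing or does not look like an OpenAI key.")
```

Run this once at startup so a missing key is reported in plain language instead of surfacing as an authentication error deep inside the chain.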
Key Takeaways
- Enable streaming in LangChain by setting streaming=True in ChatOpenAI.
- Use chain.stream() to iterate over tokens as they are generated for real-time output.
- Async streaming is supported with async for and chain.astream() in async environments.
- Always set your API key via environment variables for secure and reliable authentication.