How-to · Intermediate · 3 min read

How to stream LangGraph agent output

Quick answer
To stream LangGraph agent output backed by an OpenAI model, pass stream=True to client.chat.completions.create in the OpenAI SDK. The call then returns an iterator of chunks, so you can process tokens in a loop (or asynchronously with async for) to display or handle partial responses as they arrive.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install "openai>=1.0"
  • Basic familiarity with LangGraph agents and OpenAI SDK

Setup

Install the OpenAI Python SDK and set your API key as an environment variable. This example uses the gpt-4o model, which supports streaming.

bash
pip install "openai>=1.0"

# Set your API key in your shell environment:
# export OPENAI_API_KEY="your-api-key"

Step by step

This example demonstrates streaming output from a LangGraph agent by enabling stream=True in the OpenAI SDK call. The code prints tokens as they arrive, simulating real-time agent output.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [
    {"role": "system", "content": "You are a LangGraph agent that explains concepts step-by-step."},
    {"role": "user", "content": "Explain how to stream LangGraph agent output."}
]

# Create a streaming chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True
)

print("Streaming LangGraph agent output:")
for chunk in response:
    # Each chunk carries an incremental delta; content is None on some chunks
    token = chunk.choices[0].delta.content or ""
    print(token, end="", flush=True)
print()
output
Streaming LangGraph agent output:
To stream LangGraph agent output, use the stream parameter in your API call to receive tokens as they are generated, allowing real-time processing.
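Beyond printing, you often need the complete response once streaming finishes. The sketch below shows one way to accumulate streamed tokens; collect_stream and the sample token list are hypothetical stand-ins for the chunk iterator, where each token would come from chunk.choices[0].delta.content.

```python
# Sketch: accumulating streamed tokens into a complete message.
# `token_stream` stands in for the iterator returned when stream=True;
# with the real SDK each token comes from chunk.choices[0].delta.content.
def collect_stream(token_stream):
    parts = []
    for token in token_stream:
        # delta content can be None (e.g. the final chunk), so skip falsy values
        if token:
            parts.append(token)
    return "".join(parts)

full_text = collect_stream(["Stream", "ing ", None, "works."])
print(full_text)  # Streaming works.
```

Collecting into a list and joining once avoids quadratic string concatenation on long responses.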

Common variations

  • Use AsyncOpenAI with async for to handle streaming asynchronously.
  • Switch to other streaming-capable models such as gpt-4o-mini (or claude-3-5-sonnet-20241022 via the Anthropic SDK).
  • Integrate streaming with LangGraph's agent orchestration to handle partial outputs for dynamic decision-making.
python
import asyncio
import os
from openai import AsyncOpenAI

async def stream_langgraph_agent():
    # AsyncOpenAI (not the sync client) is required for async streaming
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [
        {"role": "system", "content": "You are a LangGraph agent."},
        {"role": "user", "content": "Stream output asynchronously."}
    ]

    # In openai>=1.0 there is no acreate; await create with stream=True
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )

    print("Async streaming output:")
    async for chunk in response:
        token = chunk.choices[0].delta.content or ""
        print(token, end="", flush=True)
    print()

asyncio.run(stream_langgraph_agent())
output
Async streaming output:
Streaming output asynchronously allows your LangGraph agent to process tokens as they arrive in real time.
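The third variation above, acting on partial output before the stream finishes, can be sketched without any SDK. Here fake_stream and consume_until are hypothetical names; fake_stream stands in for the async chunk iterator, and the consumer stops early once a sentinel appears, as an orchestration node might when it has seen enough to make a decision.

```python
import asyncio

# Hypothetical sketch: an async generator standing in for a streaming
# response, plus a consumer that acts on partial output by stopping
# as soon as a sentinel token arrives.
async def fake_stream():
    for token in ["Thinking", " ", "DONE", " extra"]:
        yield token

async def consume_until(stream, sentinel):
    seen = []
    async for token in stream:
        seen.append(token)
        if sentinel in token:
            break  # act on partial output; don't wait for the full response
    return "".join(seen)

result = asyncio.run(consume_until(fake_stream(), "DONE"))
print(result)  # Thinking DONE
```

Breaking out of the async for loop abandons the rest of the stream, which is also how you would cancel a real streaming response early.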

Troubleshooting

  • If streaming does not start, verify your model supports streaming (e.g., gpt-4o does, some smaller models may not).
  • Check your API key and environment variable setup if authentication errors occur.
  • For partial or empty tokens, read chunk.choices[0].delta.content (an attribute, not a dict key) and treat None as an empty string.
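The last point can be exercised without hitting the API. The sketch below builds stand-in chunk objects (make_chunk and extract are hypothetical helpers) to show defensive handling of None deltas, mirroring the attribute-based chunk shape in openai>=1.0.

```python
from types import SimpleNamespace

# Stand-in for the SDK's streaming chunk objects: in openai>=1.0,
# delta is an object with a .content attribute, not a dict.
def make_chunk(content):
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def extract(chunk):
    # content may be None on role-only or final chunks; fall back to ""
    return chunk.choices[0].delta.content or ""

tokens = [extract(make_chunk(c)) for c in ["Hello", None, " world"]]
print("".join(tokens))  # Hello world
```

Using `or ""` (or an explicit None check) keeps the print loop from raising when a chunk carries no text.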

Key Takeaways

  • Enable streaming by setting stream=True in client.chat.completions.create.
  • Process streaming tokens incrementally from the response iterator for real-time output.
  • Use async streaming for non-blocking LangGraph agent integration.
  • Verify model compatibility with streaming before implementation.
  • Handle token chunks carefully to avoid missing partial content.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022