How-to · Intermediate · 3 min read

How to stream LangGraph agent output

Quick answer
To stream LangGraph agent output backed by an OpenAI model, pass stream=True to client.chat.completions.create in the OpenAI SDK. The call then returns an iterator of chunks, so you can process tokens in a loop (or asynchronously with async for) to display or handle partial responses as they arrive.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install "openai>=1.0"
  • Basic familiarity with LangGraph agents and OpenAI SDK

Setup

Install the OpenAI Python SDK and set your API key as an environment variable. This example uses the gpt-4o model, which supports streaming.

bash
pip install "openai>=1.0"

# Set your API key in your shell environment:
# export OPENAI_API_KEY="your-api-key"

Step by step

This example demonstrates streaming output from a LangGraph agent by enabling stream=True in the OpenAI SDK call. The code prints tokens as they arrive, simulating real-time agent output.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [
    {"role": "system", "content": "You are a LangGraph agent that explains concepts step-by-step."},
    {"role": "user", "content": "Explain how to stream LangGraph agent output."}
]

# Create a streaming chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True
)

print("Streaming LangGraph agent output:")
for chunk in response:
    # Each chunk carries an incremental delta; content is None on some chunks
    token = chunk.choices[0].delta.content or ""
    print(token, end="", flush=True)
print()
output
Streaming LangGraph agent output:
To stream LangGraph agent output, use the stream parameter in your API call to receive tokens as they are generated, allowing real-time processing.
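Beyond printing, you often need the complete response once streaming finishes. The sketch below shows one way to accumulate streamed tokens; collect_stream and the sample token list are hypothetical stand-ins for the chunk iterator, where each token would come from chunk.choices[0].delta.content.

```python
# Sketch: accumulating streamed tokens into a complete message.
# `token_stream` stands in for the iterator returned when stream=True;
# with the real SDK each token comes from chunk.choices[0].delta.content.
def collect_stream(token_stream):
    parts = []
    for token in token_stream:
        # delta content can be None (e.g. the final chunk), so skip falsy values
        if token:
            parts.append(token)
    return "".join(parts)

full_text = collect_stream(["Stream", "ing ", None, "works."])
print(full_text)  # Streaming works.
```

Collecting into a list and joining once avoids quadratic string concatenation on long responses.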

Common variations

  • Use AsyncOpenAI with async for to handle streaming asynchronously.
  • Switch to other streaming-capable models such as gpt-4o-mini (or claude-3-5-sonnet-20241022 via the Anthropic SDK).
  • Integrate streaming with LangGraph's agent orchestration to handle partial outputs for dynamic decision-making.
python
import asyncio
import os
from openai import AsyncOpenAI

async def stream_langgraph_agent():
    # AsyncOpenAI (not the sync client) is required for async streaming
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [
        {"role": "system", "content": "You are a LangGraph agent."},
        {"role": "user", "content": "Stream output asynchronously."}
    ]

    # In openai>=1.0 there is no acreate; await create with stream=True
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )

    print("Async streaming output:")
    async for chunk in response:
        token = chunk.choices[0].delta.content or ""
        print(token, end="", flush=True)
    print()

asyncio.run(stream_langgraph_agent())
output
Async streaming output:
Streaming output asynchronously allows your LangGraph agent to process tokens as they arrive in real time.
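The third variation above, acting on partial output before the stream finishes, can be sketched without any SDK. Here fake_stream and consume_until are hypothetical names; fake_stream stands in for the async chunk iterator, and the consumer stops early once a sentinel appears, as an orchestration node might when it has seen enough to make a decision.

```python
import asyncio

# Hypothetical sketch: an async generator standing in for a streaming
# response, plus a consumer that acts on partial output by stopping
# as soon as a sentinel token arrives.
async def fake_stream():
    for token in ["Thinking", " ", "DONE", " extra"]:
        yield token

async def consume_until(stream, sentinel):
    seen = []
    async for token in stream:
        seen.append(token)
        if sentinel in token:
            break  # act on partial output; don't wait for the full response
    return "".join(seen)

result = asyncio.run(consume_until(fake_stream(), "DONE"))
print(result)  # Thinking DONE
```

Breaking out of the async for loop abandons the rest of the stream, which is also how you would cancel a real streaming response early.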

Troubleshooting

  • If streaming does not start, verify your model supports streaming (e.g., gpt-4o does, some smaller models may not).
  • Check your API key and environment variable setup if authentication errors occur.
  • For partial or empty tokens, read chunk.choices[0].delta.content (an attribute, not a dict key) and treat None as an empty string.
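The last point can be exercised without hitting the API. The sketch below builds stand-in chunk objects (make_chunk and extract are hypothetical helpers) to show defensive handling of None deltas, mirroring the attribute-based chunk shape in openai>=1.0.

```python
from types import SimpleNamespace

# Stand-in for the SDK's streaming chunk objects: in openai>=1.0,
# delta is an object with a .content attribute, not a dict.
def make_chunk(content):
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def extract(chunk):
    # content may be None on role-only or final chunks; fall back to ""
    return chunk.choices[0].delta.content or ""

tokens = [extract(make_chunk(c)) for c in ["Hello", None, " world"]]
print("".join(tokens))  # Hello world
```

Using `or ""` (or an explicit None check) keeps the print loop from raising when a chunk carries no text.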

Key Takeaways

  • Enable streaming by setting stream=True in client.chat.completions.create.
  • Process streaming tokens incrementally from the response iterator for real-time output.
  • Use async streaming for non-blocking LangGraph agent integration.
  • Verify model compatibility with streaming before implementation.
  • Handle token chunks carefully to avoid missing partial content.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022