How to stream reasoning model output
Quick answer
Use the `stream` parameter in the chat completion API call to receive partial outputs from reasoning models such as deepseek-reasoner in real time. This lets your application process and display the model's reasoning step by step as the response is generated.
Prerequisites
- Python 3.8+
- DeepSeek API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your API key as an environment variable for authentication.
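Before running anything, it helps to confirm the key is actually visible to the process. The helper below is a hypothetical convenience (not part of the SDK) that fails fast with a clear message instead of a later, more opaque authentication error:

```python
import os

def require_api_key(name: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running the examples.")
    return key
```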
pip install "openai>=1.0"
Step by step
This example demonstrates streaming output from the deepseek-reasoner model using the OpenAI-compatible Python SDK: the client is pointed at DeepSeek's endpoint via `base_url`, and `stream=True` makes the API return partial response chunks as they are generated.
```python
import os
from openai import OpenAI

# Point the OpenAI-compatible client at DeepSeek's endpoint.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Explain the reasoning behind Fermat's Last Theorem."}],
    stream=True,
)

for chunk in response:
    if not chunk.choices:  # some streams end with a usage-only chunk
        continue
    # In openai>=1.0 the delta is an object, not a dict, so use attribute access.
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
```
Output
The reasoning behind Fermat's Last Theorem involves advanced number theory and algebraic geometry, culminating in Andrew Wiles' proof using elliptic curves and modular forms.
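Per DeepSeek's API documentation, deepseek-reasoner streams its chain-of-thought in a separate `reasoning_content` field on each delta, alongside the final answer in `content`. The helper below is a sketch (not part of the SDK) that separates the two when consuming the stream:

```python
def split_delta(delta):
    """Split a streamed delta into (reasoning, answer) text.

    `reasoning_content` carries the chain-of-thought (DeepSeek-specific);
    `content` carries the final answer. Either may be None on any chunk.
    """
    reasoning = getattr(delta, "reasoning_content", None) or ""
    answer = getattr(delta, "content", None) or ""
    return reasoning, answer
```

Inside the streaming loop you would call `split_delta(chunk.choices[0].delta)` and, for example, render the reasoning dimmed and the answer normally.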
Common variations
- Use async streaming with `async for` loops in asynchronous Python environments.
- Switch models by changing the `model` parameter, e.g., `deepseek-chat` for general chat (or `gpt-4o` against OpenAI's own endpoint) and `deepseek-reasoner` for reasoning.
- Adjust `max_tokens` and `temperature` to control output length and creativity.
```python
import asyncio
import os
from openai import AsyncOpenAI

# Use AsyncOpenAI for awaitable calls; openai>=1.0 has no `acreate` method.
client = AsyncOpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

async def stream_reasoning():
    response = await client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": "Explain the Pythagorean theorem."}],
        stream=True,
    )
    async for chunk in response:
        if not chunk.choices:
            continue
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()

asyncio.run(stream_reasoning())
```
Output
The Pythagorean theorem states that in a right triangle, the square of the hypotenuse equals the sum of the squares of the other two sides.
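Streaming is for display, but downstream code usually still wants the complete text. A minimal pattern (plain Python, independent of any particular SDK object) is to collect the fragments in a list and join once at the end, which avoids repeated string concatenation:

```python
def accumulate(chunks):
    """Print streamed text fragments live and return the full response.

    `chunks` is any iterable of text pieces (possibly None or empty),
    such as the `delta.content` values pulled from a stream.
    """
    parts = []
    for text in chunks:
        if text:  # skip None/empty fragments
            parts.append(text)
            print(text, end="", flush=True)  # live display
    print()
    return "".join(parts)
```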
Troubleshooting
- If streaming does not start, verify your API key and model name are correct.
- Check network connectivity if partial chunks never arrive.
- Ensure your environment supports asynchronous code if using async streaming.
- For large outputs, monitor token limits to avoid truncation.
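Truncation is detectable: in the OpenAI-compatible streaming format, the final content chunk carries a `finish_reason`, and `"length"` means the response hit the token limit. The wrapper below is a sketch of that check; the fake-chunk shape it assumes (`choices[0].delta.content`, `choices[0].finish_reason`) matches the streaming responses used above:

```python
def stream_with_limit_check(chunks):
    """Print streamed text and return the final finish_reason.

    A finish_reason of "length" means the output was cut off by the
    token limit; consider raising max_tokens or continuing the chat.
    """
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        print(choice.delta.content or "", end="", flush=True)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    print()
    return finish_reason
```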
Key Takeaways
- Enable streaming by setting `stream=True` in your chat completion request.
- Process partial outputs incrementally to display reasoning steps in real time.
- Use asynchronous streaming for non-blocking applications.
- Choose the reasoning model deepseek-reasoner for complex logical tasks.
- Always verify API keys and model names to avoid connection issues.