How to stream reasoning model output
Quick answer
Use the `stream` parameter in the chat completion API call to receive partial outputs from reasoning models such as deepseek-reasoner in real time. This lets your application process and display the model's reasoning step by step as the response is generated.
Prerequisites
- Python 3.8+
- DeepSeek API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your API key as an environment variable for authentication.
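Before running anything, it helps to confirm the key is actually visible to the process. The helper below is a hypothetical convenience (not part of the SDK) that fails fast with a clear message instead of a later, more opaque authentication error:

```python
import os

def require_api_key(name: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running the examples.")
    return key
```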
pip install "openai>=1.0"
Step by step
This example demonstrates streaming output from the deepseek-reasoner model using the OpenAI-compatible Python SDK: the client is pointed at DeepSeek's endpoint via `base_url`, and `stream=True` makes the API return partial response chunks as they are generated.
```python
import os
from openai import OpenAI

# Point the OpenAI-compatible client at DeepSeek's endpoint.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Explain the reasoning behind Fermat's Last Theorem."}],
    stream=True,
)

for chunk in response:
    if not chunk.choices:  # some streams end with a usage-only chunk
        continue
    # In openai>=1.0 the delta is an object, not a dict, so use attribute access.
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
```
Output
The reasoning behind Fermat's Last Theorem involves advanced number theory and algebraic geometry, culminating in Andrew Wiles' proof using elliptic curves and modular forms.
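Per DeepSeek's API documentation, deepseek-reasoner streams its chain-of-thought in a separate `reasoning_content` field on each delta, alongside the final answer in `content`. The helper below is a sketch (not part of the SDK) that separates the two when consuming the stream:

```python
def split_delta(delta):
    """Split a streamed delta into (reasoning, answer) text.

    `reasoning_content` carries the chain-of-thought (DeepSeek-specific);
    `content` carries the final answer. Either may be None on any chunk.
    """
    reasoning = getattr(delta, "reasoning_content", None) or ""
    answer = getattr(delta, "content", None) or ""
    return reasoning, answer
```

Inside the streaming loop you would call `split_delta(chunk.choices[0].delta)` and, for example, render the reasoning dimmed and the answer normally.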
Common variations
- Use async streaming with `async for` loops in asynchronous Python environments.
- Switch models by changing the `model` parameter, e.g., `deepseek-chat` for general chat (or `gpt-4o` against OpenAI's own endpoint) and `deepseek-reasoner` for reasoning.
- Adjust `max_tokens` and `temperature` to control output length and creativity.
```python
import asyncio
import os
from openai import AsyncOpenAI

# Use AsyncOpenAI for awaitable calls; openai>=1.0 has no `acreate` method.
client = AsyncOpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

async def stream_reasoning():
    response = await client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": "Explain the Pythagorean theorem."}],
        stream=True,
    )
    async for chunk in response:
        if not chunk.choices:
            continue
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()

asyncio.run(stream_reasoning())
```
Output
The Pythagorean theorem states that in a right triangle, the square of the hypotenuse equals the sum of the squares of the other two sides.
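Streaming is for display, but downstream code usually still wants the complete text. A minimal pattern (plain Python, independent of any particular SDK object) is to collect the fragments in a list and join once at the end, which avoids repeated string concatenation:

```python
def accumulate(chunks):
    """Print streamed text fragments live and return the full response.

    `chunks` is any iterable of text pieces (possibly None or empty),
    such as the `delta.content` values pulled from a stream.
    """
    parts = []
    for text in chunks:
        if text:  # skip None/empty fragments
            parts.append(text)
            print(text, end="", flush=True)  # live display
    print()
    return "".join(parts)
```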
Troubleshooting
- If streaming does not start, verify your API key and model name are correct.
- Check network connectivity if partial chunks never arrive.
- Ensure your environment supports asynchronous code if using async streaming.
- For large outputs, monitor token limits to avoid truncation.
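Truncation is detectable: in the OpenAI-compatible streaming format, the final content chunk carries a `finish_reason`, and `"length"` means the response hit the token limit. The wrapper below is a sketch of that check; the fake-chunk shape it assumes (`choices[0].delta.content`, `choices[0].finish_reason`) matches the streaming responses used above:

```python
def stream_with_limit_check(chunks):
    """Print streamed text and return the final finish_reason.

    A finish_reason of "length" means the output was cut off by the
    token limit; consider raising max_tokens or continuing the chat.
    """
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        print(choice.delta.content or "", end="", flush=True)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    print()
    return finish_reason
```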
Key Takeaways
- Enable streaming by setting `stream=True` in your chat completion request.
- Process partial outputs incrementally to display reasoning steps in real time.
- Use asynchronous streaming for non-blocking applications.
- Choose the reasoning model deepseek-reasoner for complex logical tasks.
- Always verify API keys and model names to avoid connection issues.