High severity beginner · Fix: 2-5 min

NotSupportedError

openai.NotSupportedError: Streaming is not supported with this model (o1, o3)

What this error means
OpenAI's o1 and o3 reasoning models do not support streaming responses: passing stream=True raises NotSupportedError in the SDK.

Stack trace

traceback
openai.NotSupportedError: Streaming is not supported with this model (o1, o3).

  File "<your-script>.py", line 42, in get_reasoning_response
    response = client.chat.completions.create(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/site-packages/openai/resources/chat/completions.py", line 187, in create
    return self._post(
           ^^^^^^^^^^^
  File "/site-packages/openai/base_client.py", line 1289, in post
    return self._request(
           ^^^^^^^^^^^^
  File "/site-packages/openai/base_client.py", line 988, in _request
    raise err.error
openai.NotSupportedError: Streaming is not supported with this model (o1, o3). Only non-streaming requests are supported.
QUICK FIX
Remove stream=True from your client.chat.completions.create() call targeting o1/o3 models: reasoning models do not support streaming responses.

Why it happens

OpenAI's reasoning models (o1, o3) reason internally before answering and complete the entire reasoning process before returning a response. Streaming would mean emitting partial reasoning state, so these models deliver the response only as one complete message. With OpenAI SDK v1.3.0+, passing stream=True with an o1/o3 model raises NotSupportedError.

Detection

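Check for stream=True in any client.chat.completions.create() call targeting o1, o3-mini, or o3 models. Use grep or IDE search to audit existing code: grep -rn 'stream=True' your_project/ lists every streaming call, which you can then check against the model each call uses. Piping the output through grep -E '(o1|o3)' only catches calls where the model name and stream=True sit on the same line, so multi-line calls slip through.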
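Because the model name and stream=True usually sit on different lines of the same call, a syntax-aware check can be more reliable than a line-based search. The following is a minimal sketch using Python's standard ast module. The function name find_streaming_reasoning_calls and the REASONING_PREFIXES tuple are hypothetical, and it only catches calls where both arguments are written as literals.

```python
import ast

# Assumption: reasoning models can be recognised by these name prefixes.
REASONING_PREFIXES = ("o1", "o3")


def find_streaming_reasoning_calls(source: str) -> list[int]:
    """Return line numbers of calls that pass stream=True and a literal o1/o3 model."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Keep only keyword arguments whose values are plain literals.
        literals = {
            kw.arg: kw.value.value
            for kw in node.keywords
            if kw.arg is not None and isinstance(kw.value, ast.Constant)
        }
        model = literals.get("model")
        if (
            literals.get("stream") is True
            and isinstance(model, str)
            and model.startswith(REASONING_PREFIXES)
        ):
            flagged.append(node.lineno)
    return sorted(flagged)
```

To audit a file, pass its contents, for example find_streaming_reasoning_calls(open('app.py').read()). Calls that take the model name from a variable or config value are not detected and still need a manual check.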

Causes & fixes

1

Passing stream=True when calling client.chat.completions.create() with model='o1' or 'o3-mini'

✓ Fix

Remove stream=True entirely or set stream=False (the default). Non-streaming is required for reasoning models.

2

Migrating code from gpt-4o/gpt-4o-mini (which support streaming) to o1/o3 without updating stream parameter

✓ Fix

Branch on the model name: if model.startswith(('o1', 'o3')), use stream=False; otherwise stream=True remains available if needed. Alternatively, keep a model compatibility matrix.

3

Using streaming wrappers or libraries (like LangChain's StreamingStdOutCallbackHandler) with o1/o3 models

✓ Fix

Disable streaming for o1/o3 in your abstraction layer. Create a model-aware fallback that buffers the full response instead of streaming.

4

Attempting to use astream() async method with o1/o3 models

✓ Fix

Switch to the standard create() method (not astream() or stream()) for reasoning models. In async code, await create() on an AsyncOpenAI client, or wrap the synchronous call in asyncio.to_thread() if you need async compatibility without streaming.
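The fixes for causes 2 and 4 can be combined in one small helper. This is a sketch under stated assumptions, not the SDK's own API: is_reasoning_model and create_async are hypothetical names, create_fn stands in for a blocking call such as client.chat.completions.create, and the prefix check assumes every reasoning model name starts with o1 or o3.

```python
import asyncio
from typing import Any, Callable


def is_reasoning_model(model: str) -> bool:
    # Assumption: model names starting with "o1" or "o3" are reasoning models.
    return model.startswith(("o1", "o3"))


async def create_async(create_fn: Callable[..., Any], model: str,
                       want_stream: bool = False, **kwargs: Any) -> Any:
    """Run a blocking create-style call in a worker thread, never streaming for o1/o3."""
    stream = want_stream and not is_reasoning_model(model)
    # asyncio.to_thread (Python 3.9+) keeps the event loop free while the call blocks.
    return await asyncio.to_thread(create_fn, model=model, stream=stream, **kwargs)
```

A call would look like await create_async(client.chat.completions.create, 'o1', want_stream=True, messages=[...]). The streaming request is silently downgraded for o1, so callers that iterate over chunks still need to handle a complete message instead.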

Code: broken vs fixed

Broken - triggers the error
python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

# BROKEN: stream=True is not supported with o1
response = client.chat.completions.create(
    model='o1',
    messages=[{'role': 'user', 'content': 'Solve this math problem: 2^100'}],
    stream=True,  # ❌ This causes NotSupportedError
    max_completion_tokens=4000
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
Fixed - works correctly
python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

# FIXED: Removed stream=True for o1 (reasoning models do not support streaming)
response = client.chat.completions.create(
    model='o1',
    messages=[{'role': 'user', 'content': 'Solve this math problem: 2^100'}],
    # stream=True is removed — o1 only supports non-streaming responses
    max_completion_tokens=4000
)

# The full response arrives as one message once the model finishes reasoning
print('Answer:')
print(response.choices[0].message.content)
print(f'\nInput tokens: {response.usage.prompt_tokens}')
print(f'Output tokens: {response.usage.completion_tokens}')
The fix removes the stream=True parameter, because o1/o3 reasoning models finish their extended thinking before returning anything. The response comes back as a complete message object, not a stream of chunks.

Workaround

If your UI expects streaming behavior, buffer the full o1 response on the server and send it to the client in chunks. Get the complete response from client.chat.completions.create(model='o1', stream=False, ...), then split response.choices[0].message.content by sentence or word and send the pieces at timed intervals (e.g., asyncio.sleep(0.1) between chunks). This gives the appearance of streaming while respecting o1's non-streaming requirement, although the user still waits for the full reasoning to finish before the first chunk appears.
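The chunking step might look like the sketch below. The name simulate_stream is hypothetical, and the text argument is assumed to be the already-complete response.choices[0].message.content string. It splits on whitespace, so line breaks in the original answer are collapsed into single spaces.

```python
import asyncio
from typing import AsyncIterator


async def simulate_stream(text: str, delay: float = 0.1) -> AsyncIterator[str]:
    """Yield an already-complete response word by word, pausing between chunks."""
    words = text.split()
    for i, word in enumerate(words):
        # Every chunk except the last carries a trailing space.
        yield word if i == len(words) - 1 else word + " "
        await asyncio.sleep(delay)
```

A consumer iterates with async for chunk in simulate_stream(content) and forwards each chunk to the UI, for example over a WebSocket or server-sent events. If formatting matters, splitting by sentence or by line would preserve more of the original layout.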

Prevention

Build a model-aware abstraction layer that checks model type before setting streaming. Create an enum or config table mapping models to their streaming capability: SUPPORTS_STREAMING = {'gpt-4o', 'gpt-4o-mini', 'gpt-3.5-turbo', ...}; REASONING_ONLY = {'o1', 'o3', 'o3-mini'}. Use this in your request builder: if model in REASONING_ONLY: stream=False. Document this limitation in your API layer or RAG system so developers know o1/o3 are not suitable for real-time streaming responses.
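One possible shape for such a request builder is sketched below. The two sets repeat only the model names listed above and are illustrative, not a complete capability matrix. The function name build_request is hypothetical, and raising on unknown models is a design assumption rather than required behavior.

```python
from typing import Any

# Assumption: illustrative capability sets; extend them as models are added.
SUPPORTS_STREAMING = {"gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"}
REASONING_ONLY = {"o1", "o3", "o3-mini"}


def build_request(model: str, messages: list, want_stream: bool = False,
                  **extra: Any) -> dict:
    """Return kwargs for a chat completion call with a stream flag the model accepts."""
    if model in REASONING_ONLY:
        stream = False
    elif model in SUPPORTS_STREAMING:
        stream = want_stream
    else:
        # Unknown models fail loudly so the table gets updated deliberately.
        raise ValueError(f"No streaming capability recorded for model {model!r}")
    return {"model": model, "messages": messages, "stream": stream, **extra}
```

The result can be unpacked into the call, as in client.chat.completions.create(**build_request('o1', messages, want_stream=True)). An explicit table requires an update for each new model, but it avoids guessing capabilities from name prefixes.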

Python 3.9+ · openai >=1.3.0 · tested on 1.56.x (2026-04)
Verified 2026-04 · o1, o3-mini, o3, gpt-4o, gpt-4o-mini