How to stream Qwen API responses
Quick answer

To stream Qwen API responses, use the OpenAI SDK with the stream=True parameter in chat.completions.create(). This enables real-time token-by-token output from the model, allowing you to process or display partial results as they arrive.

Prerequisites

- Python 3.8+
- An OpenAI API key with Qwen access
- pip install "openai>=1.0" (quote the specifier so the shell does not treat >= as a redirect)
Setup
Install the openai Python package version 1.0 or higher and set your API key as an environment variable.
- Install the package: pip install "openai>=1.0"
- Set the environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)

Step by step
This example demonstrates streaming Qwen chat completions using the OpenAI SDK. The stream=True parameter enables receiving tokens as they are generated.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [
    {"role": "user", "content": "Explain the benefits of streaming API responses."}
]

response_stream = client.chat.completions.create(
    model="qwen-v1",
    messages=messages,
    stream=True
)

print("Streaming response:")
for chunk in response_stream:
    # Each chunk carries a partial message delta; content may be None
    token = chunk.choices[0].delta.content or ""
    print(token, end="", flush=True)
print()
```

Output:

```
Streaming response:
Streaming allows you to receive tokens in real time, reducing latency and improving user experience.
```
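If you also need the complete text once streaming finishes, accumulate the deltas as they arrive. The sketch below assumes the chunk shape shown above; fake_chunk is a hypothetical stand-in object used so the pattern runs without a live API call.

```python
from types import SimpleNamespace

def collect_stream(response_stream):
    """Accumulate streamed deltas into the full response text.

    Works on any iterable of chat-completion chunks; delta.content
    may be None on some chunks, so substitute an empty string.
    """
    parts = []
    for chunk in response_stream:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)

# Hypothetical stand-in mimicking the SDK's chunk shape, for illustration only
def fake_chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

stream = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk("world"), fake_chunk(None)]
print(collect_stream(stream))  # -> Hello, world
```

In real code you would pass the response_stream returned by chat.completions.create(..., stream=True) instead of the fake chunks.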
Common variations

- Async streaming: use async for with the async client (AsyncOpenAI) to handle streaming asynchronously.
- Different models: replace model="qwen-v1" with other Qwen variants if available.
- Non-streaming: omit stream=True to get the full response at once.
```python
import asyncio
import os
from openai import AsyncOpenAI

async def async_stream():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Tell me a joke."}]
    response_stream = await client.chat.completions.create(
        model="qwen-v1",
        messages=messages,
        stream=True
    )
    print("Async streaming response:")
    async for chunk in response_stream:
        token = chunk.choices[0].delta.content or ""
        print(token, end="", flush=True)
    print()

asyncio.run(async_stream())
```

Output:

```
Async streaming response:
Why did the scarecrow win an award? Because he was outstanding in his field!
```
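The async iteration pattern itself can be exercised without an API call. In this sketch, fake_stream is a hypothetical async generator standing in for the SDK's chunk stream, so you can see how async for consumes deltas:

```python
import asyncio
from types import SimpleNamespace

# Hypothetical async generator standing in for the SDK's chunk stream
async def fake_stream(tokens):
    for t in tokens:
        yield SimpleNamespace(
            choices=[SimpleNamespace(delta=SimpleNamespace(content=t))]
        )

async def collect_async(stream):
    # Consume chunks with async for, guarding against None content
    parts = []
    async for chunk in stream:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)

result = asyncio.run(collect_async(fake_stream(["Hi", " ", "there", None])))
print(result)  # -> Hi there
```

With a real AsyncOpenAI client, the object returned by create(..., stream=True) is async-iterable in the same way.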
Troubleshooting
- If streaming does not start, verify your API key and model name are correct.
- Ensure your network connection supports persistent HTTP connections.
- For partial or empty tokens, access chunk.choices[0].delta.content defensively: it can be None on some chunks (for example, the role-only first chunk or the final chunk), so fall back to an empty string.
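One defensive approach is to wrap the access in a small helper. This is a sketch, not part of the SDK; the stand-in objects below only illustrate the chunk shapes you may encounter:

```python
from types import SimpleNamespace

def delta_text(chunk):
    """Safely extract streamed text from a chunk.

    chunk.choices may be empty and delta.content may be None
    (e.g., on role-only or final chunks), so guard both cases.
    """
    if not chunk.choices:
        return ""
    return chunk.choices[0].delta.content or ""

# Hypothetical stand-ins for SDK chunk objects
role_only = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))])
text_chunk = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="hi"))])
no_choices = SimpleNamespace(choices=[])

print(repr(delta_text(role_only)))   # -> ''
print(repr(delta_text(text_chunk)))  # -> 'hi'
print(repr(delta_text(no_choices)))  # -> ''
```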
Key Takeaways

- Use stream=True in chat.completions.create() to enable streaming with Qwen.
- Process tokens incrementally from chunk.choices[0].delta.content for real-time output.
- Async streaming is supported via AsyncOpenAI and async for iteration.
- Read your API key from os.environ["OPENAI_API_KEY"] instead of hard-coding it.
- Verify model names and network connectivity if streaming fails or stalls.