How to stream Qwen API responses
Quick answer

To stream Qwen API responses, use the OpenAI SDK with the stream=True parameter in chat.completions.create(). This enables real-time token-by-token output from the model, allowing you to process or display partial results as they arrive.

Prerequisites

- Python 3.8+
- An OpenAI API key with Qwen access
- pip install "openai>=1.0" (quote the specifier so the shell does not treat >= as a redirect)
Setup
Install the openai Python package version 1.0 or higher and set your API key as an environment variable.
- Install the package: pip install "openai>=1.0"
- Set the environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)

Step by step
This example demonstrates streaming Qwen chat completions using the OpenAI SDK. The stream=True parameter enables receiving tokens as they are generated.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [
    {"role": "user", "content": "Explain the benefits of streaming API responses."}
]

response_stream = client.chat.completions.create(
    model="qwen-v1",
    messages=messages,
    stream=True
)

print("Streaming response:")
for chunk in response_stream:
    # Each chunk carries a partial message delta; content may be None
    token = chunk.choices[0].delta.content or ""
    print(token, end="", flush=True)
print()
```

Output:

```
Streaming response:
Streaming allows you to receive tokens in real time, reducing latency and improving user experience.
```
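If you also need the complete text once streaming finishes, accumulate the deltas as they arrive. The sketch below assumes the chunk shape shown above; fake_chunk is a hypothetical stand-in object used so the pattern runs without a live API call.

```python
from types import SimpleNamespace

def collect_stream(response_stream):
    """Accumulate streamed deltas into the full response text.

    Works on any iterable of chat-completion chunks; delta.content
    may be None on some chunks, so substitute an empty string.
    """
    parts = []
    for chunk in response_stream:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)

# Hypothetical stand-in mimicking the SDK's chunk shape, for illustration only
def fake_chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

stream = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk("world"), fake_chunk(None)]
print(collect_stream(stream))  # -> Hello, world
```

In real code you would pass the response_stream returned by chat.completions.create(..., stream=True) instead of the fake chunks.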
Common variations

- Async streaming: use async for with the async client (AsyncOpenAI) to handle streaming asynchronously.
- Different models: replace model="qwen-v1" with other Qwen variants if available.
- Non-streaming: omit stream=True to get the full response at once.
```python
import asyncio
import os
from openai import AsyncOpenAI

async def async_stream():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Tell me a joke."}]
    response_stream = await client.chat.completions.create(
        model="qwen-v1",
        messages=messages,
        stream=True
    )
    print("Async streaming response:")
    async for chunk in response_stream:
        token = chunk.choices[0].delta.content or ""
        print(token, end="", flush=True)
    print()

asyncio.run(async_stream())
```

Output:

```
Async streaming response:
Why did the scarecrow win an award? Because he was outstanding in his field!
```
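The async iteration pattern itself can be exercised without an API call. In this sketch, fake_stream is a hypothetical async generator standing in for the SDK's chunk stream, so you can see how async for consumes deltas:

```python
import asyncio
from types import SimpleNamespace

# Hypothetical async generator standing in for the SDK's chunk stream
async def fake_stream(tokens):
    for t in tokens:
        yield SimpleNamespace(
            choices=[SimpleNamespace(delta=SimpleNamespace(content=t))]
        )

async def collect_async(stream):
    # Consume chunks with async for, guarding against None content
    parts = []
    async for chunk in stream:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)

result = asyncio.run(collect_async(fake_stream(["Hi", " ", "there", None])))
print(result)  # -> Hi there
```

With a real AsyncOpenAI client, the object returned by create(..., stream=True) is async-iterable in the same way.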
Troubleshooting
- If streaming does not start, verify your API key and model name are correct.
- Ensure your network connection supports persistent HTTP connections.
- For partial or empty tokens, access chunk.choices[0].delta.content defensively: it can be None on some chunks (for example, the role-only first chunk or the final chunk), so fall back to an empty string.
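One defensive approach is to wrap the access in a small helper. This is a sketch, not part of the SDK; the stand-in objects below only illustrate the chunk shapes you may encounter:

```python
from types import SimpleNamespace

def delta_text(chunk):
    """Safely extract streamed text from a chunk.

    chunk.choices may be empty and delta.content may be None
    (e.g., on role-only or final chunks), so guard both cases.
    """
    if not chunk.choices:
        return ""
    return chunk.choices[0].delta.content or ""

# Hypothetical stand-ins for SDK chunk objects
role_only = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))])
text_chunk = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="hi"))])
no_choices = SimpleNamespace(choices=[])

print(repr(delta_text(role_only)))   # -> ''
print(repr(delta_text(text_chunk)))  # -> 'hi'
print(repr(delta_text(no_choices)))  # -> ''
```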
Key Takeaways

- Use stream=True in chat.completions.create() to enable streaming with Qwen.
- Process tokens incrementally from chunk.choices[0].delta.content for real-time output.
- Async streaming is supported via AsyncOpenAI and async for iteration.
- Read your API key from os.environ["OPENAI_API_KEY"] instead of hard-coding it.
- Verify model names and network connectivity if streaming fails or stalls.