Code beginner · 3 min read

How to use Qwen API in Python

Direct answer
Install the official Qwen Python SDK, set your API key in the QWEN_API_KEY environment variable, import QwenClient, and call client.chat.completions.create() with your prompt messages.

Setup

Install
bash
pip install qwen-sdk
Env vars
QWEN_API_KEY
Imports
python
import os
from qwen_sdk import QwenClient

Examples

In: Hello, who won the 2024 US presidential election?
Out: The 2024 US presidential election was won by Donald Trump.
In: Write a Python function to reverse a string.
Out: def reverse_string(s): return s[::-1]
In: Explain quantum computing in simple terms.
Out: Quantum computing uses quantum bits that can be in multiple states at once, enabling faster problem solving for certain tasks.

Integration steps

  1. Install the Qwen Python SDK with pip and set your API key in the environment variable QWEN_API_KEY.
  2. Import the QwenClient class from the qwen_sdk package.
  3. Initialize the client with your API key from os.environ.
  4. Build the messages list with role and content for the chat completion.
  5. Call the chat.completions.create() method with the model and messages.
  6. Extract the AI response text from the returned object and use it in your application.

Full code

python
import os
from qwen_sdk import QwenClient

# Initialize client with API key from environment
client = QwenClient(api_key=os.environ["QWEN_API_KEY"])

# Prepare chat messages
messages = [
    {"role": "user", "content": "Hello, can you explain what Qwen AI is?"}
]

# Call the chat completion endpoint
response = client.chat.completions.create(
    model="qwen-v1",
    messages=messages
)

# Extract and print the response text
print("Qwen AI response:", response.choices[0].message.content)

API trace

Request
json
{"model": "qwen-v1", "messages": [{"role": "user", "content": "Hello, can you explain what Qwen AI is?"}]}
Response
json
{"choices": [{"message": {"content": "Qwen AI is a large language model developed by Alibaba..."}}], "usage": {"prompt_tokens": 15, "completion_tokens": 30, "total_tokens": 45}}
Extract: response.choices[0].message.content
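The usage block in the response can also be turned into a rough cost estimate. A minimal sketch using the ~$0.0015 per 1,000 tokens rate quoted in the Performance section (the helper name and the rate default are illustrative, not part of the SDK):

```python
def estimate_cost(usage: dict, rate_per_1k: float = 0.0015) -> float:
    """Rough dollar cost of one call from its usage block."""
    return usage["total_tokens"] / 1000 * rate_per_1k

# Usage block from the trace above
usage = {"prompt_tokens": 15, "completion_tokens": 30, "total_tokens": 45}
print(f"Estimated cost: ${estimate_cost(usage):.6f}")
```

Logging this per call makes it easy to spot which prompts dominate your spend.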

Variants

Streaming chat completion

Use streaming to display partial responses in real time, which improves the user experience for long outputs.

python
import os
from qwen_sdk import QwenClient

client = QwenClient(api_key=os.environ["QWEN_API_KEY"])

messages = [{"role": "user", "content": "Tell me a joke."}]

# Stream the response
for chunk in client.chat.completions.stream(
    model="qwen-v1",
    messages=messages
):
    print(chunk.choices[0].delta.get("content", ""), end="", flush=True)
print()

Async chat completion

Use async calls when integrating Qwen API in asynchronous Python applications to improve concurrency.

python
import os
import asyncio
from qwen_sdk import QwenClient

async def main():
    client = QwenClient(api_key=os.environ["QWEN_API_KEY"])
    messages = [{"role": "user", "content": "Summarize the latest AI trends."}]
    response = await client.chat.completions.acreate(
        model="qwen-v1",
        messages=messages
    )
    print("Async response:", response.choices[0].message.content)

asyncio.run(main())

Use a smaller model for cost efficiency

Use the smaller model variant for faster responses and lower cost when high accuracy is not critical.

python
import os
from qwen_sdk import QwenClient

client = QwenClient(api_key=os.environ["QWEN_API_KEY"])
messages = [{"role": "user", "content": "Explain photosynthesis."}]
response = client.chat.completions.create(
    model="qwen-v1-small",
    messages=messages
)
print(response.choices[0].message.content)

Performance

Latency: ~700ms for qwen-v1 non-streaming calls
Cost: ~$0.0015 per 1,000 tokens on qwen-v1
Rate limits: default tier, 600 requests per minute and 50,000 tokens per minute
  • Use concise prompts to reduce token usage.
  • Prefer smaller models for less critical tasks to save cost.
  • Cache frequent queries to avoid repeated calls.
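The caching advice above can be sketched with functools.lru_cache. Here ask_qwen is a stub standing in for the real API call so the sketch runs offline (the function name and stub response are illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def ask_qwen(prompt: str) -> str:
    """Cached wrapper; swap the body for a real client.chat.completions.create() call."""
    # Stub response so the sketch runs without network access
    return f"(answer to: {prompt})"

# First call misses the cache; the identical repeat is a cache hit
ask_qwen("Explain photosynthesis.")
ask_qwen("Explain photosynthesis.")
print(ask_qwen.cache_info())  # hits=1, misses=1
```

Note that lru_cache only helps for exact repeat prompts; keep the cache key deterministic (no timestamps in the prompt) for it to pay off.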
| Approach | Latency | Cost/call | Best for |
| --- | --- | --- | --- |
| Standard chat completion | ~700ms | ~$0.0015/1k tokens | General-purpose chat |
| Streaming chat completion | starts within 300ms | ~$0.0015/1k tokens | Real-time UI updates |
| Async chat completion | ~700ms | ~$0.0015/1k tokens | Concurrent async apps |
| Smaller model (qwen-v1-small) | ~400ms | ~$0.0008/1k tokens | Cost-sensitive or lightweight tasks |

Quick tip

Always set your QWEN_API_KEY in the environment and use the official QwenClient for stable, up-to-date API access.
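A small guard at startup gives a clearer failure than a KeyError deep inside client construction. A minimal sketch (the helper name and error message are illustrative):

```python
import os

def require_api_key() -> str:
    """Fail fast with a clear message if the key is missing."""
    key = os.environ.get("QWEN_API_KEY")
    if not key:
        raise RuntimeError(
            "QWEN_API_KEY is not set; export it before creating QwenClient."
        )
    return key
```

Call require_api_key() once at startup and pass the result to QwenClient(api_key=...).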

Common mistake

Beginners often forget to include the 'role' field in messages or use incorrect model names, causing API errors.
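Both mistakes can be caught before the request is sent. A quick sanity-check sketch (the helper and the model list are illustrative, not part of the SDK; the models are the two used in this article):

```python
KNOWN_MODELS = {"qwen-v1", "qwen-v1-small"}  # models used in this article

def validate_request(model: str, messages: list) -> None:
    """Raise ValueError for a bad model name or malformed messages."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"Unknown model name: {model!r}")
    for msg in messages:
        if "role" not in msg or "content" not in msg:
            raise ValueError(f"Message missing 'role' or 'content': {msg!r}")

validate_request("qwen-v1", [{"role": "user", "content": "Hi"}])  # passes silently
```

Running this just before the create() call turns a vague API error into a precise local one.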

Verified 2026-04 · qwen-v1, qwen-v1-small