Code beginner · 3 min read

How to use Qwen API in Python

Direct answer
Install the official Qwen Python SDK, set your API key in the QWEN_API_KEY environment variable, import QwenClient, and call client.chat.completions.create() with your prompt messages.

Setup

Install
bash
pip install qwen-sdk
Env vars
QWEN_API_KEY
Imports
python
import os
from qwen_sdk import QwenClient

Examples

In: Hello, who won the 2024 US presidential election?
Out: The 2024 US presidential election was won by Donald Trump.
In: Write a Python function to reverse a string.
Out: def reverse_string(s): return s[::-1]
In: Explain quantum computing in simple terms.
Out: Quantum computing uses quantum bits that can be in multiple states at once, enabling faster problem solving for certain tasks.

Integration steps

  1. Install the Qwen Python SDK with pip and set your API key in the environment variable QWEN_API_KEY.
  2. Import the QwenClient class from the qwen_sdk package.
  3. Initialize the client with your API key from os.environ.
  4. Build the messages list with role and content for the chat completion.
  5. Call the chat.completions.create() method with the model and messages.
  6. Extract the AI response text from the returned object and use it in your application.

Full code

python
import os
from qwen_sdk import QwenClient

# Initialize client with API key from environment
client = QwenClient(api_key=os.environ["QWEN_API_KEY"])

# Prepare chat messages
messages = [
    {"role": "user", "content": "Hello, can you explain what Qwen AI is?"}
]

# Call the chat completion endpoint
response = client.chat.completions.create(
    model="qwen-v1",
    messages=messages
)

# Extract and print the response text
print("Qwen AI response:", response.choices[0].message.content)

API trace

Request
json
{"model": "qwen-v1", "messages": [{"role": "user", "content": "Hello, can you explain what Qwen AI is?"}]}
Response
json
{"choices": [{"message": {"content": "Qwen AI is a large language model developed by Alibaba..."}}], "usage": {"prompt_tokens": 15, "completion_tokens": 30, "total_tokens": 45}}
Extract: response.choices[0].message.content
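The usage block in the response can also be turned into a rough cost estimate. A minimal sketch using the ~$0.0015 per 1,000 tokens rate quoted in the Performance section (the helper name and the rate default are illustrative, not part of the SDK):

```python
def estimate_cost(usage: dict, rate_per_1k: float = 0.0015) -> float:
    """Rough dollar cost of one call from its usage block."""
    return usage["total_tokens"] / 1000 * rate_per_1k

# Usage block from the trace above
usage = {"prompt_tokens": 15, "completion_tokens": 30, "total_tokens": 45}
print(f"Estimated cost: ${estimate_cost(usage):.6f}")
```

Logging this per call makes it easy to spot which prompts dominate your spend.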

Variants

Streaming chat completion

Use streaming to display partial responses in real time, which improves the user experience for long outputs.

python
import os
from qwen_sdk import QwenClient

client = QwenClient(api_key=os.environ["QWEN_API_KEY"])

messages = [{"role": "user", "content": "Tell me a joke."}]

# Stream the response
for chunk in client.chat.completions.stream(
    model="qwen-v1",
    messages=messages
):
    print(chunk.choices[0].delta.get("content", ""), end="", flush=True)
print()

Async chat completion

Use async calls when integrating Qwen API in asynchronous Python applications to improve concurrency.

python
import os
import asyncio
from qwen_sdk import QwenClient

async def main():
    client = QwenClient(api_key=os.environ["QWEN_API_KEY"])
    messages = [{"role": "user", "content": "Summarize the latest AI trends."}]
    response = await client.chat.completions.acreate(
        model="qwen-v1",
        messages=messages
    )
    print("Async response:", response.choices[0].message.content)

asyncio.run(main())

Use a smaller model for cost efficiency

Use the smaller model variant for faster responses and lower cost when high accuracy is not critical.

python
import os
from qwen_sdk import QwenClient

client = QwenClient(api_key=os.environ["QWEN_API_KEY"])
messages = [{"role": "user", "content": "Explain photosynthesis."}]
response = client.chat.completions.create(
    model="qwen-v1-small",
    messages=messages
)
print(response.choices[0].message.content)

Performance

Latency: ~700ms for qwen-v1 non-streaming calls
Cost: ~$0.0015 per 1,000 tokens on qwen-v1
Rate limits: default tier, 600 requests per minute and 50,000 tokens per minute
  • Use concise prompts to reduce token usage.
  • Prefer smaller models for less critical tasks to save cost.
  • Cache frequent queries to avoid repeated calls.
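The caching advice above can be sketched with functools.lru_cache. Here ask_qwen is a stub standing in for the real API call so the sketch runs offline (the function name and stub response are illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def ask_qwen(prompt: str) -> str:
    """Cached wrapper; swap the body for a real client.chat.completions.create() call."""
    # Stub response so the sketch runs without network access
    return f"(answer to: {prompt})"

# First call misses the cache; the identical repeat is a cache hit
ask_qwen("Explain photosynthesis.")
ask_qwen("Explain photosynthesis.")
print(ask_qwen.cache_info())  # hits=1, misses=1
```

Note that lru_cache only helps for exact repeat prompts; keep the cache key deterministic (no timestamps in the prompt) for it to pay off.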
| Approach | Latency | Cost/call | Best for |
| --- | --- | --- | --- |
| Standard chat completion | ~700ms | ~$0.0015/1k tokens | General-purpose chat |
| Streaming chat completion | starts within 300ms | ~$0.0015/1k tokens | Real-time UI updates |
| Async chat completion | ~700ms | ~$0.0015/1k tokens | Concurrent async apps |
| Smaller model (qwen-v1-small) | ~400ms | ~$0.0008/1k tokens | Cost-sensitive or lightweight tasks |

Quick tip

Always set your QWEN_API_KEY in the environment and use the official QwenClient for stable, up-to-date API access.
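A small guard at startup gives a clearer failure than a KeyError deep inside client construction. A minimal sketch (the helper name and error message are illustrative):

```python
import os

def require_api_key() -> str:
    """Fail fast with a clear message if the key is missing."""
    key = os.environ.get("QWEN_API_KEY")
    if not key:
        raise RuntimeError(
            "QWEN_API_KEY is not set; export it before creating QwenClient."
        )
    return key
```

Call require_api_key() once at startup and pass the result to QwenClient(api_key=...).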

Common mistake

Beginners often forget to include the 'role' field in messages or use incorrect model names, causing API errors.
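Both mistakes can be caught before the request is sent. A quick sanity-check sketch (the helper and the model list are illustrative, not part of the SDK; the models are the two used in this article):

```python
KNOWN_MODELS = {"qwen-v1", "qwen-v1-small"}  # models used in this article

def validate_request(model: str, messages: list) -> None:
    """Raise ValueError for a bad model name or malformed messages."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"Unknown model name: {model!r}")
    for msg in messages:
        if "role" not in msg or "content" not in msg:
            raise ValueError(f"Message missing 'role' or 'content': {msg!r}")

validate_request("qwen-v1", [{"role": "user", "content": "Hi"}])  # passes silently
```

Running this just before the create() call turns a vague API error into a precise local one.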

Verified 2026-04 · qwen-v1, qwen-v1-small