How to use GPT-4o mini in Python
Direct answer
Use the OpenAI Python SDK v1 with model="gpt-4o-mini" and call client.chat.completions.create(), passing your messages list, to interact with GPT-4o mini.
Setup
Install
pip install openai
Env vars
OPENAI_API_KEY
Imports
import os
from openai import OpenAI
Examples
In: Hello, how are you?
Out: I'm doing great, thank you! How can I assist you today?
In: Write a Python function to reverse a string.
Out: def reverse_string(s):
         return s[::-1]
In: Explain quantum computing in simple terms.
Out: Quantum computing uses quantum bits that can be both 0 and 1 at the same time, enabling faster problem solving for certain tasks.
Integration steps
- Install the OpenAI Python SDK and set your OPENAI_API_KEY environment variable.
- Import the OpenAI client and initialize it with your API key from os.environ.
- Create a messages list with user role and content to send to the model.
- Call client.chat.completions.create() with model="gpt-4o-mini" and the messages array.
- Extract the response text from response.choices[0].message.content.
- Use or display the generated text as needed.
Full code
import os
from openai import OpenAI
# Initialize client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Prepare messages for the chat completion
messages = [
{"role": "user", "content": "Hello, how are you?"}
]
# Call the GPT-4o mini model
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages
)
# Extract and print the assistant's reply
print("Assistant:", response.choices[0].message.content)
Output
Assistant: I'm doing great, thank you! How can I assist you today?
API trace
Request
{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
Response
{"choices": [{"message": {"content": "I'm doing great, thank you! How can I assist you today?"}}], "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}}
Extract
response.choices[0].message.content
Variants
Streaming response ›
Use streaming to display partial responses in real time, giving a better user experience for long outputs.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
messages = [{"role": "user", "content": "Tell me a joke."}]
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
stream=True
)
for chunk in response:
    # In SDK v1, delta is an object and its content attribute may be None
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
print()
Async version ›
Use async calls when integrating into asynchronous Python applications or frameworks.
import os
import asyncio
from openai import AsyncOpenAI
async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Explain recursion."}]
    # SDK v1 has no acreate(); use the async client and await create()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    print("Assistant:", response.choices[0].message.content)
asyncio.run(main())
Alternative model: gpt-4o ›
Use gpt-4o for higher quality and more detailed responses when latency and cost are less critical.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
messages = [{"role": "user", "content": "Summarize the latest AI trends."}]
response = client.chat.completions.create(
model="gpt-4o",
messages=messages
)
print("Assistant:", response.choices[0].message.content)
Performance
Latency: ~600ms for gpt-4o-mini non-streaming calls
Cost: ~$0.0008 per 500 tokens
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
- Keep messages concise to reduce prompt tokens.
- Use system prompts sparingly to save tokens.
- Reuse conversation context efficiently to avoid resending large histories.
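The last tip above can be sketched as a small helper that keeps the prompt short as a conversation grows. This is a minimal illustration, not part of the OpenAI SDK: `trim_history` and `max_turns` are hypothetical names, and the right trimming policy depends on your application.

```python
# Hypothetical helper: keep the system prompt (if any) plus only the
# most recent messages so long conversations stay within a token budget.
def trim_history(messages, max_turns=4):
    """Return system messages plus the last `max_turns` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Tell me more."},
    {"role": "assistant", "content": "Sure."},
    {"role": "user", "content": "Thanks."},
]

# Pass trim_history(history) as the messages argument instead of the full list
trimmed = trim_history(history, max_turns=2)
print(len(trimmed))  # → 3 (system message + last 2 turns)
```

Trimming by message count is the simplest policy; a production app might instead count tokens or summarize older turns.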
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard call | ~600ms | ~$0.0008 | Simple queries and responses |
| Streaming | Starts immediately, total ~600ms | ~$0.0008 | Long outputs with better UX |
| Async call | ~600ms | ~$0.0008 | Concurrent or async Python apps |
Quick tip
Always specify `model="gpt-4o-mini"` explicitly and pass messages as a list of role-content dicts for correct chat completions.
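A minimal sketch of the expected messages shape: each entry is a dict with exactly a "role" and a "content" key, with role one of "system", "user", or "assistant".

```python
# A messages list in the shape the Chat Completions API expects
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Hello, how are you?"},
]

# Every entry must carry both keys; a missing key causes an API error
for m in messages:
    assert set(m) == {"role", "content"}

print([m["role"] for m in messages])  # → ['system', 'user']
```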
Common mistake
Using deprecated SDK methods like `openai.ChatCompletion.create()` instead of the current `client.chat.completions.create()` pattern.