How to use GPT-4o mini in Python
Set model="gpt-4o-mini" and call client.chat.completions.create(), passing your messages array, to interact with GPT-4o mini.
Setup
Install: pip install openai
Environment variable: OPENAI_API_KEY
Imports:
import os
from openai import OpenAI
Examples
Integration steps
- Install the OpenAI Python SDK and set your OPENAI_API_KEY environment variable.
- Import the OpenAI client and initialize it with your API key from os.environ.
- Create a messages list with user role and content to send to the model.
- Call client.chat.completions.create() with model="gpt-4o-mini" and the messages array.
- Extract the response text from response.choices[0].message.content.
- Use or display the generated text as needed.
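Steps 1 and 2 assume the key is present in the environment. As a minimal sketch (the helper name require_api_key is hypothetical, not part of the OpenAI SDK), the lookup can be wrapped so that a missing or blank key produces a readable error instead of a bare KeyError:

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from `env`, or raise a readable error.

    Hypothetical helper for illustration; not part of the OpenAI SDK.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client.")
    return key
```

The result can then be passed to the client, for example OpenAI(api_key=require_api_key()).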
Full code
import os
from openai import OpenAI
# Initialize client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Prepare messages for the chat completion
messages = [
    {"role": "user", "content": "Hello, how are you?"}
]
# Call the GPT-4o mini model
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)
# Extract and print the assistant's reply
print("Assistant:", response.choices[0].message.content)
Example output (actual replies vary):
Assistant: I'm doing great, thank you! How can I assist you today?
API trace
Request:
{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
Response:
{"choices": [{"message": {"content": "I'm doing great, thank you! How can I assist you today?"}}], "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}}
Extract:
response.choices[0].message.content
Variants
Streaming response
Use streaming to display partial responses in real-time for better user experience with long outputs.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
messages = [{"role": "user", "content": "Tell me a joke."}]
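response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    stream=True
)
for chunk in response:
    # delta is an object, not a dict; its content can be None (for example on the final chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
Async version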
Use async calls when integrating into asynchronous Python applications or frameworks.
import os
import asyncio
from openai import AsyncOpenAI
async def main():
    # The async client exposes the same create() method; there is no acreate() in the current SDK
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Explain recursion."}]
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    print("Assistant:", response.choices[0].message.content)
asyncio.run(main())
Alternative model: gpt-4o
Use gpt-4o for higher quality and more detailed responses when latency and cost are less critical.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
messages = [{"role": "user", "content": "Summarize the latest AI trends."}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)
print("Assistant:", response.choices[0].message.content)
Performance
- Keep messages concise to reduce prompt tokens.
- Use system prompts sparingly to save tokens.
- Reuse conversation context efficiently to avoid resending large histories.
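One way to avoid resending a large history is to trim it before each call. The sketch below is an assumption about how that might be done, not a prescribed method: trim_history and max_turns are hypothetical names, and the policy (keep system messages, keep only the most recent other messages) is just one reasonable choice.

```python
def trim_history(messages, max_turns=6):
    """Keep all system messages plus the last `max_turns` other messages.

    Hypothetical helper: system messages are moved to the front, which
    matches the usual layout of a chat history.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_turns:] if max_turns > 0 else []
    return system + recent


history = [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Explain recursion."},
]
trimmed = trim_history(history, max_turns=2)
print([m["role"] for m in trimmed])
```

The trimmed list can be passed as messages to client.chat.completions.create() in place of the full history. Dropping older turns loses context, so the right max_turns depends on the application.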
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard call | ~600ms | ~$0.0008 | Simple queries and responses |
| Streaming | First tokens arrive early, total ~600ms | ~$0.0008 | Long outputs with better UX |
| Async call | ~600ms | ~$0.0008 | Concurrent or async Python apps |
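Cost per call follows from the token counts in the response's usage field. The sketch below shows the arithmetic only: estimate_cost_usd is a hypothetical helper, and the per-million-token prices are placeholders rather than real rates, so substitute the current figures from OpenAI's pricing page.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      input_price_per_million, output_price_per_million):
    """Estimate the cost of one call from token counts and per-million-token prices."""
    total = (prompt_tokens * input_price_per_million
             + completion_tokens * output_price_per_million)
    return total / 1_000_000


# Token counts taken from the API trace in this article
usage = {"prompt_tokens": 10, "completion_tokens": 15}

# Placeholder prices in USD per million tokens; NOT actual OpenAI rates
cost = estimate_cost_usd(
    usage["prompt_tokens"],
    usage["completion_tokens"],
    input_price_per_million=1.00,
    output_price_per_million=2.00,
)
print(f"Estimated cost: ${cost:.8f}")
```

With a live response, the same counts are available as response.usage.prompt_tokens and response.usage.completion_tokens.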
Quick tip
Always specify model="gpt-4o-mini" explicitly and pass messages as a list of dicts, each with role and content keys, for correct chat completions.
Common mistake
Using legacy SDK methods like openai.ChatCompletion.create(), which openai 1.0 and later no longer support, instead of the current client.chat.completions.create() pattern.