API Beginner easy · 4 min

Models available: GPT-4o, GPT-4o-mini, o1, o3

What you will learn
The OpenAI Python SDK lets you choose between GPT-4o (general purpose), GPT-4o-mini (cheapest and fastest), o1 (reasoning), and o3 (advanced reasoning) depending on your task complexity and budget.

Why this matters

You must pick the right model before making any API call. Choosing GPT-4o when o1 is needed wastes money on wrong answers; choosing o3 when GPT-4o-mini suffices burns budget unnecessarily. Model selection is the first decision in every OpenAI application.

Skip if: Use a local model (Ollama, LLaMA 2) when you need offline capability, zero API latency, or have a fixed compute budget. Use Claude if you need 200k context windows or prefer Anthropic's safety approach. Use Gemini if you're already in the Google Cloud ecosystem.

Explanation

What this does: The OpenAI Python SDK (version 1.x) gives you access to many models; this lesson covers four of them. You specify the model name as a string in the model parameter when creating chat completions, embeddings, or other API calls. Each model has different speed, cost, reasoning capability, and context window.

How it works: When you call client.chat.completions.create(model='gpt-4o', ...), the SDK routes your request to OpenAI's infrastructure. The model name determines which weights, serving infrastructure, and pricing tier you use. GPT-4o (omni) handles vision and text together. GPT-4o-mini is a smaller, faster version. o1 and o3 are reasoning models that solve harder problems but cost more and are slower. The model parameter is required: there is no default.

When to use each: Use gpt-4o for general tasks, summarization, and creative work. Use gpt-4o-mini for simple classification, fast turnarounds, and tight budgets. Use o1 for math, coding, and logic problems that need deliberate reasoning. Use o3 for the hardest problems (planning, research synthesis, multi-step reasoning) when cost is secondary to correctness.
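The guidance above can be sketched as a small lookup. This is a minimal illustration, not part of the SDK: the task labels, `MODEL_BY_TASK`, and `pick_model` are hypothetical names chosen for this example.

```python
# Hypothetical task labels mapped to the models recommended in this lesson.
MODEL_BY_TASK = {
    "classification": "gpt-4o-mini",
    "summarization": "gpt-4o",
    "creative": "gpt-4o",
    "math": "o1",
    "coding": "o1",
    "planning": "o3",
    "research": "o3",
}


def pick_model(task: str, default: str = "gpt-4o") -> str:
    """Return the suggested model for a task label, or the default."""
    return MODEL_BY_TASK.get(task.strip().lower(), default)


print(pick_model("classification"))  # gpt-4o-mini
print(pick_model("Math"))            # o1
print(pick_model("translation"))     # gpt-4o (unknown label, falls back to the default)
```

Keeping the mapping in one place makes it easier to change a model choice later without hunting through call sites.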

Request code

python
import os
from openai import OpenAI

api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
    raise ValueError('OPENAI_API_KEY environment variable not set')

client = OpenAI(api_key=api_key)

models_to_test = ['gpt-4o', 'gpt-4o-mini', 'o1', 'o3']

for model_name in models_to_test:
    try:
        response = client.chat.completions.create(
            model=model_name,
            messages=[
                {'role': 'user', 'content': 'Explain quantum entanglement in one sentence.'}
            ],
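            # o1 and o3 reject max_tokens; use max_completion_tokens instead.
            # The cap includes hidden reasoning tokens, so keep it generous.
            max_completion_tokens=1000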
        )
        print(f'Model: {model_name}')
        print(f'Response: {response.choices[0].message.content}')
        print(f'Tokens used: {response.usage.total_tokens}')
        print('---')
    except Exception as e:
        print(f'Model {model_name} failed: {e}')

Authentication

Set your API key as an environment variable before running code:

    export OPENAI_API_KEY='sk-...'

Or pass it directly to the client:

    from openai import OpenAI
    client = OpenAI(api_key='sk-...')

The SDK reads OPENAI_API_KEY when OpenAI() is called, not when the module is imported, so set the env var before the first OpenAI() call.

Response shape

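Field     Description
choices   list of generated completions; each entry holds a message (with its content) and a finish_reason
usage     token counts for the request: prompt_tokens, completion_tokens, total_tokens
model     the exact model name that processed the request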

Field guide

choices[0].message.content

The actual text the model generated: this is what you display to the user or pass to downstream logic

usage.total_tokens

Critical for billing: OpenAI charges per token. Track this per request to forecast spend

choices[0].finish_reason

Tells you why the response ended. 'stop' = normal completion. 'length' = hit max_tokens limit before finishing (your response is truncated)
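As a rough sketch of reading these fields together, the helper below pulls them out of a response object. `summarize_response` is a hypothetical helper name, and `fake` is a hand-built stand-in with the same attribute layout, so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_response(response) -> dict:
    """Collect the fields from the field guide into a plain dict."""
    choice = response.choices[0]
    return {
        "text": choice.message.content,
        # 'length' means the output hit the token cap and is cut off.
        "truncated": choice.finish_reason == "length",
        "total_tokens": response.usage.total_tokens,
        "model": response.model,
    }


# Stand-in object mimicking the attributes used above (not a real API response).
fake = SimpleNamespace(
    model="gpt-4o-mini",
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="Entangled particles share"),
            finish_reason="length",
        )
    ],
    usage=SimpleNamespace(total_tokens=42),
)

print(summarize_response(fake))
```

Checking `truncated` before showing text to a user avoids presenting a half-finished answer as complete.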

Setup trap

The client reads OPENAI_API_KEY once, when OpenAI() is called. If you set os.environ['OPENAI_API_KEY'] = '...' after that call, the existing client keeps using the old key; if no key was available at that point, OpenAI() has already raised an error. Set the environment variable or pass api_key before any OpenAI() instantiation. A module that creates the client at import time is the usual culprit: tests that set or mock the key inside a test function run after the import, so they come too late.
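One way to sidestep this trap is to create the client lazily instead of at import time. The sketch below assumes that approach; `resolve_api_key` and `get_client` are hypothetical helper names, and the `openai` import is deferred so the module can be imported before the key is set.

```python
import os
from functools import lru_cache


def resolve_api_key(env=None) -> str:
    """Read the key at call time; env defaults to the real environment."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key


@lru_cache(maxsize=1)
def get_client():
    """Build the client on first use, then reuse the same instance."""
    from openai import OpenAI  # deferred import: nothing is read at module import

    return OpenAI(api_key=resolve_api_key())
```

In tests, setting the environment variable and then calling `get_client.cache_clear()` forces the next `get_client()` call to pick up the new key.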

Cost

As of April 2026: GPT-4o costs ~$5–15 per million tokens (varies by input/output). GPT-4o-mini costs ~$0.15 per million input tokens. o1 and o3 cost 10–50× more. A single o3 call can cost $0.20–2.00 depending on reasoning effort. For production, always enable token usage tracking and set spending alerts in your OpenAI account dashboard. At those per-call figures, a bug that sends 1,000 o3 requests by mistake costs roughly $200–2,000.
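The arithmetic behind these figures is simple enough to sketch. `estimate_cost` is a hypothetical helper that takes prices as parameters rather than hard-coding them, since prices change; the numbers in the usage lines are the approximate ones quoted in this section, not authoritative pricing.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Dollar cost of one request, given prices per million tokens."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


# One million input tokens at the ~$0.15/M figure quoted for GPT-4o-mini.
print(f"${estimate_cost(1_000_000, 0, 0.15, 0.0):.2f}")  # $0.15

# Runaway scenario: 1,000 o3 calls at the quoted $0.20 to $2.00 per call.
low, high = 1000 * 0.20, 1000 * 2.00
print(f"${low:,.0f} to ${high:,.0f}")  # $200 to $2,000
```

Feeding `response.usage.prompt_tokens` and `response.usage.completion_tokens` into a function like this gives a per-request figure you can log and alert on.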

Rate limits

o1 and o3 have stricter rate limits than GPT-4o (typically 10 requests/minute on free tier vs 3500 requests/minute for GPT-4o). If you batch-test models, start with GPT-4o and only call o1/o3 for final validation. Use exponential backoff for retries: OpenAI returns 429 (Too Many Requests) when you exceed limits.
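A minimal sketch of exponential backoff follows. It is deliberately SDK-agnostic: `retry_with_backoff` and its parameters are hypothetical names, and the caller decides which errors are retryable. With the OpenAI SDK, one option would be `lambda e: isinstance(e, openai.RateLimitError)`.

```python
import random
import time


def retry_with_backoff(fn, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt plus jitter, then retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on non-retryable errors or when attempts are exhausted.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Jitter spreads out retries from clients that failed together.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Demo with a stand-in function that fails twice, then succeeds.
attempts_seen = []


def flaky_call():
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise RuntimeError("simulated 429")
    return "ok"


# sleep is stubbed out so the demo finishes instantly.
print(retry_with_backoff(flaky_call, lambda e: True, sleep=lambda s: None))  # ok
```

The SDK client also accepts a `max_retries` option, so a hand-rolled loop like this is mainly useful when you want control over which errors are retried and for how long.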

Common gotcha

A model name that doesn't exist (a typo, or an old name like 'gpt-4-turbo') is just a string to Python, so nothing catches it until the API call, which fails immediately with a 404 NotFoundError. The error message will say 'The model `...` does not exist.' Always verify model names in OpenAI's current model list: they change quarterly as new models release and old ones deprecate.
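To catch a bad name before spending a network round trip, a local check along these lines can help. `check_model_name` and the `KNOWN` set are hypothetical; in practice the known names could be loaded from the API, for example `{m.id for m in client.models.list()}`.

```python
import difflib


def check_model_name(name, known_models):
    """Return name if it is known; otherwise raise with a closest-match hint."""
    known = sorted(known_models)
    if name in known:
        return name
    hint = difflib.get_close_matches(name, known, n=1)
    suggestion = f" Did you mean '{hint[0]}'?" if hint else ""
    raise ValueError(f"Unknown model '{name}'.{suggestion}")


# Illustrative allowlist: the four models covered in this lesson.
KNOWN = {"gpt-4o", "gpt-4o-mini", "o1", "o3"}

print(check_model_name("gpt-4o", KNOWN))  # gpt-4o
try:
    check_model_name("gpt-4-omni", KNOWN)
except ValueError as err:
    print(err)
```

A hard-coded allowlist goes stale as models are retired, so treat it as a convenience check rather than a replacement for handling NotFoundError.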

Error recovery

AuthenticationError
Your API key is invalid, expired, or missing. Check: `export OPENAI_API_KEY='sk-...'` is set correctly before running, or passed to OpenAI(api_key='...'). Regenerate your key in the OpenAI dashboard if compromised.
RateLimitError
You've exceeded request limits. Add exponential backoff: import time; time.sleep(2 ** attempt) before retry. For o1/o3, reduce request rate or upgrade account tier.
NotFoundError
Model name doesn't exist. Double-check spelling: 'gpt-4o' not 'gpt-4-omni'. Visit OpenAI docs for current model names: 'gpt-4-turbo' is deprecated.
APIConnectionError
Network or DNS failure. Verify internet connection. If persistent, check OpenAI status page for outages.

Experienced dev note

Start with GPT-4o-mini in development and testing, graduate to GPT-4o when you need better quality, and reserve o1/o3 for problems the cheaper models cannot solve (they cost roughly 10–50× more, per the Cost section). Many teams burn budget running o3 for tasks that GPT-4o solves perfectly. Also, reasoning models handle the system role differently: early o1 releases (o1-preview, o1-mini) rejected system messages, later reasoning models use a developer role for instructions instead, and all of them generate their own thinking process before answering. If your prompts mysteriously stop working with o1, check OpenAI's release notes for how that model version handles system instructions, or fall back to GPT-4o. Lastly, model availability varies by region and API tier; a model that works for you might not work for a customer in another region or on a lower tier. Always have a fallback model ready.
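The fallback advice can be sketched as a loop over candidate models. `complete_with_fallback` and `call_model` are hypothetical names; `call_model` stands for whatever function wraps your actual API request, and the demo uses a fake one so it runs offline.

```python
def complete_with_fallback(models, call_model):
    """Try each model in order; return (model_name, result) for the first that works."""
    errors = {}
    for name in models:
        try:
            return name, call_model(name)
        except Exception as exc:
            # Real code would likely narrow this to availability errors,
            # for example openai.NotFoundError.
            errors[name] = exc
    raise RuntimeError(f"all models failed: {errors}")


# Fake caller: pretends o3 is unavailable on this account.
def fake_call(model_name):
    if model_name == "o3":
        raise LookupError("model not available")
    return f"answer from {model_name}"


print(complete_with_fallback(["o3", "gpt-4o"], fake_call))
# ('gpt-4o', 'answer from gpt-4o')
```

Returning the model name alongside the result lets you log which model actually served each request.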

Check your understanding

You're building a real-time chatbot for customer support. You prototype with o3 because it reasons carefully. But in production, your costs spike 200x. Why did this happen, and which model should you use instead? What would you track to catch this before it hits production?

Answer hint

o3 costs many times more than GPT-4o per request (10–50× by the figures in the Cost section) and is meant for hard reasoning tasks, not general chat. For production chat, use GPT-4o. Track usage.total_tokens per request, set up cost alerts, and test with a small traffic sample before scaling.

VERSION openai 1.x SDK (current as of April 2026). Model names and pricing valid as of April 2026: these change quarterly. o1 and o3 are reasoning models introduced in late 2024 and early 2025. If you upgrade the SDK, check OpenAI's migration guide; there are no breaking changes in 1.x but new models are added constantly.
