API Beginner easy · 5 min

client.chat.completions.create(): the core method

What you will learn

Send a conversation to GPT and get back a structured response using OpenAI's primary chat completion endpoint.

Why this matters

This is the fundamental building block for every LLM interaction in production. Understanding request structure, response format, and parameter tuning directly impacts latency, cost, and output quality across all downstream applications.

Skip if: Use streaming (<code>stream=True</code>) instead when you need real-time token arrival for UI updates. Use batch processing (Batch API) when you have 10,000+ non-urgent requests to process cost-effectively. Use embeddings endpoint if you need vector representations, not text generation.

Explanation

What it does: client.chat.completions.create() sends a list of messages to OpenAI's GPT model and returns a single completion response. It's the standard synchronous way to interact with chat models like gpt-4o and gpt-4-turbo.

How it works: You provide a model name, a list of message objects (with roles like 'system', 'user', 'assistant'), and optional parameters like temperature and max_tokens. The SDK constructs an HTTP POST request, sends it to api.openai.com, waits for the response, and returns a ChatCompletion object containing the model's reply in .choices[0].message.content.

When to use it: Use this for single-turn or multi-turn conversations where you can wait for the full response. It's ideal for chatbots, Q&A systems, content generation, and analysis tasks where latency under 5 seconds is acceptable.

Request code

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

response = client.chat.completions.create(
    model='gpt-4o',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'Explain quantum entanglement in one sentence.'}
    ],
    temperature=0.7,
    max_tokens=100
)

print(response.choices[0].message.content)

Authentication

Set your API key as an environment variable before running code: export OPENAI_API_KEY='sk-...' Or pass it directly to the client: client = OpenAI(api_key='sk-...') The SDK reads OPENAI_API_KEY automatically if no api_key is passed to OpenAI().

Response shape

Field	Description
`id`	String identifier for this completion (e.g., 'chatcmpl-8nB...')
`object`	Always 'chat.completion'
`created`	Unix timestamp when response was generated
`model`	Model name that processed the request
`choices`	List of completion objects
`choices[0].message.content`	The text response from the model
`choices[0].message.role`	Always 'assistant'
`choices[0].finish_reason`	Why generation stopped: 'stop' (natural), 'length' (hit max_tokens), or 'tool_calls'
`usage.prompt_tokens`	Tokens consumed by your input messages
`usage.completion_tokens`	Tokens generated in the response
`usage.total_tokens`	Sum of prompt and completion tokens

Field guide

choices[0].message.content

The actual text you need to display or process: this is where your answer lives

usage.total_tokens

Multiply by the model's per-token cost ($0.03 per 1M input tokens for gpt-4o as of April 2026) to understand what this request cost you

finish_reason

If it says 'length', your response was truncated: increase max_tokens or reduce prompt size

Setup trap

Setting os.environ['OPENAI_API_KEY'] after instantiating OpenAI() does NOT work. The SDK reads the environment variable at initialization time. Always set your environment variable or pass api_key to OpenAI() before making any API calls.

Cost

Each call costs based on input and output tokens. gpt-4o costs ~$0.03 per 1M input tokens and ~$0.12 per 1M output tokens (April 2026 pricing). A 1,000 token input + 500 token output costs roughly $0.00004. Test with small max_tokens values first to control spend while developing.

Rate limits

Free trial accounts are limited to 3 requests per minute. Paid accounts start at 3,500 requests per minute. If you hit a 429 status code, wait 30 seconds and retry. For high-volume applications, implement exponential backoff.

Common gotcha

Accessing the response incorrectly. Beginners write response.message.content instead of response.choices[0].message.content. The response is a wrapper object; the actual message is inside the choices list at index 0.

Error recovery

AuthenticationError

Your API key is missing, invalid, or expired. Verify <code>echo $OPENAI_API_KEY</code> returns a key starting with 'sk-'. Regenerate your key in the OpenAI dashboard if needed.

RateLimitError

You've exceeded your request quota. Wait 30+ seconds, then implement exponential backoff with jitter for retries. Check your rate limit tier in account settings.

APIConnectionError

Network issue or OpenAI API is down. Check your internet connection and https://status.openai.com/. Implement retry logic with exponential backoff.

BadRequestError

Invalid parameter value (e.g., model name doesn't exist, messages list is empty, max_tokens exceeds 128k). Check parameter names and types against the documentation.

Experienced dev note

Cache your system prompt and conversation history efficiently. Every character costs money. Use temperature=0 for deterministic outputs (classification, extraction) and temperature=1.0+ for creative tasks. Store responses and implement request deduplication: if the same user asks the same question twice, return your cached response instead of calling the API again.

Check your understanding

If you increase max_tokens from 100 to 500 but your model keeps finishing with finish_reason='length', what does that tell you about your input, and how would you fix it?

Show answer hint

finish_reason='length' means the model hit max_tokens before reaching a natural stop. The issue isn't your input: it's that your limit is too low. Either increase max_tokens or accept partial responses. If costs are a concern, try a shorter input prompt to reduce token usage.

VERSION openai 1.x SDK only. Do not use deprecated patterns like openai.ChatCompletion.create() or openai.api_key = 'sk-...' from 0.x versions. This course uses 1.x exclusively.

Community Notes

No notes yetBe the first to share a version-specific fix or tip.