The response object: what it contains
Why this matters
You can't use the API effectively without knowing where the actual answer lives in the response, how many tokens you burned, or what model version actually ran. Missing fields cause silent bugs.
Explanation
What the response object contains: When you call client.chat.completions.create(), you get back a ChatCompletion object. This isn't a dictionary: it's a Pydantic model with typed fields. It contains: the actual completion text (inside choices), token usage counts, the model that ran, finish reason, and timestamps. How it works: The OpenAI API sends back JSON from the server. The Python SDK automatically parses it into a ChatCompletion object with dot-notation access. The choices field is a list because the API can generate multiple completions in one request (controlled by the n parameter). The usage field tells you prompt tokens, completion tokens, and total tokens consumed. When to use it: Always capture the full response object, even if you only need the text right now. You'll need usage data for cost tracking, finish_reason to detect truncation, and model to verify the right version ran.
Request code
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))
response = client.chat.completions.create(
model='gpt-4.1',
messages=[
{'role': 'user', 'content': 'What is 2+2?'}
]
)
print('Completion text:', response.choices[0].message.content)
print('Tokens used:', response.usage.total_tokens)
print('Finish reason:', response.choices[0].finish_reason)
print('Model:', response.model)
print('Response ID:', response.id) Authentication
Set your API key before instantiating the client. The OpenAI SDK reads the OPENAI_API_KEY environment variable automatically, or you can pass it explicitly: client = OpenAI(api_key='sk-...'). Test your key works by making one request before writing production code.
Response shape
| Field | Description |
|---|---|
id | string: unique identifier for this completion (e.g., 'chatcmpl-8a9...') |
object | string: always 'chat.completion' |
created | integer: Unix timestamp when response was generated |
model | string: name of the model that was used (e.g., 'gpt-4.1') |
choices | list of choice objects, each containing: message (with role and content), finish_reason, and index |
choices[0].message.content | string: the actual text response from the model |
choices[0].finish_reason | string: 'stop' (normal end), 'length' (hit max_tokens), 'tool_calls' (called a function), or 'content_filter' (blocked) |
usage | object containing prompt_tokens, completion_tokens, total_tokens |
usage.prompt_tokens | integer: tokens in your input |
usage.completion_tokens | integer: tokens in the model's output |
usage.total_tokens | integer: sum of prompt and completion tokens |
Field guide
choices[0].message.content This is what you actually want: the model's answer. Always check that choices is not empty and access [0] for the first (usually only) completion.
finish_reason Tells you WHY the response ended. 'stop' means normal completion. 'length' means you hit max_tokens and the answer was cut off: increase max_tokens or split the request. 'content_filter' means OpenAI blocked it for policy reasons.
usage.total_tokens Multiply by your model's price per token to get actual cost. Store this for billing reconciliation. OpenAI's estimates in docs may differ slightly from actual: this field is ground truth.
model Confirms which model actually ran. If you request 'gpt-4.1' but get 'gpt-4-turbo' back, something's wrong. Use this in logging to debug routing issues.
id Include this in error reports or support tickets. OpenAI uses it to look up your exact request in their logs.
Setup trap
Setting OPENAI_API_KEY in your code before instantiating OpenAI() does work: the SDK reads the environment at init time. The actual gotcha: if you set it after OpenAI() is called, it's already too late. Initialize in order: environment variable first, then instantiate client. Also: if you're running in a container, make sure secrets are passed at runtime, not baked into the image.
Cost
Each token costs money. Input tokens are usually 0.5x to 1x the price of output tokens. A single ChatCompletion request with usage tracking lets you bill users accurately. For gpt-4.1: ~0.03 USD per 1K input tokens, ~0.06 per 1K output tokens. Track usage.total_tokens and multiply by your model's rate. Don't estimate: use the actual field.
Rate limits
OpenAI enforces rate limits per minute and per day. If you're making many requests rapidly (e.g., batch processing 10K documents), you'll hit the per-minute limit. The SDK will raise a RateLimitError. Response object itself doesn't include rate limit headers in the 1.x SDK: you'd need to catch the exception and implement exponential backoff separately.
Common gotcha
Accessing response.choices[0].message.content without checking if choices is empty. If the API fails silently or returns an unexpected response structure, this throws an IndexError that's cryptic. Always check: if response.choices: text = response.choices[0].message.content.
Error recovery
APIConnectionErrorAuthenticationErrorRateLimitErrorAPIErrorValueErrorExperienced dev note
Log the entire response.id with every completion in production. When a user says 'your answer was wrong' or 'I got charged twice', you can grep your logs for that response ID and cross-reference it with OpenAI's billing. Also: response.model tells you which version actually ran: crucial for debugging. If you A/B test models, this field proves which one the user got, preventing blame-shifting. One more: finish_reason='length' is a silent failure mode. Set max_tokens high enough and always check finish_reason in monitoring. Truncated outputs look plausible but wrong.
Check your understanding
You're calling the API and get back a response. The finish_reason is 'length'. Your code extracted response.choices[0].message.content successfully. Should you ship this response to the user? Why or why not?
Show answer hint
finish_reason='length' means the model hit max_tokens and the answer was cut off mid-sentence. It's incomplete. The text may look grammatically correct but semantically wrong. You should either increase max_tokens and retry, or inform the user that the response was truncated.