Extracting nested fields from JSON response
Why this matters
OpenAI responses contain nested structures (choices → message → content, usage → prompt_tokens) that developers often access incorrectly, leading to AttributeError or missed fields like finish_reason. Learning the correct access pattern prevents runtime crashes in production and helps you discover response fields that unlock functionality.
Explanation
What it does: OpenAI's Python SDK returns strongly-typed response objects (not raw dicts) where nested fields are accessed via dot notation. This differs from older requests-based code where you'd use response['choices'][0]['message']['content'].
How it works: The SDK uses Pydantic models under the hood. When you call client.chat.completions.create(), it returns a ChatCompletion object. Fields like choices, usage, and model are Pydantic model instances, not dicts. Accessing them via dot notation gives you type hints in your IDE and validation at instantiation time. If a field doesn't exist, you get an AttributeError immediately instead of a silent None.
When to use it: Always access nested response fields via dot notation (response.choices[0].message.content) rather than dict keys. Only convert to dict if you need JSON serialization or are passing data to systems that require dict inputs.
Request code
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))
response = client.chat.completions.create(
model='gpt-4-turbo',
messages=[
{'role': 'user', 'content': 'What is 2+2?'}
]
)
message_content = response.choices[0].message.content
finish_reason = response.choices[0].finish_reason
prompt_tokens = response.usage.prompt_tokens
completion_tokens = response.usage.completion_tokens
total_tokens = response.usage.total_tokens
model_name = response.model
print(f'Content: {message_content}')
print(f'Finish reason: {finish_reason}')
print(f'Tokens - Prompt: {prompt_tokens}, Completion: {completion_tokens}, Total: {total_tokens}')
print(f'Model used: {model_name}') Authentication
Set your OpenAI API key via environment variable before instantiation: export OPENAI_API_KEY='sk-...' Or pass it explicitly to OpenAI(api_key='sk-...'). The SDK reads OPENAI_API_KEY at object construction time, not at import time.
Response shape
| Field | Description |
|---|---|
id | Unique identifier for this completion request (e.g., 'chatcmpl-8z9...') |
object | Always 'chat.completion' |
created | Unix timestamp when the response was generated |
model | Model name used (e.g., 'gpt-4-turbo') |
choices | List of completion choices. Length depends on n parameter |
choices[0] | First choice object |
choices[0].message | Message object containing role and content |
choices[0].message.content | The actual text response from the model |
choices[0].message.role | Always 'assistant' for API responses |
choices[0].finish_reason | Why generation stopped ('stop', 'length', 'tool_calls', 'content_filter') |
choices[0].index | Position in choices array |
usage | Token usage object |
usage.prompt_tokens | Tokens in your input message |
usage.completion_tokens | Tokens in the response |
usage.total_tokens | Sum of prompt and completion tokens |
usage.cache_creation_input_tokens | Tokens cached for prompt caching (if enabled) |
usage.cache_read_input_tokens | Tokens read from cache (if enabled) |
Field guide
choices[0].message.content The primary field you'll use: contains the model's text response. Always a string.
finish_reason Critical for production code. 'stop' means the model finished naturally. 'length' means max_tokens was hit (response is incomplete). 'content_filter' means the response was flagged. Never ignore this in logging.
usage.total_tokens Directly tied to cost. Multiply by your model's per-1M-token price. Cache hit tokens (cache_read_input_tokens) cost 90% less than regular prompt tokens.
choices Typically length 1, but becomes a list when n > 1. Always iterate or index safely to avoid IndexError.
Setup trap
Setting OPENAI_API_KEY in your Python code after importing OpenAI is too late. The client reads the key at OpenAI() instantiation. Set the environment variable before running your script, or pass api_key explicitly to OpenAI(api_key='...'). If you're writing a library, lazy-load the client or require callers to pass a pre-configured OpenAI instance.
Cost
Each API call costs based on input + output tokens. In the response, check usage.cache_read_input_tokens: these cost 90% less than regular prompt tokens. With prompt caching enabled, repeated requests to the same context (e.g., large system prompts) can save significant money. Monitor total_tokens in production; a single call can easily cost $0.01–$1.00 depending on model and length.
Rate limits
You'll hit rate limits (429 error) if you make >10,000 requests/min on free tier, or exceed token-per-minute limits on paid. Extract finish_reason == 'length' responses early to avoid wasted token spend on incomplete outputs. Implement exponential backoff for retries rather than immediate retry.
Common gotcha
Trying to access response['choices'][0]['message']['content'] like a dict will fail immediately with AttributeError. The SDK returns Pydantic model objects, not dicts. Use dot notation: response.choices[0].message.content. Your IDE autocomplete only works with dot notation, so you'll catch typos.
Error recovery
AttributeError: 'ChatCompletion' object has no attribute 'choices'IndexError: list index out of rangeAuthenticationErrorRateLimitErrorAPIConnectionErrorExperienced dev note
The finish_reason field is your canary in the coal mine. In production, log every response with finish_reason != 'stop' as a warning. If finish_reason == 'length', your max_tokens is too low and responses are truncated: users are getting incomplete answers. If it's 'content_filter', the model rejected the input or output; log it for compliance and UX debugging. Also: cache_read_input_tokens are invisible to inexperienced devs but can cut your token costs by 70% on repeated contexts. Always enable prompt caching for system prompts > 1KB that you use across multiple requests.
Check your understanding
You're building a chat app that streams responses. Why would you need to check finish_reason even though the stream completed? What could go wrong if you ignore it?
Show answer hint
finish_reason tells you whether the model finished naturally ('stop') or hit a limit ('length'). If it's 'length', the response is truncated mid-sentence, which users won't see if you don't handle streaming properly. You must check finish_reason after streaming completes to know if you need to request more tokens or alert the user.