How-to · Intermediate · 3 min read

Structured output token overhead optimization

Quick answer
To reduce token overhead in structured outputs, use concise prompt templates and lean on function-calling or JSON-schema output features where supported. These guide the model to produce minimal, well-structured responses with no extra prose.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the version pin; unquoted, the shell treats > as a redirect)

Setup

Install the official openai Python SDK and set your API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
```bash
pip install openai
```
Step by step

Use the functions parameter in client.chat.completions.create to define a JSON schema for the output. This guides the model to respond with minimal tokens strictly matching the schema, reducing overhead. (Note: newer SDK versions deprecate functions/function_call in favor of tools/tool_choice, but the older parameters still work and the technique is the same.)

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

functions = [
    {
        "name": "get_user_info",
        "description": "Get user information in a structured format",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "User's full name"},
                "age": {"type": "integer", "description": "User's age"},
                "email": {"type": "string", "description": "User's email address"}
            },
            "required": ["name", "age", "email"]
        }
    }
]

messages = [
    {"role": "user", "content": "Provide user info for John Doe, 30 years old, john@example.com."}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    functions=functions,
    function_call={"name": "get_user_info"}  # force structured output
)

structured_output = response.choices[0].message.function_call.arguments
print(structured_output)
```

Output:

```json
{
  "name": "John Doe",
  "age": 30,
  "email": "john@example.com"
}
```
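Note that arguments comes back as a JSON string, and the model can occasionally deviate from the schema, so it is worth parsing and validating it before use. A minimal sketch (validate_user_info is an illustrative helper, not part of the SDK):

```python
import json

def validate_user_info(arguments):
    """Parse a function_call arguments string and check it against the schema."""
    data = json.loads(arguments)  # raises json.JSONDecodeError (a ValueError) if malformed
    missing = {"name", "age", "email"} - data.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    if not isinstance(data["age"], int):
        raise TypeError("age must be an integer")
    return data

user = validate_user_info('{"name": "John Doe", "age": 30, "email": "john@example.com"}')
print(user["name"])  # John Doe
```

Failing fast here is cheaper than letting a malformed payload propagate into downstream code.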

Common variations

You can optimize token overhead further by:

  • Using shorter system prompts to reduce context tokens.
  • Choosing smaller models like gpt-4o-mini for less verbose outputs.
  • Using async calls with the OpenAI SDK for concurrent requests.
  • Applying similar structured-output techniques with other providers, e.g. Anthropic's claude-3-5-sonnet-20241022 via its tool-use (tools and tool_choice) parameters.
```python
import os
import asyncio
from openai import AsyncOpenAI

# Async calls require the AsyncOpenAI client; the sync client has no async methods.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def get_structured_output():
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Provide user info for Jane, 25, jane@example.com."}],
        functions=[
            {
                "name": "get_user_info",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "age": {"type": "integer"},
                        "email": {"type": "string"}
                    },
                    "required": ["name", "age", "email"]
                }
            }
        ],
        function_call={"name": "get_user_info"}
    )
    print(response.choices[0].message.function_call.arguments)

asyncio.run(get_structured_output())
```

Output:

```json
{
  "name": "Jane",
  "age": 25,
  "email": "jane@example.com"
}
```
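The schema definition itself counts against your prompt tokens, so stripping field descriptions (as the async example does) is another lever. A rough local comparison, using a crude four-characters-per-token heuristic (use a real tokenizer such as tiktoken for accurate counts):

```python
import json

verbose_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "User's full name"},
        "age": {"type": "integer", "description": "User's age"},
        "email": {"type": "string", "description": "User's email address"},
    },
    "required": ["name", "age", "email"],
}

# Same schema with descriptions stripped; the model still sees the structure.
minimal_schema = {
    "type": "object",
    "properties": {k: {"type": v["type"]} for k, v in verbose_schema["properties"].items()},
    "required": verbose_schema["required"],
}

def approx_tokens(obj):
    """Very rough estimate: about four characters per token for English/JSON."""
    return len(json.dumps(obj, separators=(",", ":"))) // 4

print("verbose schema ~tokens:", approx_tokens(verbose_schema))
print("minimal schema ~tokens:", approx_tokens(minimal_schema))
```

For schemas sent on every request, the savings compound quickly; keep descriptions only where they measurably improve output quality.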

Troubleshooting

If the model returns verbose or unstructured output despite using functions, verify:

  • function_call is set to the function's name, which forces the structured call.
  • The JSON schema in functions is valid and complete.
  • The model supports function calling (e.g., gpt-4o or gpt-4o-mini).
  • The response metadata reports no API errors or rate limiting.
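A quick local sanity check for schema completeness (every required field actually declared under properties) can catch mistakes before spending tokens on a failed call. check_schema is an illustrative helper, not part of the SDK:

```python
def check_schema(parameters):
    """Return a list of problems found in a function-parameters schema."""
    problems = []
    if parameters.get("type") != "object":
        problems.append('top-level "type" should be "object"')
    props = parameters.get("properties", {})
    for field in parameters.get("required", []):
        if field not in props:
            problems.append(f'required field "{field}" missing from properties')
    for name, spec in props.items():
        if "type" not in spec:
            problems.append(f'property "{name}" has no "type"')
    return problems

bad = {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name", "age"]}
print(check_schema(bad))  # ['required field "age" missing from properties']
```

Running this in a unit test keeps schema drift from silently degrading structured output.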

Key Takeaways

  • Use the functions parameter with JSON schema to enforce minimal structured output and reduce token overhead.
  • Force structured output with function_call to avoid verbose or unstructured responses.
  • Choose concise prompts and smaller models to further optimize token usage and cost.
  • Async calls enable efficient batch processing without increasing token overhead per request.
  • Validate your JSON schema and model compatibility to ensure structured output works as expected.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022