How-to · Beginner · 3 min read

OpenAI Responses API pricing

Quick answer
The OpenAI Responses API is priced per token: you pay separate rates for input (prompt) tokens and output (completion) tokens, and the rates vary by model. For example, gpt-4o costs more per token than smaller models like gpt-4o-mini. Always check the official OpenAI pricing page for the latest rates.
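A back-of-the-envelope estimate just multiplies each token count by its per-token rate. The rates below are placeholders for illustration, not real prices:

```python
# Hypothetical per-1M-token rates -- substitute current numbers from
# the official OpenAI pricing page before relying on any estimate.
INPUT_RATE_PER_M = 2.50    # $ per 1M input tokens (placeholder)
OUTPUT_RATE_PER_M = 10.00  # $ per 1M output tokens (placeholder)

input_tokens, output_tokens = 15, 45
cost = (input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000
print(f"Estimated cost: ${cost:.6f}")
```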

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install "openai>=1.0"

Setup

Install the official openai Python package and set your API key as an environment variable.

bash
pip install "openai>=1.0"
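The SDK reads the key from the OPENAI_API_KEY environment variable. On macOS/Linux you can set it for the current shell like this (the `sk-...` value is a placeholder for your own key):

```shell
export OPENAI_API_KEY="sk-..."   # replace with your own key
```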

Step by step

Use the OpenAI SDK to call the Responses API, then read token usage from the response to estimate costs.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# The Responses API takes a single `input` instead of a messages list.
response = client.responses.create(
    model="gpt-4o",
    input="Hello, what is the pricing for the Responses API?",
)

print("Response:", response.output_text)
print("Input tokens used:", response.usage.input_tokens)
print("Output tokens used:", response.usage.output_tokens)
print("Total tokens used:", response.usage.total_tokens)
output
Response: The OpenAI Responses API pricing depends on tokens used.
Input tokens used: 15
Output tokens used: 45
Total tokens used: 60
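The usage numbers returned by the API can be combined with per-model rates to compare costs across models. A small sketch with placeholder rates (not official prices):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Dollar cost of one call, given per-1M-token input and output rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Placeholder per-1M-token (input, output) rates for illustration only --
# check the official pricing page for real numbers.
rates = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}

for model, (in_rate, out_rate) in rates.items():
    print(f"{model}: ${estimate_cost(15, 45, in_rate, out_rate):.6f}")
```

The same prompt costs far less on the smaller model, which is why model choice is the first lever for controlling spend.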

Common variations

For lower cost, switch to a smaller model such as gpt-4o-mini. You can also stream tokens as they are generated, and use the async client for concurrent requests.

python
import asyncio
import os

from openai import AsyncOpenAI

async def async_chat():
    # Async usage and async iteration require AsyncOpenAI, not the sync client.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    stream = await client.responses.create(
        model="gpt-4o-mini",
        input="Stream the response pricing info.",
        stream=True,
    )
    async for event in stream:
        # Print text deltas as they arrive; ignore other event types.
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
    print()

asyncio.run(async_chat())
output
The OpenAI Responses API pricing is based on tokens used, with smaller models costing less per 1,000 tokens.

Troubleshooting

  • If you see unexpectedly high token usage, check your prompt length and model choice.
  • Ensure your API key is set correctly in OPENAI_API_KEY.
  • For billing questions, consult the official OpenAI pricing page.
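A quick guard at the top of a script makes a missing key fail fast with a clear message instead of an authentication error deep inside the SDK. A minimal sketch (the `check_api_key` helper and the `sk-` prefix check are illustrative, not part of the SDK):

```python
def check_api_key(env: dict) -> str:
    """Return the API key from an environment mapping, or raise clearly."""
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    if not key.startswith("sk-"):
        # OpenAI keys conventionally start with "sk-"; warn but don't fail.
        print("Warning: OPENAI_API_KEY does not look like an OpenAI key.")
    return key

# Usage: check_api_key(os.environ) before constructing the client.
```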

Key Takeaways

  • OpenAI Responses API pricing is token-based and varies by model.
  • Use smaller models like gpt-4o-mini to reduce costs.
  • Monitor token usage in API responses to manage your budget.
Verified 2026-04 · gpt-4o, gpt-4o-mini