How-to · Beginner · 3 min read

OpenAI Responses API pricing

Quick answer
The OpenAI Responses API is priced per token: you pay separate rates for input (prompt) tokens and output (completion) tokens, and the rates vary by model. For example, gpt-4o costs more per token than smaller models like gpt-4o-mini. Always check the official OpenAI pricing page for the latest rates.
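A back-of-the-envelope estimate just multiplies each token count by its per-token rate. The rates below are placeholders for illustration, not real prices:

```python
# Hypothetical per-1M-token rates -- substitute current numbers from
# the official OpenAI pricing page before relying on any estimate.
INPUT_RATE_PER_M = 2.50    # $ per 1M input tokens (placeholder)
OUTPUT_RATE_PER_M = 10.00  # $ per 1M output tokens (placeholder)

input_tokens, output_tokens = 15, 45
cost = (input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000
print(f"Estimated cost: ${cost:.6f}")
```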

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install "openai>=1.0"

Setup

Install the official openai Python package and set your API key as an environment variable.

bash
pip install "openai>=1.0"
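The SDK reads the key from the OPENAI_API_KEY environment variable. On macOS/Linux you can set it for the current shell like this (the `sk-...` value is a placeholder for your own key):

```shell
export OPENAI_API_KEY="sk-..."   # replace with your own key
```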

Step by step

Use the OpenAI SDK to call the Responses API, then read token usage from the response to estimate costs.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# The Responses API takes a single `input` instead of a messages list.
response = client.responses.create(
    model="gpt-4o",
    input="Hello, what is the pricing for the Responses API?",
)

print("Response:", response.output_text)
print("Input tokens used:", response.usage.input_tokens)
print("Output tokens used:", response.usage.output_tokens)
print("Total tokens used:", response.usage.total_tokens)
output
Response: The OpenAI Responses API pricing depends on tokens used.
Input tokens used: 15
Output tokens used: 45
Total tokens used: 60
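The usage numbers returned by the API can be combined with per-model rates to compare costs across models. A small sketch with placeholder rates (not official prices):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Dollar cost of one call, given per-1M-token input and output rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Placeholder per-1M-token (input, output) rates for illustration only --
# check the official pricing page for real numbers.
rates = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}

for model, (in_rate, out_rate) in rates.items():
    print(f"{model}: ${estimate_cost(15, 45, in_rate, out_rate):.6f}")
```

The same prompt costs far less on the smaller model, which is why model choice is the first lever for controlling spend.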

Common variations

For lower cost, switch to a smaller model such as gpt-4o-mini. You can also stream tokens as they are generated, and use the async client for concurrent requests.

python
import asyncio
import os

from openai import AsyncOpenAI

async def async_chat():
    # Async usage and async iteration require AsyncOpenAI, not the sync client.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    stream = await client.responses.create(
        model="gpt-4o-mini",
        input="Stream the response pricing info.",
        stream=True,
    )
    async for event in stream:
        # Print text deltas as they arrive; ignore other event types.
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
    print()

asyncio.run(async_chat())
output
The OpenAI Responses API pricing is based on tokens used, with smaller models costing less per 1,000 tokens.

Troubleshooting

  • If you see unexpectedly high token usage, check your prompt length and model choice.
  • Ensure your API key is set correctly in OPENAI_API_KEY.
  • For billing questions, consult the official OpenAI pricing page.
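A quick guard at the top of a script makes a missing key fail fast with a clear message instead of an authentication error deep inside the SDK. A minimal sketch (the `check_api_key` helper and the `sk-` prefix check are illustrative, not part of the SDK):

```python
def check_api_key(env: dict) -> str:
    """Return the API key from an environment mapping, or raise clearly."""
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    if not key.startswith("sk-"):
        # OpenAI keys conventionally start with "sk-"; warn but don't fail.
        print("Warning: OPENAI_API_KEY does not look like an OpenAI key.")
    return key

# Usage: check_api_key(os.environ) before constructing the client.
```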

Key Takeaways

  • OpenAI Responses API pricing is token-based and varies by model.
  • Use smaller models like gpt-4o-mini to reduce costs.
  • Monitor token usage in API responses to manage your budget.
Verified 2026-04 · gpt-4o, gpt-4o-mini