Code beginner · 3 min read

How to use the OpenAI Chat Completions API in Python

Direct answer
Use the OpenAI Python SDK v1: create an OpenAI client with your API key, call client.chat.completions.create() with the model and messages parameters, and read the generated text from response.choices[0].message.content.

Setup

Install
bash
pip install openai
Env vars
OPENAI_API_KEY
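You can set the key in your shell before running any of the examples (the key value below is a placeholder, not a real key):

```shell
# Placeholder value; substitute the secret key from your OpenAI dashboard
export OPENAI_API_KEY="sk-your-key-here"
```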
Imports
python
import os
from openai import OpenAI

Examples

In: Hello, how are you?
Out: I'm doing great, thank you! How can I assist you today?
In: Explain the benefits of using the OpenAI Chat Completions API.
Out: The OpenAI Chat Completions API provides easy access to powerful language models for generating text, enabling natural language understanding and generation in your applications.
In: (empty)
Out: Please provide a prompt or message to get a response.
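The empty-prompt case above is best caught before calling the API. A minimal sketch (build_messages is a hypothetical helper, not part of the SDK):

```python
def build_messages(prompt: str) -> list[dict]:
    # Reject blank input before spending an API call on it
    if not prompt.strip():
        raise ValueError("Please provide a prompt or message to get a response.")
    return [{"role": "user", "content": prompt}]
```

Pass the returned list as the messages argument to client.chat.completions.create().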

Integration steps

  1. Import the OpenAI SDK and initialize the client with your API key from os.environ.
  2. Prepare the messages list with roles and content for the chat completion.
  3. Call client.chat.completions.create() with the model and messages parameters.
  4. Extract the response text from response.choices[0].message.content.
  5. Use or display the generated response as needed in your application.

Full code

python
import os
from openai import OpenAI

def main():
    # Initialize client with API key from environment
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    # Prepare messages for chat completion
    messages = [
        {"role": "user", "content": "Hello, how are you?"}
    ]

    # Call the chat completions endpoint
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )

    # Extract and print the response text
    text = response.choices[0].message.content
    print("Response:", text)

if __name__ == "__main__":
    main()
output
Response: I'm doing great, thank you! How can I assist you today?

API trace

Request
json
{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
Response
json
{"choices": [{"message": {"content": "I'm doing great, thank you! How can I assist you today?"}}], "usage": {"total_tokens": 15}}
Extract: response.choices[0].message.content
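The same extraction can be sketched against the raw JSON above using only the standard library, which is handy when inspecting logged responses rather than SDK objects:

```python
import json

# The response body from the API trace above
raw = """{"choices": [{"message": {"content": "I'm doing great, thank you! How can I assist you today?"}}], "usage": {"total_tokens": 15}}"""
data = json.loads(raw)

# Mirrors response.choices[0].message.content on the SDK object
text = data["choices"][0]["message"]["content"]
tokens = data["usage"]["total_tokens"]
print(text, tokens)
```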

Variants

Streaming response

Use streaming to display partial results in real time, which improves the user experience for long responses.

python
import os
from openai import OpenAI

def main():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Tell me a story."}]

    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )

    print("Streaming response:")
    for chunk in stream:
        # Some chunks (e.g. the final one) may carry no choices or content
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
    print()

if __name__ == "__main__":
    main()
Async version

Use async calls when integrating into asynchronous applications or frameworks for concurrency.

python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    # AsyncOpenAI is the async client in SDK v1; its create() call is awaited directly
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Explain AI."}]

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )

    print("Async response:", response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())
Alternative model (gpt-4o-mini)

Use smaller models like gpt-4o-mini for lower latency and cost when high accuracy is not critical.

python
import os
from openai import OpenAI

def main():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Summarize the benefits of AI."}]

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )

    print("Response from gpt-4o-mini:", response.choices[0].message.content)

if __name__ == "__main__":
    main()

Performance

Latency: ~800ms for gpt-4o non-streaming calls
Cost: ~$0.002 per 500 tokens for gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
  • Keep prompts concise to reduce token usage.
  • Use smaller models like gpt-4o-mini for cheaper calls.
  • Cache frequent responses to avoid repeated calls.
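The caching tip above can be sketched with a simple in-memory memo; call_api below is a stand-in for the real chat.completions call, not an SDK function:

```python
_cache: dict[str, str] = {}

def cached_response(prompt: str, call_api) -> str:
    # Reuse a prior answer for an identical prompt instead of paying for a new call
    if prompt not in _cache:
        _cache[prompt] = call_api(prompt)
    return _cache[prompt]
```

For production use, consider an expiring cache so stale answers are eventually refreshed.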
Approach | Latency | Cost/call | Best for
Standard call (gpt-4o) | ~800ms | ~$0.002 per 500 tokens | High-quality completions
Streaming call | Starts immediately, total ~800ms | Same as standard | Real-time UI updates
Async call | ~800ms (concurrent) | ~$0.002 per 500 tokens | Concurrent applications
Smaller model (gpt-4o-mini) | ~400ms | ~$0.0005 per 500 tokens | Cost-sensitive or low-latency apps
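When you hit the rate limits above, the usual remedy is retrying with exponential backoff. The SDK retries some failures itself (the client accepts a max_retries option); a standalone sketch of the idea, with make_request as a stand-in for the API call:

```python
import time

def with_backoff(make_request, max_attempts: int = 5, base_delay: float = 0.5):
    # Retry with exponentially growing sleeps: 0.5s, 1s, 2s, ...
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

In real code, catch only retryable errors (e.g. rate-limit responses) rather than bare Exception.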

Quick tip

Always extract the response text from response.choices[0].message.content to get the generated answer.

Common mistake

Beginners often forget to pass the messages parameter as a list of role-content dicts, causing API errors.
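A quick contrast of the wrong and right shapes (both literals are illustrative):

```python
# Wrong: a bare string instead of a list of role/content dicts
bad_messages = "Hello, how are you?"

# Right: a list of dicts, each with "role" and "content" keys
good_messages = [{"role": "user", "content": "Hello, how are you?"}]
```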

Verified 2026-04 · gpt-4o, gpt-4o-mini