Code beginner · 3 min read

How to use the OpenAI Chat Completions API in Python

Direct answer
Use the OpenAI Python SDK v1: create an OpenAI client with your API key, call client.chat.completions.create() with the model and messages parameters, and read the generated text from response.choices[0].message.content.

Setup

Install
bash
pip install openai
Env vars
OPENAI_API_KEY
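You can set the key in your shell before running any of the examples (the key value below is a placeholder, not a real key):

```shell
# Placeholder value; substitute the secret key from your OpenAI dashboard
export OPENAI_API_KEY="sk-your-key-here"
```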
Imports
python
import os
from openai import OpenAI

Examples

In: Hello, how are you?
Out: I'm doing great, thank you! How can I assist you today?
In: Explain the benefits of using the OpenAI Chat Completions API.
Out: The OpenAI Chat Completions API provides easy access to powerful language models for generating text, enabling natural language understanding and generation in your applications.
In: (empty)
Out: Please provide a prompt or message to get a response.
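The empty-prompt case above is best caught before calling the API. A minimal sketch (build_messages is a hypothetical helper, not part of the SDK):

```python
def build_messages(prompt: str) -> list[dict]:
    # Reject blank input before spending an API call on it
    if not prompt.strip():
        raise ValueError("Please provide a prompt or message to get a response.")
    return [{"role": "user", "content": prompt}]
```

Pass the returned list as the messages argument to client.chat.completions.create().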

Integration steps

  1. Import the OpenAI SDK and initialize the client with your API key from os.environ.
  2. Prepare the messages list with roles and content for the chat completion.
  3. Call client.chat.completions.create() with the model and messages parameters.
  4. Extract the response text from response.choices[0].message.content.
  5. Use or display the generated response as needed in your application.

Full code

python
import os
from openai import OpenAI

def main():
    # Initialize client with API key from environment
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    # Prepare messages for chat completion
    messages = [
        {"role": "user", "content": "Hello, how are you?"}
    ]

    # Call the chat completions endpoint
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )

    # Extract and print the response text
    text = response.choices[0].message.content
    print("Response:", text)

if __name__ == "__main__":
    main()
output
Response: I'm doing great, thank you! How can I assist you today?

API trace

Request
json
{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
Response
json
{"choices": [{"message": {"content": "I'm doing great, thank you! How can I assist you today?"}}], "usage": {"total_tokens": 15}}
Extract: response.choices[0].message.content
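The same extraction can be sketched against the raw JSON above using only the standard library, which is handy when inspecting logged responses rather than SDK objects:

```python
import json

# The response body from the API trace above
raw = """{"choices": [{"message": {"content": "I'm doing great, thank you! How can I assist you today?"}}], "usage": {"total_tokens": 15}}"""
data = json.loads(raw)

# Mirrors response.choices[0].message.content on the SDK object
text = data["choices"][0]["message"]["content"]
tokens = data["usage"]["total_tokens"]
print(text, tokens)
```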

Variants

Streaming response

Use streaming to display partial results in real time, which improves the user experience for long responses.

python
import os
from openai import OpenAI

def main():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Tell me a story."}]

    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )

    print("Streaming response:")
    for chunk in stream:
        # Some chunks (e.g. the final one) may carry no choices or content
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
    print()

if __name__ == "__main__":
    main()
Async version

Use async calls when integrating into asynchronous applications or frameworks for concurrency.

python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    # AsyncOpenAI is the async client in SDK v1; its create() call is awaited directly
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Explain AI."}]

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )

    print("Async response:", response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())
Alternative model (gpt-4o-mini)

Use smaller models like gpt-4o-mini for lower latency and cost when high accuracy is not critical.

python
import os
from openai import OpenAI

def main():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Summarize the benefits of AI."}]

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )

    print("Response from gpt-4o-mini:", response.choices[0].message.content)

if __name__ == "__main__":
    main()

Performance

Latency: ~800ms for gpt-4o non-streaming calls
Cost: ~$0.002 per 500 tokens for gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
  • Keep prompts concise to reduce token usage.
  • Use smaller models like gpt-4o-mini for cheaper calls.
  • Cache frequent responses to avoid repeated calls.
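The caching tip above can be sketched with a simple in-memory memo; call_api below is a stand-in for the real chat.completions call, not an SDK function:

```python
_cache: dict[str, str] = {}

def cached_response(prompt: str, call_api) -> str:
    # Reuse a prior answer for an identical prompt instead of paying for a new call
    if prompt not in _cache:
        _cache[prompt] = call_api(prompt)
    return _cache[prompt]
```

For production use, consider an expiring cache so stale answers are eventually refreshed.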
Approach | Latency | Cost/call | Best for
Standard call (gpt-4o) | ~800ms | ~$0.002 per 500 tokens | High-quality completions
Streaming call | Starts immediately, total ~800ms | Same as standard | Real-time UI updates
Async call | ~800ms (concurrent) | ~$0.002 per 500 tokens | Concurrent applications
Smaller model (gpt-4o-mini) | ~400ms | ~$0.0005 per 500 tokens | Cost-sensitive or low-latency apps
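When you hit the rate limits above, the usual remedy is retrying with exponential backoff. The SDK retries some failures itself (the client accepts a max_retries option); a standalone sketch of the idea, with make_request as a stand-in for the API call:

```python
import time

def with_backoff(make_request, max_attempts: int = 5, base_delay: float = 0.5):
    # Retry with exponentially growing sleeps: 0.5s, 1s, 2s, ...
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

In real code, catch only retryable errors (e.g. rate-limit responses) rather than bare Exception.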

Quick tip

Always extract the response text from response.choices[0].message.content to get the generated answer.

Common mistake

Beginners often forget to pass the messages parameter as a list of role-content dicts, causing API errors.
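A quick contrast of the wrong and right shapes (both literals are illustrative):

```python
# Wrong: a bare string instead of a list of role/content dicts
bad_messages = "Hello, how are you?"

# Right: a list of dicts, each with "role" and "content" keys
good_messages = [{"role": "user", "content": "Hello, how are you?"}]
```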

Verified 2026-04 · gpt-4o, gpt-4o-mini