Code beginner · 3 min read

How to Use the OpenAI API in Python

Direct answer
Use the openai Python SDK (v1+): import OpenAI, initialize the client with your API key from os.environ, and call client.chat.completions.create with your model and messages. The reply text is available at response.choices[0].message.content.

Setup

Install

```bash
pip install openai
```

Env vars: OPENAI_API_KEY

Imports

```python
import os
from openai import OpenAI
```

Examples

In: Hello, how are you?
Out: I'm doing great, thank you! How can I assist you today?

In: Write a Python function to reverse a string.
Out: def reverse_string(s): return s[::-1]

In: Explain quantum computing in simple terms.
Out: Quantum computing uses quantum bits that can be in multiple states at once, enabling faster problem solving for certain tasks.

Integration steps

  1. Install the OpenAI Python SDK with pip.
  2. Set your API key in the environment variable OPENAI_API_KEY.
  3. Import OpenAI and initialize the client with the API key from os.environ.
  4. Create a messages list with roles and content for the chat completion.
  5. Call client.chat.completions.create with the model and messages.
  6. Extract the response text from response.choices[0].message.content.
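Step 4 deserves a closer look: the messages list can carry more than a single user turn. A minimal sketch (the prompt contents here are illustrative) showing a system prompt and prior conversation turns:

```python
# Each entry needs a "role" ("system", "user", or "assistant") and "content".
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great! How can I help?"},
    {"role": "user", "content": "Write a Python function to reverse a string."},
]

roles = [m["role"] for m in messages]
print(roles)  # → ['system', 'user', 'assistant', 'user']
```

Passing prior assistant turns back in is how you give the model conversation memory; the API itself is stateless.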

Full code

```python
import os
from openai import OpenAI

# Initialize client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Define chat messages
messages = [
    {"role": "user", "content": "Hello, how are you?"}
]

# Create chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

# Extract and print the assistant's reply
print("Assistant:", response.choices[0].message.content)
```

Output:

```
Assistant: I'm doing great, thank you! How can I assist you today?
```

API trace

Request

```json
{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
```

Response

```json
{"choices": [{"message": {"content": "I'm doing great, thank you! How can I assist you today?"}}], "usage": {"total_tokens": 15}}
```

Extract: response.choices[0].message.content

Variants

Streaming Chat Completion

Use streaming to display partial responses in real-time for better user experience with long outputs.

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [{"role": "user", "content": "Tell me a joke."}]

# Stream the response: each chunk carries a delta object with the next
# piece of text. In SDK v1+, delta is an object, not a dict, so access
# its content attribute (which can be None) rather than calling .get().
stream = client.chat.completions.create(model="gpt-4o", messages=messages, stream=True)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
print()
```
Async Chat Completion

Use async calls to handle multiple concurrent requests efficiently in asynchronous Python applications.

```python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    # Use AsyncOpenAI for awaitable calls; SDK v1+ has no acreate method.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Explain recursion."}]
    response = await client.chat.completions.create(model="gpt-4o", messages=messages)
    print(response.choices[0].message.content)

asyncio.run(main())
```
Using a Smaller Model for Cost Efficiency

Use smaller models like gpt-4o-mini to reduce cost and latency when high precision is not critical.

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [{"role": "user", "content": "Summarize the benefits of AI."}]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```

Performance

Latency: ~800ms for gpt-4o non-streaming calls
Cost: ~$0.002 per 500 tokens exchanged with gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
  • Keep prompts concise to reduce token usage.
  • Use smaller models for less critical tasks.
  • Cache frequent queries to avoid repeated calls.
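The caching tip above can be sketched with functools.lru_cache. The ask helper and its canned reply are illustrative stand-ins for a real client.chat.completions.create call:

```python
from functools import lru_cache

call_count = 0  # counts how often the (stand-in) API call actually runs

@lru_cache(maxsize=128)
def ask(prompt: str) -> str:
    # In a real app this body would call client.chat.completions.create(...)
    global call_count
    call_count += 1
    return f"answer to: {prompt}"

ask("Summarize the benefits of AI.")
ask("Summarize the benefits of AI.")  # repeated prompt is served from the cache
print(call_count)  # → 1
```

Note that lru_cache only helps when prompts repeat exactly; for fuzzy matching or persistence across restarts you would need an external cache.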
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard Chat Completion | ~800ms | ~$0.002 | General purpose, reliable |
| Streaming Chat Completion | Starts immediately, ~800ms total | ~$0.002 | Real-time UI updates |
| Async Chat Completion | ~800ms | ~$0.002 | Concurrent requests in async apps |
| Smaller Model (gpt-4o-mini) | ~400ms | ~$0.0005 | Cost-sensitive or low-latency needs |

Quick tip

Always load your API key securely from environment variables and never hardcode it in your source code.
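As a sketch of that tip (the setdefault line only supplies a placeholder so the snippet runs standalone; in real use the key is exported in your shell, never written into code):

```python
import os

# In real use: export OPENAI_API_KEY=sk-... in your shell, not in code.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")

api_key = os.environ["OPENAI_API_KEY"]

# In SDK v1+, OpenAI() also reads OPENAI_API_KEY automatically, so
# client = OpenAI() works without passing api_key explicitly.
```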

Common mistake

Beginners often use deprecated SDK methods like openai.ChatCompletion.create() instead of the current client.chat.completions.create() pattern.

Verified 2026-04 · gpt-4o, gpt-4o-mini