How to get response text from the OpenAI API in Python
Direct answer
Use the OpenAI Python SDK (v1+): call client.chat.completions.create() with your messages, then access the response text via response.choices[0].message.content.

Setup
Install
```shell
pip install openai
```
Env vars
OPENAI_API_KEY
Imports
```python
import os
from openai import OpenAI
```

Examples
In: Hello, how are you?
Out: I'm doing well, thank you! How can I assist you today?
In: Write a short poem about spring.
Out: Spring blooms anew, with colors bright and true, nature's gentle cue.
In: (empty prompt)
Out: Please provide a prompt to generate a response.
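The empty-input example above suggests guarding before spending an API call. A minimal sketch (the `ask` helper is my own; the SDK import is deferred so the guard itself has no dependency on it):

```python
import os

def ask(prompt: str) -> str:
    # Guard first: an empty prompt never reaches the API.
    if not prompt.strip():
        return "Please provide a prompt to generate a response."
    from openai import OpenAI  # deferred so the guard works without the SDK
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```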
Integration steps
- Import the OpenAI client and load your API key from environment variables.
- Initialize the OpenAI client with your API key.
- Create a messages list with the user prompt.
- Call client.chat.completions.create() with the model and messages.
- Extract the response text from response.choices[0].message.content.
Full code
```python
import os
from openai import OpenAI

# Initialize client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Define the user message
messages = [{"role": "user", "content": "Hello, how are you?"}]

# Create chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)

# Extract and print the response text
text = response.choices[0].message.content
print("Response from OpenAI:", text)
```

Output
Response from OpenAI: I'm doing well, thank you! How can I assist you today?
API trace
Request
{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
Response
{"choices": [{"message": {"content": "I'm doing well, thank you! How can I assist you today?"}}], "usage": {"total_tokens": 20}}
Extract
response.choices[0].message.content

Variants
Streaming response ›
Use streaming to display the response token-by-token for better user experience with long outputs.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
messages = [{"role": "user", "content": "Tell me a story."}]

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True,
)
# In SDK v1 each chunk's delta is an object, not a dict, and its
# content attribute can be None, so fall back to an empty string.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
```

Async version ›
Use async calls to handle multiple concurrent requests efficiently in asynchronous Python applications.
```python
import os
import asyncio
from openai import AsyncOpenAI  # SDK v1 provides a dedicated async client

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Explain quantum computing."}]
    # The async client awaits the same create() method;
    # acreate() was removed in SDK v1.
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

Alternative model (gpt-4o-mini) ›
Use a smaller model like gpt-4o-mini for faster responses and lower cost when high detail is not required.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
messages = [{"role": "user", "content": "Summarize the latest news."}]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
print(response.choices[0].message.content)
```

Performance
Latency: ~800 ms for gpt-4o, non-streaming
Cost: ~$0.002 per 500 tokens exchanged on gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
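When a call exceeds those limits, SDK v1 raises `openai.RateLimitError`; a common mitigation is exponential backoff. Below is a library-agnostic sketch (the helper name and delay values are illustrative, not part of the SDK):

```python
import time

def with_backoff(call, is_retryable, max_retries=5, base_delay=1.0):
    """Run call(), retrying with doubling delays while is_retryable(exc) holds."""
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_retries - 1:
                raise
            time.sleep(delay)  # back off before the next attempt
            delay *= 2         # 1 s, 2 s, 4 s, ...
```

With the SDK this would be invoked as `with_backoff(lambda: client.chat.completions.create(model="gpt-4o", messages=messages), lambda e: isinstance(e, openai.RateLimitError))`.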
- Keep prompts concise to reduce token usage.
- Use smaller models like gpt-4o-mini for cheaper calls.
- Cache frequent responses to avoid repeated calls.
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard call | ~800ms | ~$0.002 | General purpose, simple integration |
| Streaming | Starts immediately, total ~800ms | ~$0.002 | Long responses with better UX |
| Async call | ~800ms | ~$0.002 | Concurrent requests in async apps |
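The "Async call" row pays off when several prompts are issued at once via `asyncio.gather`. This sketch simulates the network call with `asyncio.sleep` so it runs standalone; in real use `fetch_reply` would await `client.chat.completions.create()` on an `AsyncOpenAI` client:

```python
import asyncio

async def fetch_reply(prompt: str) -> str:
    # Stand-in for an awaited chat.completions.create() call;
    # the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return f"reply to: {prompt}"

async def ask_many(prompts):
    # All requests run concurrently; gather preserves input order.
    return await asyncio.gather(*(fetch_reply(p) for p in prompts))

results = asyncio.run(ask_many(["a", "b", "c"]))
```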
Quick tip
Always extract the response text from `response.choices[0].message.content` to get the assistant's reply.
Common mistake
Beginners often forget to access `choices[0].message.content` and instead try to print the entire response object.
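The access path is easy to see on a stand-in object shaped like the SDK's `ChatCompletion` (a real `response` comes from `client.chat.completions.create()`):

```python
from types import SimpleNamespace

# Stand-in with the same nesting as an SDK ChatCompletion response.
response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello!"))],
    usage=SimpleNamespace(total_tokens=20),
)

whole_object = str(response)                # the mistake: full object repr
text = response.choices[0].message.content  # the fix: just the reply text
```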