How to use GPT-4o audio

Quick answer

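Audio support in the OpenAI API is split across models. Speech-to-text goes through client.audio.transcriptions.create with a dedicated transcription model (gpt-4o-transcribe, gpt-4o-mini-transcribe, or whisper-1); the plain gpt-4o model is not accepted there. Audio input and output in chat goes through client.chat.completions.create with an audio-capable model such as gpt-4o-audio-preview, while plain gpt-4o handles the text side of a conversation. Set your API key in the OPENAI_API_KEY environment variable and use the official OpenAI SDK v1+.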

PREREQUISITES

Python 3.8+
OpenAI API key (audio requests are billed, so the account needs available credit)
pip install "openai>=1.0"

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable for secure authentication.

bash

pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"  # placeholder: substitute your own key

Step by step

This example transcribes an audio file with gpt-4o-transcribe and then passes the transcribed text to gpt-4o for a chat response.

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Transcribe audio file (speech-to-text)
# The endpoint needs a transcription model: gpt-4o-transcribe, gpt-4o-mini-transcribe, or whisper-1
with open("audio_sample.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file
    )
print("Transcription:", transcription.text)

# Use transcription text as input to chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcription.text}]
)
print("Chat response:", response.choices[0].message.content)

output

Transcription: Hello, this is a test audio file.
Chat response: Hi! How can I assist you today?
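
The example above sends only text to the chat model. To pass the audio itself, the Chat Completions API accepts base64-encoded audio in an input_audio content part when the model is audio-capable. The following is a minimal sketch, assuming the gpt-4o-audio-preview model and an mp3 input file; the helper names build_audio_message, save_audio_reply, and ask_about_audio are hypothetical, and the "alloy" voice is just one possible choice.

```python
import base64
import os


def build_audio_message(audio_bytes, audio_format="mp3",
                        prompt="What is said in this recording?"):
    """Build a chat message carrying a text prompt plus base64-encoded audio."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "input_audio",
             "input_audio": {"data": encoded, "format": audio_format}},
        ],
    }


def save_audio_reply(b64_audio, out_path):
    """Decode base64 audio returned by the model and write it to disk."""
    with open(out_path, "wb") as out_file:
        out_file.write(base64.b64decode(b64_audio))
    return out_path


def ask_about_audio(audio_path, reply_path="reply.wav"):
    """Send an mp3 file to the model and save its spoken reply.

    Calling this makes a billed API request.
    """
    # Imported here so the two helpers above work without the SDK installed.
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    with open(audio_path, "rb") as audio_file:
        message = build_audio_message(audio_file.read(), audio_format="mp3")

    response = client.chat.completions.create(
        model="gpt-4o-audio-preview",   # assumption: an audio-capable chat model
        modalities=["text", "audio"],   # request a spoken reply as well as text
        audio={"voice": "alloy", "format": "wav"},
        messages=[message],
    )
    reply = response.choices[0].message
    save_audio_reply(reply.audio.data, reply_path)
    # With audio output, the text of the reply is in the audio transcript.
    return reply.audio.transcript
```

Under these assumptions, ask_about_audio("audio_sample.mp3") would return the transcript of the spoken reply and write the audio to reply.wav.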

Common variations

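Use client.audio.translations.create with whisper-1 to translate non-English audio into English text.
For streaming transcription, pass stream=True with gpt-4o-transcribe or gpt-4o-mini-transcribe; whisper-1 does not support streaming.
Use gpt-4o-mini-transcribe for lower-cost transcription and gpt-4o-mini for the chat step.
For async usage, create the client with AsyncOpenAI and await each call.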
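
Two of the variations above, translation and async usage, can be sketched as follows. This is a minimal sketch, assuming whisper-1 for translation and gpt-4o-mini-transcribe for transcription; translate_to_english, transcribe_file, and transcribe_many are hypothetical helper names. The client is passed in as an argument, so the caller decides whether to build it with OpenAI() or AsyncOpenAI().

```python
import asyncio


def translate_to_english(client, path):
    """Translate speech in a supported language into English text.

    Expects a synchronous client, e.g. openai.OpenAI().
    """
    with open(path, "rb") as audio_file:
        result = client.audio.translations.create(
            model="whisper-1", file=audio_file
        )
    return result.text


async def transcribe_file(client, path, model="gpt-4o-mini-transcribe"):
    """Transcribe one file. Expects an async client, e.g. openai.AsyncOpenAI()."""
    with open(path, "rb") as audio_file:
        result = await client.audio.transcriptions.create(
            model=model, file=audio_file
        )
    return result.text


async def transcribe_many(client, paths, model="gpt-4o-mini-transcribe"):
    """Transcribe several files concurrently; results keep the order of `paths`."""
    tasks = [transcribe_file(client, path, model) for path in paths]
    return await asyncio.gather(*tasks)


# Example (makes billed API calls, so it is left commented out):
# from openai import AsyncOpenAI
# texts = asyncio.run(transcribe_many(AsyncOpenAI(), ["a.mp3", "b.mp3"]))
```

Running the requests through asyncio.gather lets the uploads overlap instead of waiting for each file in turn, which mainly helps when there are many short clips.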

Troubleshooting

If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
Ensure your audio file format is supported (mp3, mp4, mpeg, mpga, m4a, wav, or webm) and the file is no larger than 25 MB.
Check for network issues if requests time out.
If you see errors about removed methods such as openai.Audio.transcribe, upgrade to OpenAI SDK v1+ and use the client-based calls (client.audio.transcriptions.create).
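
Most of the checks above can run before any request is sent. The following is a minimal pre-flight sketch, assuming the 25 MB limit and the format list given for the transcription endpoint; preflight_problems is a hypothetical helper, not part of the SDK, and treating 25 MB as 25 * 1024 * 1024 bytes is also an assumption.

```python
import os
from typing import List, Mapping

# Assumption: formats and size limit as documented for the transcription endpoint.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB


def preflight_problems(path: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready to send."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if not os.path.isfile(path):
        # Without a file there is nothing else to inspect.
        problems.append(f"file not found: {path}")
        return problems
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 25 MB")
    return problems
```

A caller might run preflight_problems("audio_sample.mp3") and print each returned message before attempting the upload. The check looks only at the file extension, so a mislabeled file can still be rejected by the API.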

Key Takeaways

Use client.audio.transcriptions.create with gpt-4o-transcribe (or whisper-1) to convert speech to text.
Feed the transcribed text into client.chat.completions.create with gpt-4o to build conversational AI on top of spoken input.
Always set your API key securely via environment variables and use the latest OpenAI SDK v1+.
Supported audio formats include mp3, wav, and m4a, with a 25 MB file size limit per request.
For advanced use, explore streaming with the gpt-4o transcription models and translation with whisper-1.

Verified 2026-04 · gpt-4o, gpt-4o-mini