How to use GPT-4o audio
Quick answer
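Use the OpenAI API's GPT-4o audio models to transcribe speech or to generate spoken responses. The client.audio.transcriptions.create method handles speech-to-text, but it needs a transcription model such as gpt-4o-transcribe (or the older whisper-1); it does not accept the plain gpt-4o model. For audio inside a conversation, client.chat.completions.create accepts audio input and can return audio output when you use an audio-capable chat model such as gpt-4o-audio-preview. Store your API key in the OPENAI_API_KEY environment variable and use the official OpenAI Python SDK v1+.
PREREQUISITES
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"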
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable for secure authentication.
```bash
# Quote the requirement so the shell does not treat ">" as output redirection
pip install "openai>=1.0"
```
Step by step
This example transcribes an audio file with gpt-4o-transcribe and then sends the transcript to gpt-4o as an ordinary text chat message.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Transcribe an audio file (speech-to-text).
# The transcription endpoint needs a transcription model, not plain "gpt-4o".
with open("audio_sample.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" / "whisper-1"
        file=audio_file,
    )
print("Transcription:", transcription.text)

# Send the transcript to a chat model as ordinary text input
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcription.text}],
)
print("Chat response:", response.choices[0].message.content)
```
Output
```
Transcription: Hello, this is a test audio file.
Chat response: Hi! How can I assist you today?
```
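The example above routes speech through a text transcript first. To give a chat model the audio itself and get a spoken answer back, a minimal sketch could look like the following. It assumes access to an audio-capable chat model (gpt-4o-audio-preview here), a recent SDK release that accepts the modalities and audio parameters (early 1.x versions do not), and a WAV or MP3 input file. The helper names build_audio_message, save_audio_reply, and chat_with_audio are hypothetical, and the "alloy" voice is just one option.

```python
import base64
from pathlib import Path


def build_audio_message(audio_path: str, prompt: str) -> dict:
    """Build a user message that carries a local WAV/MP3 file as base64 input_audio."""
    path = Path(audio_path)
    audio_format = path.suffix.lower().lstrip(".")  # ".wav" -> "wav"
    if audio_format not in ("wav", "mp3"):
        raise ValueError(f"input_audio expects wav or mp3, got {audio_format!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "input_audio", "input_audio": {"data": encoded, "format": audio_format}},
        ],
    }


def save_audio_reply(b64_audio: str, out_path: str) -> int:
    """Decode the base64 audio returned by the model, write it to disk, return bytes written."""
    data = base64.b64decode(b64_audio)
    Path(out_path).write_bytes(data)
    return len(data)


def chat_with_audio(audio_path: str, prompt: str, out_path: str = "reply.wav") -> str:
    """Send audio in, get a spoken reply out. Requires OPENAI_API_KEY and network access."""
    from openai import OpenAI  # lazy import so the helpers above work without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    completion = client.chat.completions.create(
        model="gpt-4o-audio-preview",           # assumed audio-capable chat model
        modalities=["text", "audio"],           # request a text transcript and audio
        audio={"voice": "alloy", "format": "wav"},
        messages=[build_audio_message(audio_path, prompt)],
    )
    message = completion.choices[0].message
    save_audio_reply(message.audio.data, out_path)  # spoken reply, base64-encoded WAV
    return message.audio.transcript                 # text version of the spoken reply
```

For example, chat_with_audio("question.wav", "Answer the question in this recording") would write the spoken reply to reply.wav and return its transcript. Keeping the SDK import inside chat_with_audio lets the two helpers be tested offline.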
Common variations
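- Use client.audio.translations.create to translate non-English audio into English text (this endpoint currently accepts only whisper-1 as the model).
- For streaming transcription, pass stream=True to client.audio.transcriptions.create with gpt-4o-transcribe or gpt-4o-mini-transcribe; whisper-1 does not support streaming.
- Use smaller models such as gpt-4o-mini-transcribe (speech-to-text) or gpt-4o-mini-audio-preview (audio chat) for lighter or cheaper audio tasks.
- For async code, create an AsyncOpenAI client and await the same methods.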
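Streaming returns partial text while a file is still being transcribed. The sketch below assumes a recent SDK release with stream support on transcriptions and assumes the stream yields events of type transcript.text.delta (with a delta field) and transcript.text.done (with a text field); collect_transcript and stream_transcription are hypothetical helper names.

```python
from typing import Callable, Iterable, Optional


def collect_transcript(events: Iterable, on_delta: Optional[Callable[[str], None]] = None) -> str:
    """Accumulate streamed transcription events into the final text."""
    parts = []
    for event in events:
        if event.type == "transcript.text.delta":
            parts.append(event.delta)
            if on_delta is not None:
                on_delta(event.delta)  # e.g. print partial text as it arrives
        elif event.type == "transcript.text.done":
            return event.text  # the service's final, complete transcript
    return "".join(parts)  # fallback if the stream ended without a done event


def stream_transcription(audio_path: str, model: str = "gpt-4o-mini-transcribe") -> str:
    """Transcribe a local file with streaming. Requires OPENAI_API_KEY and network access."""
    from openai import OpenAI  # lazy import: collect_transcript works without the SDK

    client = OpenAI()
    with open(audio_path, "rb") as audio_file:
        stream = client.audio.transcriptions.create(
            model=model,  # streaming is not supported for whisper-1
            file=audio_file,
            response_format="text",
            stream=True,
        )
        return collect_transcript(stream, on_delta=lambda d: print(d, end="", flush=True))
```

Separating event handling from the API call keeps collect_transcript testable with any objects that expose type, delta, and text attributes.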
Troubleshooting
- If you get authentication errors, verify that the OPENAI_API_KEY environment variable is set correctly.
- Make sure your audio file is in a supported format (for example mp3, wav, or m4a) and under the 25 MB upload limit.
- Check for network issues if requests time out.
- Use the latest OpenAI SDK version to avoid deprecated method errors.
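The first two checks above can be run locally before any API call. This sketch assumes the supported-format list and the 25 MB limit from the transcription API reference at the time of writing (interpreted here as 25 × 1024 × 1024 bytes); verify both against the current documentation. preflight is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import List

# Assumed from the transcription API reference at the time of writing; check current docs.
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # "25 MB", interpreted here as MiB


def preflight(audio_path: str) -> List[str]:
    """Return a list of problems to fix before calling the audio API (empty list = OK)."""
    problems = []
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    path = Path(audio_path)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {path.suffix or '(no extension)'}")
    if not path.is_file():
        problems.append(f"file not found: {path}")
    else:
        size = path.stat().st_size
        if size > MAX_UPLOAD_BYTES:
            problems.append(f"file is {size} bytes; the upload limit is {MAX_UPLOAD_BYTES} bytes")
    return problems
```

For example, printing each entry of preflight("audio_sample.mp3") before the transcription call surfaces a missing key or an oversized file without spending an API request.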
Key Takeaways
- Use client.audio.transcriptions.create with a transcription model such as gpt-4o-transcribe (or gpt-4o-mini-transcribe / whisper-1) to convert speech to text.
- Feed the transcript into client.chat.completions.create for a text conversation, or send the audio directly to an audio-capable model such as gpt-4o-audio-preview.
- Always set your API key securely via environment variables and use a recent OpenAI SDK v1+ release.
- Supported audio formats include mp3, wav, and m4a, with a 25 MB limit per uploaded file.
- For advanced use, explore streaming and translation audio endpoints with appropriate models.