How to use the Whisper API in Python
Direct answer
Use the OpenAI Python SDK's client.audio.transcriptions.create method, passing your audio file and model="whisper-1", to transcribe audio in Python.
Setup
Install
pip install openai
Env vars
OPENAI_API_KEY
Imports
import os
from openai import OpenAI
Examples
In: audio file 'speech.mp3' (English speech)
Out: transcription text "Hello, this is a test of the Whisper API."
In: audio file 'interview.wav' (interview in English)
Out: transcription text "Today we discuss the future of AI and technology."
In: audio file 'spanish_audio.mp3' (Spanish speech)
Out: transcription text "Hola, esta es una prueba de la API Whisper."
Integration steps
- Install the OpenAI Python SDK and set your API key in the environment variable OPENAI_API_KEY.
- Import the OpenAI client and initialize it with your API key from os.environ.
- Open your audio file in binary mode.
- Call client.audio.transcriptions.create with the file, model="whisper-1", and optionally the language.
- Read the transcription from the response's text field (transcription.text).
- Use or display the transcription as needed.
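The steps above can be wrapped in one reusable function. This is a sketch, not part of the SDK: transcribe_file is a hypothetical name, and the optional client argument exists only so the function can be tried with a stand-in object; by default it builds an OpenAI client from OPENAI_API_KEY.

```python
import os
from typing import Any, Optional


def transcribe_file(path: str, language: Optional[str] = None, client: Any = None) -> str:
    """Transcribe one audio file with whisper-1 and return the plain text.

    Hypothetical helper. `client` may be any object exposing
    audio.transcriptions.create; when omitted, an OpenAI client is
    created from the OPENAI_API_KEY environment variable.
    """
    if client is None:
        # Imported lazily so this module can be loaded without the SDK installed.
        from openai import OpenAI

        client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    # Send `language` only when the caller knows it; otherwise Whisper detects it.
    options = {"model": "whisper-1"}
    if language:
        options["language"] = language

    # Binary mode: the SDK uploads the file's raw bytes.
    with open(path, "rb") as audio_file:
        response = client.audio.transcriptions.create(file=audio_file, **options)
    return response.text
```

Typical use: print(transcribe_file("spanish_audio.mp3", language="es")).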
Full code
import os
from openai import OpenAI

# Reads the key from the environment variable set during setup
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Path to your audio file
audio_file_path = "speech.mp3"

# Open in binary mode; the SDK uploads the raw bytes
with open(audio_file_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1"
    )

print("Transcription:", transcription.text)
Output
Transcription: Hello, this is a test of the Whisper API.
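To see what the SDK call sends, the same request can be made with only the standard library. This sketch assumes the endpoint https://api.openai.com/v1/audio/transcriptions with a Bearer token in the Authorization header, which is what the SDK targets; build_multipart and transcribe_raw are hypothetical helpers. In practice the SDK also handles retries and error types for you.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Endpoint used by the OpenAI SDK for transcriptions.
TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        chunks.append(header.encode() + str(value).encode() + b"\r\n")
    # Guess a MIME type from the extension, e.g. audio/mpeg for .mp3.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    )
    chunks.append(file_header.encode() + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe_raw(path, model="whisper-1"):
    """POST the audio file directly and return the "text" field of the JSON reply."""
    with open(path, "rb") as audio_file:
        body, content_type = build_multipart(
            {"model": model}, "file", os.path.basename(path), audio_file.read()
        )
    request = urllib.request.Request(
        TRANSCRIPTIONS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": content_type,
        },
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```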
API trace
Request (multipart/form-data fields)
model=whisper-1, file=<binary audio data>
Response (JSON)
{"text": "Hello, this is a test of the Whisper API."}
Extract
transcription.text
Variants
Specify language for better accuracy
Use when you know the audio language to improve transcription accuracy.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

with open("spanish_audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        language="es"  # ISO-639-1 code for Spanish
    )

print("Transcription:", transcription.text)
Use the Whisper API for translation
Use when you want to translate audio speech to English instead of just transcribing.
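When one code path has to handle both cases, a small dispatcher can pick the endpoint. This is a sketch assuming client is an OpenAI() instance (or an object with the same shape); speech_to_text is a hypothetical name, and language is forwarded only to transcriptions because the translation endpoint always returns English text.

```python
from typing import Any


def speech_to_text(client: Any, path: str, translate: bool = False, language: str = "") -> str:
    """Transcribe audio as-is, or translate it into English text.

    Hypothetical helper. `language` (an ISO-639-1 code) is only used
    for transcriptions; translations always produce English.
    """
    with open(path, "rb") as audio_file:
        if translate:
            result = client.audio.translations.create(file=audio_file, model="whisper-1")
        else:
            extra = {"language": language} if language else {}
            result = client.audio.transcriptions.create(
                file=audio_file, model="whisper-1", **extra
            )
    return result.text
```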
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# The translations endpoint transcribes the speech and returns English text
with open("french_audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        file=audio_file,
        model="whisper-1"
    )

print("Translation:", translation.text)
Async transcription call
Use the async client, AsyncOpenAI, for concurrent transcription calls in asyncio applications; it exposes the same methods as OpenAI, but each call must be awaited.
import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def transcribe():
    with open("speech.mp3", "rb") as audio_file:
        # Same method name as the sync client; here it returns an awaitable
        transcription = await client.audio.transcriptions.create(
            file=audio_file,
            model="whisper-1"
        )
    print("Transcription:", transcription.text)

asyncio.run(transcribe())
Performance
Latency: roughly 2-5 seconds per minute of audio, depending on file size and network
Cost: about $0.006 per minute of audio processed with whisper-1
Rate limits: 60 requests per minute cited for the default tier; limits vary by usage tier, so check the OpenAI docs for current values
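To stay under rate limits while transcribing many files, the async client can be combined with asyncio.Semaphore so only a few requests are in flight at once. This is a sketch assuming openai>=1.0 (which provides AsyncOpenAI); transcribe_many and the default cap of 4 are illustrative choices, not documented values. Separately, the SDK already retries some failures, including 429 rate-limit responses, a couple of times by default.

```python
import asyncio
import os
from typing import Any, Dict, List


async def transcribe_many(
    paths: List[str], client: Any = None, max_concurrency: int = 4
) -> Dict[str, str]:
    """Transcribe several files concurrently, with at most `max_concurrency` requests in flight."""
    if client is None:
        # Lazy import; AsyncOpenAI ships with openai>=1.0.
        from openai import AsyncOpenAI

        client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

    semaphore = asyncio.Semaphore(max_concurrency)

    async def transcribe_one(path: str) -> str:
        # Waits here while `max_concurrency` other requests are running.
        async with semaphore:
            with open(path, "rb") as audio_file:
                result = await client.audio.transcriptions.create(
                    file=audio_file, model="whisper-1"
                )
            return result.text

    # gather preserves input order, so texts line up with paths.
    texts = await asyncio.gather(*(transcribe_one(p) for p in paths))
    return dict(zip(paths, texts))
```

Typical use: results = asyncio.run(transcribe_many(["speech.mp3", "interview.wav"])).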
- Use compressed audio formats like mp3 or m4a to reduce upload size; uploads are limited to 25 MB per file.
- Trim silence or irrelevant parts before sending audio to reduce cost.
- Specify language to avoid extra processing and improve speed.
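A pre-upload check can catch oversized or unsupported files before they cost a request. This sketch assumes the 25 MB upload limit (read conservatively as 25,000,000 bytes) and a format list taken from OpenAI's speech-to-text documentation; check_upload is hypothetical, and both values should be re-checked against the current docs.

```python
import os

# Assumed limit: a conservative reading of the documented 25 MB cap.
MAX_UPLOAD_BYTES = 25_000_000
# Assumed format list; may change over time.
SUPPORTED_EXTENSIONS = {
    ".flac", ".m4a", ".mp3", ".mp4", ".mpeg", ".mpga", ".oga", ".ogg", ".wav", ".webm",
}


def check_upload(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> int:
    """Raise ValueError if the extension or size would likely be rejected; return the size."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported extension {ext!r}; convert to mp3 or m4a first")
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(
            f"{size} bytes exceeds the {max_bytes}-byte limit; compress or split the audio"
        )
    return size
```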
| Approach | Latency | Cost | Best for |
|---|---|---|---|
| Standard transcription | ~2-5s per minute | ~$0.006/min | General audio transcription |
| Translation endpoint | ~3-6s per minute | ~$0.006/min | Transcribing and translating non-English audio |
| Async transcription | Varies, concurrent calls | ~$0.006/min | High throughput or async apps |
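The table lists the same rate of about $0.006 per minute for every approach, so cost scales linearly with audio length. A minimal estimator, assuming cost is proportional to duration (OpenAI's exact rounding may differ); estimate_cost_usd is a hypothetical name.

```python
# Rate from the table above (USD per minute of audio); confirm on OpenAI's pricing page.
WHISPER_USD_PER_MINUTE = 0.006


def estimate_cost_usd(
    duration_seconds: float, usd_per_minute: float = WHISPER_USD_PER_MINUTE
) -> float:
    """Approximate cost of processing `duration_seconds` of audio."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60 * usd_per_minute
```

For example, one hour of audio is 60 minutes × $0.006, or roughly $0.36.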
Quick tip
When you know the audio language, pass it as an ISO-639-1 code (for example "es") in the language parameter to improve Whisper's accuracy and latency.
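Since the language value is expected as a two-letter ISO-639-1 code, a small normaliser can catch values like "Spanish" before a request is sent. This is a sketch; language_kwargs is a hypothetical helper that returns keyword arguments to splat into transcriptions.create, and its strict two-letter check is an assumption of this example.

```python
import re
from typing import Optional

# ISO-639-1 codes are exactly two letters, e.g. "en", "es", "fr".
_ISO_639_1 = re.compile(r"[a-z]{2}")


def language_kwargs(language: Optional[str]) -> dict:
    """Return {"language": code} for a two-letter code, or {} when no language is given.

    Hypothetical helper: normalises case and whitespace, and raises
    ValueError for values such as "Spanish" or "es-ES".
    """
    if language is None or not language.strip():
        return {}
    code = language.strip().lower()
    if not _ISO_639_1.fullmatch(code):
        raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")
    return {"language": code}
```

Typical use: client.audio.transcriptions.create(file=audio_file, model="whisper-1", **language_kwargs("ES")).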
Common mistake
Beginners often forget to open the audio file in binary mode ('rb'), causing the API call to fail.
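One way to catch this mistake early is to check the file handle before calling the API. This is a sketch; ensure_binary is a hypothetical guard that relies on read(0) returning "" for text streams and b"" for binary ones, without consuming any data.

```python
def ensure_binary(fileobj) -> None:
    """Raise TypeError if `fileobj` was opened in text mode instead of 'rb'."""
    # read(0) reads nothing, but its return type reveals the stream mode.
    if not isinstance(fileobj.read(0), bytes):
        raise TypeError("open the audio file with mode 'rb', not 'r'")
```

Call it right after open("speech.mp3", "rb") and before client.audio.transcriptions.create.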