How to use Whisper API in Python
Direct answer
Use the OpenAI Python SDK to call client.audio.transcriptions.create with model="whisper-1" and an open audio file to transcribe audio with the Whisper API.

Setup
Install
pip install openai

Env vars
OPENAI_API_KEY

Imports
from openai import OpenAI
import os

Examples
In: Transcribe a short mp3 audio file 'speech.mp3'
Out: Transcription text: "Hello, this is a test of the Whisper API."
In: Transcribe a wav file 'meeting.wav' with clear speech
Out: Transcription text: "Today’s meeting covered project milestones and deadlines."
In: Transcribe a noisy audio file 'noisy_audio.mp3'
Out: Transcription text: "Despite background noise, the main points were captured accurately."
Integration steps
- Install the OpenAI Python SDK and set the OPENAI_API_KEY environment variable.
- Import the OpenAI client and initialize it with the API key from environment variables.
- Open the audio file in binary mode for reading.
- Call client.audio.transcriptions.create with model="whisper-1" and the audio file object.
- Extract the transcription text from the response's text field.
- Print or use the transcribed text as needed.
Full code
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Replace 'audio.mp3' with your audio file path
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
print("Transcription text:", transcript.text)

Output
Transcription text: Hello, this is a test of the Whisper API.
API trace
Request
{"model": "whisper-1", "file": <binary audio file>}

Response
{"text": "Transcribed text string", "language": "en", "duration": 12.3}
(the language and duration fields are returned only when response_format="verbose_json" is requested; the default response contains just text)

Extract
response.text

Variants
Async version ›
Use when integrating Whisper transcription in asynchronous Python applications for concurrency.
import asyncio
import os
from openai import AsyncOpenAI

async def transcribe_async():
    # The async client makes transcriptions.create awaitable
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    with open("audio.mp3", "rb") as audio_file:
        transcript = await client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
    print("Transcription text:", transcript.text)

asyncio.run(transcribe_async())

Local Whisper transcription (offline) ›
Use when you want to transcribe audio locally without API calls or internet dependency (requires the open-source package: pip install openai-whisper).
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print("Transcription text:", result["text"])

Specify language parameter ›
Use when you know the audio language in advance to improve transcription accuracy.
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en"
    )
print("Transcription text:", transcript.text)

Performance
Latency: ~3-10 seconds per minute of audio, depending on file size and network speed
Cost: ~$0.006 per minute of audio for Whisper API transcription
Rate limits: vary by account tier (e.g. ~60 requests per minute on the default tier); check the OpenAI docs for current values
- Trim audio to only the needed segments to reduce cost and latency.
- Use the language parameter if known to improve transcription speed and accuracy.
- Avoid re-uploading the same audio multiple times; cache transcripts when possible.
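The caching tip above can be sketched as a small content-hash cache. This is a minimal sketch, not an official pattern: cached_transcribe, file_hash, and the transcript_cache.json file are all illustrative names, and transcribe stands for any callable (for example a wrapper around the Whisper API call) that maps a file path to transcript text.

```python
import hashlib
import json
import os

CACHE_PATH = "transcript_cache.json"  # illustrative cache location

def file_hash(path):
    # Hash the audio bytes so renamed copies of the same file still hit the cache
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def cached_transcribe(path, transcribe):
    # `transcribe` is any path -> text callable, e.g. a Whisper API wrapper
    cache = {}
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            cache = json.load(f)
    key = file_hash(path)
    if key not in cache:
        cache[key] = transcribe(path)  # only upload on a cache miss
        with open(CACHE_PATH, "w") as f:
            json.dump(cache, f)
    return cache[key]
```

With this wrapper, repeated requests for the same audio return the stored transcript instead of re-uploading the file.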
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard Whisper API call | ~3-10s per audio minute | ~$0.006/min | Reliable cloud transcription with minimal setup |
| Async Whisper API call | ~3-10s per audio minute | ~$0.006/min | Concurrent transcription in async apps |
| Local Whisper model | Varies by hardware (seconds to minutes) | Free (local compute cost) | Offline transcription without API dependency |
Quick tip
Always open your audio file in binary mode ('rb') before passing it to client.audio.transcriptions.create to avoid file read errors.
Common mistake
Forgetting to open the audio file in binary mode or passing a file path string instead of a file object causes API errors.
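The mistake is easy to reproduce without calling the API at all: audio bytes are not valid UTF-8 text, so a text-mode read fails where a binary-mode read succeeds. A minimal demonstration (the file name and bytes are illustrative):

```python
# Write a few bytes resembling an MP3 frame header (illustrative only)
with open("clip.mp3", "wb") as f:
    f.write(bytes([0xFF, 0xFB, 0x90, 0x00]))

# Binary mode: returns raw bytes, which is what the SDK uploads
with open("clip.mp3", "rb") as f:
    data = f.read()

# Text mode: fails, because 0xFF is never a valid UTF-8 byte
try:
    with open("clip.mp3", "r", encoding="utf-8") as f:
        f.read()
    text_mode_ok = True
except UnicodeDecodeError:
    text_mode_ok = False
```

The same principle explains the second half of the mistake: passing a path string instead of an open binary file object means the SDK never receives the raw bytes it needs to upload.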