How to choose Whisper model size
Quick answer
Choose a Whisper model size based on your accuracy needs and available compute. Larger models such as whisper-large transcribe more accurately but require more memory and CPU/GPU power, while smaller models such as whisper-base or whisper-small run faster and use fewer resources at the cost of some accuracy.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the OpenAI Python package to access Whisper models via the API. Set your OPENAI_API_KEY environment variable for authentication.
pip install openai

Step by step
Use the OpenAI API to transcribe audio. The hosted API's standard Whisper model is whisper-1; explicit size selection (tiny through large) applies to the open-source releases described under Common variations below.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Upload the audio file and request a transcription from whisper-1.
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print("Transcription:", transcript.text)
```

Output
Transcription: Hello, this is a sample audio transcription.
Common variations
Whisper models vary by size: tiny, base, small, medium, and large. Smaller models run faster and use less memory but have lower accuracy. Larger models improve transcription quality, especially on noisy or accented audio, but require more compute.
Use local open-source Whisper models with openai-whisper or whisper.cpp for offline use and control over model size.
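As a minimal sketch of the local route, the helper below wraps the open-source openai-whisper package. The function name transcribe_locally and the default size are illustrative; the package (and ffmpeg) must be installed separately, and the chosen checkpoint is downloaded on first use.

```python
def transcribe_locally(audio_path, size="small"):
    """Transcribe a file with the open-source openai-whisper package.

    size is one of the released checkpoints: tiny, base, small, medium, large.
    """
    valid = {"tiny", "base", "small", "medium", "large"}
    if size not in valid:
        raise ValueError(f"unknown Whisper model size: {size}")
    import whisper  # pip install openai-whisper (ffmpeg is also required)
    model = whisper.load_model(size)  # downloads the checkpoint on first use
    return model.transcribe(audio_path)["text"]
```

Swapping size here is the direct way to trade accuracy against speed and memory on your own hardware.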
| Model size | Accuracy | Speed | Memory usage |
|---|---|---|---|
| tiny | Lowest | Fastest | Lowest |
| base | Low | Fast | Low |
| small | Moderate | Moderate | Moderate |
| medium | High | Slower | High |
| large | Highest | Slowest | Highest |
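To make the tradeoff concrete, here is a small illustrative helper that picks the largest size fitting a memory budget. The VRAM_GB figures are the approximate requirements listed in the openai-whisper README, rounded to whole gigabytes.

```python
# Approximate VRAM needed per model size, in GB (openai-whisper README, rounded).
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
SIZES = ["tiny", "base", "small", "medium", "large"]  # smallest to largest

def choose_model_size(memory_gb):
    """Return the most accurate size that fits the memory budget."""
    fitting = [s for s in SIZES if VRAM_GB[s] <= memory_gb]
    return fitting[-1] if fitting else "tiny"  # fall back to tiny if nothing fits
```

For example, a 4 GB budget selects small, while 16 GB allows large.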
Troubleshooting
- If transcription is inaccurate, try a larger model size or improve audio quality.
- If you encounter memory errors, switch to a smaller model or use streaming transcription.
- For slow inference, consider running smaller models locally or using GPU acceleration.
Key Takeaways
- Select Whisper model size balancing accuracy and resource constraints.
- Use larger models for noisy or complex audio, smaller for speed and low resource use.
- OpenAI API uses whisper-1 as the default production model.
- Local open-source Whisper allows explicit model size choice for offline use.
- Test different sizes on your audio to find the best tradeoff.