How to use faster-whisper in Python
Direct answer
Use the faster-whisper package to load a Whisper model with WhisperModel and transcribe audio files efficiently on CPU or GPU by calling model.transcribe().
Setup
Install
pip install faster-whisper
Imports
from faster_whisper import WhisperModel
Examples
In: Transcribe a short English audio file 'speech.mp3'
Out: Transcription text: "Hello, this is a test of faster-whisper."
In: Transcribe a long podcast audio with GPU acceleration
Out: Transcription text: "Welcome to the podcast episode number 42..."
In: Transcribe a noisy audio file with English language specified
Out: Transcription text: "Despite background noise, the speech is clear."
Integration steps
- Install the faster-whisper package via pip.
- Import WhisperModel from faster_whisper.
- Initialize the WhisperModel with the desired model size and device (cpu or cuda).
- Call the model's transcribe method with the audio file path and optional parameters.
- Process the returned segments to extract the full transcription text.
Full code
from faster_whisper import WhisperModel
# Initialize model (use 'small' or 'base' for faster speed)
model = WhisperModel("small", device="cpu")
# Transcribe audio file
segments, info = model.transcribe("speech.mp3", beam_size=5)
# Combine segments into full transcription
transcription = "".join(segment.text for segment in segments)
print("Transcription text:", transcription)
Output
Transcription text: Hello, this is a test of faster-whisper.
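The join step in the full code above can be exercised without audio or the library itself. The sketch below uses hypothetical stand-in segment objects (real ones come from model.transcribe(); their text fields carry a leading space, which is why "".join suffices):

```python
from collections import namedtuple

# Stand-in for faster-whisper Segment objects (hypothetical mock;
# real segments come from model.transcribe()).
Segment = namedtuple("Segment", ["start", "end", "text"])

def collect_transcription(segments):
    """Join segment texts and collect (start, end, text) tuples.

    Works whether `segments` is a list or the lazy generator that
    faster-whisper returns.
    """
    timeline, parts = [], []
    for seg in segments:
        timeline.append((seg.start, seg.end, seg.text))
        parts.append(seg.text)
    return "".join(parts), timeline

mock_segments = [
    Segment(0.0, 1.2, " Hello,"),
    Segment(1.2, 3.0, " this is a test of faster-whisper."),
]
text, timeline = collect_transcription(mock_segments)
print(text.strip())  # Hello, this is a test of faster-whisper.
```

Because faster-whisper yields segments lazily, consuming them once (as here) is the normal pattern; re-iterating requires calling transcribe() again.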
API trace
Request
{"model_size": "small", "device": "cpu", "audio_path": "speech.mp3", "beam_size": 5}
Response
{"segments": [{"start": float, "end": float, "text": string}], "info": {"language": string, "duration": float}}
Extract
"".join(segment.text for segment in segments)
Variants
GPU accelerated transcription ›
Use when you have a CUDA-enabled GPU for faster transcription of longer audio files.
from faster_whisper import WhisperModel
# compute_type="float16" is the usual choice on GPU (faster, less memory)
model = WhisperModel("base", device="cuda", compute_type="float16")
segments, info = model.transcribe("podcast.mp3", beam_size=5)
transcription = "".join(segment.text for segment in segments)
print("Transcription text:", transcription)
Streaming transcription (real-time segments) ›
Use to process and display transcription segments as they are decoded for real-time feedback.
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cpu")
segments, info = model.transcribe("speech.mp3", beam_size=5)
# segments is a lazy generator: each segment prints as soon as it is decoded
for segment in segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")
Specify language for improved accuracy ›
Use when you know the audio language in advance to improve transcription accuracy.
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cpu")
segments, info = model.transcribe("speech.mp3", language="en", beam_size=5)
transcription = "".join(segment.text for segment in segments)
print("Transcription text:", transcription)
Performance
Latency: ~1-3 seconds per minute of audio (CPU, small model); ~0.3-0.5 seconds per minute (GPU, base model)
Cost: Open-source and free; no API costs
Rate limits: None; fully local execution
- Use smaller model sizes like 'small' or 'base' for faster inference.
- Specify the language to skip the automatic language-detection pass.
- Use beam_size=1 for faster but less accurate transcription.
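The speed tips above can be bundled into a small helper. This is a hypothetical convenience function, not part of faster-whisper; beam_size and language are real transcribe() parameters:

```python
def fast_transcribe_options(language=None):
    """Build kwargs for model.transcribe() tuned for speed.

    Hypothetical helper for illustration: beam_size=1 selects greedy
    decoding (fastest, slightly less accurate), and pinning `language`
    skips the language-detection pass.
    """
    opts = {"beam_size": 1}
    if language is not None:
        opts["language"] = language
    return opts

print(fast_transcribe_options("en"))  # {'beam_size': 1, 'language': 'en'}
```

Usage: segments, info = model.transcribe("speech.mp3", **fast_transcribe_options("en")).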
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| CPU small model | ~1-3s per minute audio | Free (local) | Low-resource machines, quick tests |
| GPU base model | ~0.3-0.5s per minute audio | Free (local) | High throughput, longer audio |
| Streaming segments | Real-time segment output | Free (local) | Interactive transcription feedback |
Quick tip
Set device="cuda" if you have a GPU to drastically speed up transcription with faster-whisper.
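The device choice can be made automatic. A minimal sketch: the GPU count could come from ctranslate2.get_cuda_device_count() (faster-whisper runs on CTranslate2), but it is passed in here so the logic stays testable on machines without a GPU:

```python
def pick_device(cuda_device_count):
    """Return the device string for WhisperModel ("cuda" or "cpu").

    Hypothetical helper: pass in the number of visible CUDA devices,
    e.g. from ctranslate2.get_cuda_device_count().
    """
    return "cuda" if cuda_device_count > 0 else "cpu"

print(pick_device(0))  # cpu
print(pick_device(2))  # cuda
```

Then: model = WhisperModel("small", device=pick_device(gpu_count)).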
Common mistake
Forgetting to set the device, or running large models on CPU, leads to very slow transcription; on CPU, prefer a smaller model or int8 quantization (compute_type="int8").
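A sanity check for this mistake can be sketched as a plain function. It is hypothetical, not part of faster-whisper; the size names follow faster-whisper's model conventions ("tiny", "base", "small", "medium", "large-v3", ...):

```python
def check_config(model_size, device):
    """Flag the slow model/device combination described above.

    Hypothetical helper for illustration: warns when a medium or
    large model is paired with CPU inference.
    """
    if device == "cpu" and model_size.startswith(("medium", "large")):
        return "warning: large model on CPU; set device='cuda' or pick a smaller model"
    return "ok"

print(check_config("large-v3", "cpu"))  # prints the warning string
print(check_config("small", "cpu"))    # ok
```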