How to use faster-whisper in Python
Direct answer
Use the faster-whisper package to load a Whisper model with WhisperModel and transcribe audio files efficiently on CPU or GPU by calling model.transcribe().
Setup
Install
pip install faster-whisper
Imports
from faster_whisper import WhisperModel
Examples
In: Transcribe a short English audio file 'speech.mp3'
Out: Transcription text: "Hello, this is a test of faster-whisper."
In: Transcribe a long podcast audio with GPU acceleration
Out: Transcription text: "Welcome to the podcast episode number 42..."
In: Transcribe a noisy audio file with English language specified
Out: Transcription text: "Despite background noise, the speech is clear."
Integration steps
- Install the faster-whisper package via pip.
- Import WhisperModel from faster_whisper.
- Initialize the WhisperModel with the desired model size and device (cpu or cuda).
- Call the model's transcribe method with the audio file path and optional parameters.
- Process the returned segments to extract the full transcription text.
Full code
from faster_whisper import WhisperModel
# Initialize model (use 'small' or 'base' for faster speed)
model = WhisperModel("small", device="cpu")
# Transcribe audio file
segments, info = model.transcribe("speech.mp3", beam_size=5)
# Combine segments into full transcription
transcription = "".join(segment.text for segment in segments)
print("Transcription text:", transcription)
Output
Transcription text: Hello, this is a test of faster-whisper.
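The join step in the full code above can be exercised without audio or the library itself. The sketch below uses hypothetical stand-in segment objects (real ones come from model.transcribe(); their text fields carry a leading space, which is why "".join suffices):

```python
from collections import namedtuple

# Stand-in for faster-whisper Segment objects (hypothetical mock;
# real segments come from model.transcribe()).
Segment = namedtuple("Segment", ["start", "end", "text"])

def collect_transcription(segments):
    """Join segment texts and collect (start, end, text) tuples.

    Works whether `segments` is a list or the lazy generator that
    faster-whisper returns.
    """
    timeline, parts = [], []
    for seg in segments:
        timeline.append((seg.start, seg.end, seg.text))
        parts.append(seg.text)
    return "".join(parts), timeline

mock_segments = [
    Segment(0.0, 1.2, " Hello,"),
    Segment(1.2, 3.0, " this is a test of faster-whisper."),
]
text, timeline = collect_transcription(mock_segments)
print(text.strip())  # Hello, this is a test of faster-whisper.
```

Because faster-whisper yields segments lazily, consuming them once (as here) is the normal pattern; re-iterating requires calling transcribe() again.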
API trace
Request
{"model_size": "small", "device": "cpu", "audio_path": "speech.mp3", "beam_size": 5}
Response
{"segments": [{"start": float, "end": float, "text": string}], "info": {"language": string, "duration": float}}
Extract
"".join(segment.text for segment in segments)
Variants
GPU accelerated transcription ›
Use when you have a CUDA-enabled GPU for faster transcription of longer audio files.
from faster_whisper import WhisperModel
# compute_type="float16" is the usual choice on GPU (faster, less memory)
model = WhisperModel("base", device="cuda", compute_type="float16")
segments, info = model.transcribe("podcast.mp3", beam_size=5)
transcription = "".join(segment.text for segment in segments)
print("Transcription text:", transcription)
Streaming transcription (real-time segments) ›
Use to process and display transcription segments as they are decoded for real-time feedback.
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cpu")
segments, info = model.transcribe("speech.mp3", beam_size=5)
# segments is a lazy generator: each segment prints as soon as it is decoded
for segment in segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")
Specify language for improved accuracy ›
Use when you know the audio language in advance to improve transcription accuracy.
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cpu")
segments, info = model.transcribe("speech.mp3", language="en", beam_size=5)
transcription = "".join(segment.text for segment in segments)
print("Transcription text:", transcription)
Performance
Latency: ~1-3 seconds per minute of audio (CPU, small model); ~0.3-0.5 seconds per minute (GPU, base model)
Cost: Open-source and free; no API costs
Rate limits: None; fully local execution
- Use smaller model sizes like 'small' or 'base' for faster inference.
- Specify the language to skip the automatic language-detection pass.
- Use beam_size=1 for faster but less accurate transcription.
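The speed tips above can be bundled into a small helper. This is a hypothetical convenience function, not part of faster-whisper; beam_size and language are real transcribe() parameters:

```python
def fast_transcribe_options(language=None):
    """Build kwargs for model.transcribe() tuned for speed.

    Hypothetical helper for illustration: beam_size=1 selects greedy
    decoding (fastest, slightly less accurate), and pinning `language`
    skips the language-detection pass.
    """
    opts = {"beam_size": 1}
    if language is not None:
        opts["language"] = language
    return opts

print(fast_transcribe_options("en"))  # {'beam_size': 1, 'language': 'en'}
```

Usage: segments, info = model.transcribe("speech.mp3", **fast_transcribe_options("en")).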
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| CPU small model | ~1-3s per minute audio | Free (local) | Low-resource machines, quick tests |
| GPU base model | ~0.3-0.5s per minute audio | Free (local) | High throughput, longer audio |
| Streaming segments | Real-time segment output | Free (local) | Interactive transcription feedback |
Quick tip
Set device="cuda" if you have a GPU to drastically speed up transcription with faster-whisper.
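The device choice can be made automatic. A minimal sketch: the GPU count could come from ctranslate2.get_cuda_device_count() (faster-whisper runs on CTranslate2), but it is passed in here so the logic stays testable on machines without a GPU:

```python
def pick_device(cuda_device_count):
    """Return the device string for WhisperModel ("cuda" or "cpu").

    Hypothetical helper: pass in the number of visible CUDA devices,
    e.g. from ctranslate2.get_cuda_device_count().
    """
    return "cuda" if cuda_device_count > 0 else "cpu"

print(pick_device(0))  # cpu
print(pick_device(2))  # cuda
```

Then: model = WhisperModel("small", device=pick_device(gpu_count)).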
Common mistake
Forgetting to set the device, or running large models on CPU, leads to very slow transcription; on CPU, prefer a smaller model or int8 quantization (compute_type="int8").
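A sanity check for this mistake can be sketched as a plain function. It is hypothetical, not part of faster-whisper; the size names follow faster-whisper's model conventions ("tiny", "base", "small", "medium", "large-v3", ...):

```python
def check_config(model_size, device):
    """Flag the slow model/device combination described above.

    Hypothetical helper for illustration: warns when a medium or
    large model is paired with CPU inference.
    """
    if device == "cpu" and model_size.startswith(("medium", "large")):
        return "warning: large model on CPU; set device='cuda' or pick a smaller model"
    return "ok"

print(check_config("large-v3", "cpu"))  # prints the warning string
print(check_config("small", "cpu"))    # ok
```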