How to run Whisper on GPU
Quick answer
To run Whisper on GPU, install the openai-whisper Python package along with a CUDA-enabled build of PyTorch, then load the model with device="cuda" to use GPU acceleration for faster transcription.
Prerequisites
- Python 3.8+
- CUDA-enabled GPU with up-to-date drivers
- openai-whisper installed (pip install -U openai-whisper; note the PyPI package named just "whisper" is unrelated)
- PyTorch with CUDA support installed
Setup
Install the whisper package and ensure you have PyTorch with CUDA support installed. Verify your GPU is available for PyTorch.
pip install -U openai-whisper torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121
Step by step
Use the following Python code to load the Whisper model on GPU and transcribe an audio file.
import whisper
# Load the model on GPU
model = whisper.load_model("base", device="cuda")
# Transcribe audio file
result = model.transcribe("audio.mp3")
print("Transcription:", result["text"])
Output
Transcription: This is the transcribed text from the audio file.
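Hard-coding device="cuda" raises an error on machines without a GPU. One way to make the script portable is a small fallback; the select_device helper below is illustrative, not part of the whisper API:

```python
def select_device(cuda_available: bool) -> str:
    """Return the device string to pass to whisper.load_model."""
    return "cuda" if cuda_available else "cpu"

try:
    import torch
    device = select_device(torch.cuda.is_available())
except ImportError:  # torch not installed; fall back to CPU
    device = select_device(False)

print(device)
# Then load the model with: model = whisper.load_model("base", device=device)
```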
Common variations
- Use different model sizes like tiny, small, medium, or large for speed vs. accuracy trade-offs.
- Run asynchronously with asyncio if integrating into async apps.
- Use faster-whisper for optimized GPU inference with lower latency.
from faster_whisper import WhisperModel
model = WhisperModel("base", device="cuda")
segments, info = model.transcribe("audio.mp3")
for segment in segments:
    print(segment.text)
Output
Printed transcribed segments from the audio file.
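For the asyncio variation mentioned above, note that model.transcribe is a blocking call, so it should run in a worker thread to keep the event loop responsive. A minimal sketch, assuming Python 3.9+ for asyncio.to_thread; fake_transcribe is a stand-in for a real model.transcribe:

```python
import asyncio

async def transcribe_async(transcribe_fn, audio_path: str):
    # Run the blocking transcription call in a worker thread
    # so other coroutines can keep running.
    return await asyncio.to_thread(transcribe_fn, audio_path)

# Stand-in for model.transcribe; pass the real method in actual use.
def fake_transcribe(path: str) -> dict:
    return {"text": f"transcript of {path}"}

result = asyncio.run(transcribe_async(fake_transcribe, "audio.mp3"))
print(result["text"])  # → transcript of audio.mp3
```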
Troubleshooting
- If you get CUDA out of memory, try a smaller model or reduce batch size.
- Ensure PyTorch is installed with CUDA support: torch.cuda.is_available() should return True.
- Update GPU drivers and the CUDA toolkit if the device is not detected.
import torch
print(torch.cuda.is_available())  # Should print True if GPU is ready
Output
True
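The out-of-memory case can also be handled programmatically by retrying with progressively smaller models. A sketch under the assumption that the loader raises a RuntimeError containing "out of memory" when the GPU is too small; load_with_fallback is illustrative, and in real use the loader would wrap whisper.load_model with device="cuda":

```python
MODEL_SIZES = ["large", "medium", "small", "base", "tiny"]

def load_with_fallback(loader, sizes=MODEL_SIZES):
    """Try each model size in order until one fits in GPU memory."""
    last_error = None
    for size in sizes:
        try:
            return size, loader(size)
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise  # unrelated error: re-raise
            last_error = e
    raise RuntimeError("No model size fit in GPU memory") from last_error

# Demo with a fake loader: pretend only "base" and smaller fit.
def fake_loader(size):
    if size in ("large", "medium", "small"):
        raise RuntimeError("CUDA out of memory")
    return object()

size, model = load_with_fallback(fake_loader)
print(size)  # → base
```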
Key Takeaways
- Use device="cuda" when loading the Whisper model to enable GPU acceleration.
- Install PyTorch with CUDA support matching your GPU and CUDA version.
- Choose model size based on your speed and accuracy needs to optimize GPU memory usage.
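The last point can be made concrete with approximate VRAM requirements. The figures below are rough numbers based on the openai-whisper README; treat them as estimates, and the helper function is illustrative only:

```python
# Approximate VRAM needs in GB (rough figures from the openai-whisper README).
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def largest_model_for(vram_gb: float) -> str:
    """Pick the largest model whose approximate VRAM need fits the budget."""
    for size in ("large", "medium", "small", "base", "tiny"):
        if APPROX_VRAM_GB[size] <= vram_gb:
            return size
    raise ValueError(f"Less than 1 GB of VRAM ({vram_gb} GB); use CPU instead")

print(largest_model_for(8))  # → medium
```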