How to run Whisper on GPU
Quick answer
To run Whisper on GPU, install the openai-whisper Python package along with a CUDA-enabled build of PyTorch, then load the model with device="cuda" to use GPU acceleration for faster transcription.
Prerequisites
- Python 3.8+
- CUDA-enabled GPU with up-to-date drivers
- openai-whisper installed (pip install -U openai-whisper; note the PyPI package named just "whisper" is unrelated)
- PyTorch with CUDA support installed
Setup
Install the whisper package and ensure you have PyTorch with CUDA support installed. Verify your GPU is available for PyTorch.
pip install -U openai-whisper torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121
Step by step
Use the following Python code to load the Whisper model on GPU and transcribe an audio file.
import whisper
# Load the model on GPU
model = whisper.load_model("base", device="cuda")
# Transcribe audio file
result = model.transcribe("audio.mp3")
print("Transcription:", result["text"])
Output
Transcription: This is the transcribed text from the audio file.
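Hard-coding device="cuda" raises an error on machines without a GPU. One way to make the script portable is a small fallback; the select_device helper below is illustrative, not part of the whisper API:

```python
def select_device(cuda_available: bool) -> str:
    """Return the device string to pass to whisper.load_model."""
    return "cuda" if cuda_available else "cpu"

try:
    import torch
    device = select_device(torch.cuda.is_available())
except ImportError:  # torch not installed; fall back to CPU
    device = select_device(False)

print(device)
# Then load the model with: model = whisper.load_model("base", device=device)
```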
Common variations
- Use different model sizes like tiny, small, medium, or large for speed vs. accuracy trade-offs.
- Run asynchronously with asyncio if integrating into async apps.
- Use faster-whisper for optimized GPU inference with lower latency.
from faster_whisper import WhisperModel
model = WhisperModel("base", device="cuda")
segments, info = model.transcribe("audio.mp3")
for segment in segments:
    print(segment.text)
Output
Printed transcribed segments from the audio file.
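For the asyncio variation mentioned above, note that model.transcribe is a blocking call, so it should run in a worker thread to keep the event loop responsive. A minimal sketch, assuming Python 3.9+ for asyncio.to_thread; fake_transcribe is a stand-in for a real model.transcribe:

```python
import asyncio

async def transcribe_async(transcribe_fn, audio_path: str):
    # Run the blocking transcription call in a worker thread
    # so other coroutines can keep running.
    return await asyncio.to_thread(transcribe_fn, audio_path)

# Stand-in for model.transcribe; pass the real method in actual use.
def fake_transcribe(path: str) -> dict:
    return {"text": f"transcript of {path}"}

result = asyncio.run(transcribe_async(fake_transcribe, "audio.mp3"))
print(result["text"])  # → transcript of audio.mp3
```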
Troubleshooting
- If you get CUDA out of memory, try a smaller model or reduce batch size.
- Ensure PyTorch is installed with CUDA support: torch.cuda.is_available() should return True.
- Update GPU drivers and the CUDA toolkit if the device is not detected.
import torch
print(torch.cuda.is_available())  # Should print True if GPU is ready
Output
True
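The out-of-memory case can also be handled programmatically by retrying with progressively smaller models. A sketch under the assumption that the loader raises a RuntimeError containing "out of memory" when the GPU is too small; load_with_fallback is illustrative, and in real use the loader would wrap whisper.load_model with device="cuda":

```python
MODEL_SIZES = ["large", "medium", "small", "base", "tiny"]

def load_with_fallback(loader, sizes=MODEL_SIZES):
    """Try each model size in order until one fits in GPU memory."""
    last_error = None
    for size in sizes:
        try:
            return size, loader(size)
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise  # unrelated error: re-raise
            last_error = e
    raise RuntimeError("No model size fit in GPU memory") from last_error

# Demo with a fake loader: pretend only "base" and smaller fit.
def fake_loader(size):
    if size in ("large", "medium", "small"):
        raise RuntimeError("CUDA out of memory")
    return object()

size, model = load_with_fallback(fake_loader)
print(size)  # → base
```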
Key Takeaways
- Use device="cuda" when loading the Whisper model to enable GPU acceleration.
- Install PyTorch with CUDA support matching your GPU and CUDA version.
- Choose model size based on your speed and accuracy needs to optimize GPU memory usage.
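The last point can be made concrete with approximate VRAM requirements. The figures below are rough numbers based on the openai-whisper README; treat them as estimates, and the helper function is illustrative only:

```python
# Approximate VRAM needs in GB (rough figures from the openai-whisper README).
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def largest_model_for(vram_gb: float) -> str:
    """Pick the largest model whose approximate VRAM need fits the budget."""
    for size in ("large", "medium", "small", "base", "tiny"):
        if APPROX_VRAM_GB[size] <= vram_gb:
            return size
    raise ValueError(f"Less than 1 GB of VRAM ({vram_gb} GB); use CPU instead")

print(largest_model_for(8))  # → medium
```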