RuntimeError: FFmpeg required for forced alignment
whisperx.alignment.RuntimeError (FFmpeg dependency missing)
Stack trace
RuntimeError: FFmpeg is required for forced alignment. Please install FFmpeg and make sure it is in your system PATH.
File "/path/to/site-packages/whisperx/alignment.py", line 47, in align
raise RuntimeError('FFmpeg is required for forced alignment. Please install FFmpeg and make sure it is in your system PATH.')
File "your_script.py", line 28, in transcribe_with_diarization
result = alignment_model.align(segments, language, audio, model) # Forced alignment triggered here
RuntimeError: FFmpeg is required for forced alignment. Please install FFmpeg and make sure it is in your system PATH.
Why it happens
WhisperX relies on FFmpeg to decode and resample audio before it aligns transcribed segments to precise timestamps. When you call `align()` or enable speaker diarization, WhisperX spawns an FFmpeg subprocess. If FFmpeg is not installed, or is installed but not on the executable PATH, the subprocess call fails with this error. This is a hard dependency: there is no fallback when forced alignment is requested.
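To make the failure mode concrete, here is a minimal sketch of decoding audio through an FFmpeg subprocess. It is an illustration, not WhisperX's actual source: the function name `decode_audio`, the `ffmpeg_bin` parameter, and the particular FFmpeg flags are assumptions chosen for the example. The point is that a missing binary surfaces as `FileNotFoundError` from `subprocess`, which the caller then turns into a `RuntimeError`.

```python
import subprocess


def decode_audio(path, sample_rate=16000, ffmpeg_bin="ffmpeg"):
    """Decode an audio file to raw 16-bit mono PCM bytes via FFmpeg.

    Hypothetical helper for illustration; not part of WhisperX.
    """
    cmd = [
        ffmpeg_bin, "-nostdin",
        "-i", path,
        "-f", "s16le",            # raw signed 16-bit little-endian output
        "-ac", "1",               # downmix to mono
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),  # resample to the target rate
        "-",                      # write to stdout
    ]
    try:
        proc = subprocess.run(cmd, capture_output=True, check=True)
    except FileNotFoundError as exc:
        # The executable itself could not be found on PATH.
        raise RuntimeError(
            f"{ffmpeg_bin!r} not found on PATH; install FFmpeg"
        ) from exc
    except subprocess.CalledProcessError as exc:
        # FFmpeg ran but rejected the input (bad file, unsupported codec, ...).
        detail = exc.stderr.decode(errors="replace")
        raise RuntimeError(f"FFmpeg failed: {detail}") from exc
    return proc.stdout
```

Distinguishing the two exceptions is useful in practice: the first means FFmpeg is missing, the second means FFmpeg is present but the input file is the problem.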
Detection
Test FFmpeg availability before running WhisperX: run `which ffmpeg` (Linux/macOS) or `where ffmpeg` (Windows) in a terminal. In Python, call `shutil.which('ffmpeg')` before `.align()`; it returns `None` when the binary is not on PATH. Log the result at startup so a missing FFmpeg surfaces immediately rather than deep in a batch transcription job.
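A startup check with logging could look like the sketch below. The function name `check_ffmpeg` and the logger name `preflight` are made up for this example; only `shutil.which` and the standard `logging` module are relied on.

```python
import logging
import shutil

logger = logging.getLogger("preflight")


def check_ffmpeg(binary="ffmpeg"):
    """Return the resolved path to the binary, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        logger.error("%s not found on PATH; forced alignment will fail", binary)
        return None
    logger.info("Using %s at %s", binary, path)
    return path
```

Calling `check_ffmpeg()` once at the start of a batch job, and aborting if it returns `None`, keeps the failure out of the middle of a long run.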
Causes & fixes
FFmpeg is not installed on the system
Install FFmpeg. Ubuntu/Debian: `sudo apt-get install ffmpeg`. macOS: `brew install ffmpeg`. Windows: download a build from ffmpeg.org and add it to PATH, or run `choco install ffmpeg` if you use Chocolatey.
FFmpeg is installed but not in the system PATH
Add the directory that contains the FFmpeg binary to your system PATH environment variable, then restart your Python interpreter or IDE so it picks up the new value. Verify with `ffmpeg -version` in a terminal.
Forced alignment runs even though you do not need word-level timestamps
Skip the `whisperx.load_align_model()` and `whisperx.align()` calls and use the segment-level timestamps returned by `.transcribe()`. Switching to a different alignment model does not help, because FFmpeg is needed to decode the audio rather than by the model itself, and removing the diarization step alone does not skip alignment.
Running in a containerized/serverless environment where FFmpeg isn't in the Docker image
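Add `RUN apt-get update && apt-get install -y ffmpeg` to your Dockerfile (without `apt-get update`, the install usually fails on a fresh Debian/Ubuntu base image), or start from an image that already ships FFmpeg. Community WhisperX images on Docker Hub may include it; verify before relying on one.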
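When FFmpeg is installed but you cannot edit the system PATH (for example on a shared machine), you can extend PATH for the current Python process only. The sketch below assumes you know the directory that holds the binary; `prepend_to_path` is a hypothetical helper, and the change does not persist after the process exits.

```python
import os
import shutil


def prepend_to_path(directory, binary="ffmpeg"):
    """Prepend a directory to PATH for this process and its child processes.

    Returns the path that `binary` now resolves to, or None if still missing.
    """
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory, *parts])
    return shutil.which(binary)
```

Because subprocesses inherit the parent's environment, an FFmpeg call spawned later from the same process should see the updated PATH.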
Code: broken vs fixed
import whisperx
import os
audio_file = 'meeting.mp3'
model = whisperx.load_model('large-v2', device='cuda', language='en')
result = model.transcribe(audio_file)
# Load the diarization model (needs a Hugging Face token)
diarize_model = whisperx.DiarizationPipeline(use_auth_token=os.environ.get('HF_TOKEN'))
# Alignment needs FFmpeg; this crashes if FFmpeg is not installed
align_model, metadata = whisperx.load_align_model(
    language_code=result['language'],
    device='cuda'
)
aligned_result = whisperx.align(
    result['segments'],
    align_model,
    metadata,
    audio_file,
    device='cuda',
    return_char_alignments=False
)  # ← RuntimeError: FFmpeg required for forced alignment

import whisperx
import os
import shutil
import sys
# FIX: check that FFmpeg is available before doing any work
if shutil.which('ffmpeg') is None:
    print('ERROR: FFmpeg not found in PATH. Install it first.')
    print('Ubuntu/Debian: sudo apt-get install ffmpeg')
    print('macOS: brew install ffmpeg')
    print('Windows: choco install ffmpeg (or download from ffmpeg.org)')
    sys.exit(1)
audio_file = 'meeting.mp3'
model = whisperx.load_model('large-v2', device='cuda', language='en')
result = model.transcribe(audio_file)
# Load the diarization model (unchanged from the broken version)
diarize_model = whisperx.DiarizationPipeline(use_auth_token=os.environ.get('HF_TOKEN'))
# Load the alignment model; the check above guarantees FFmpeg is on PATH
align_model, metadata = whisperx.load_align_model(
    language_code=result['language'],
    device='cuda'
)
# Alignment now works because FFmpeg is available
aligned_result = whisperx.align(
    result['segments'],
    align_model,
    metadata,
    audio_file,
    device='cuda',
    return_char_alignments=False
)
print('Transcription with alignment completed successfully')
print(f'Segments: {len(aligned_result["segments"])}')
Workaround
If you cannot install FFmpeg immediately, skip forced alignment: leave out the `whisperx.load_align_model()` and `whisperx.align()` calls and use only `whisperx.load_model()` and `.transcribe()`. You can still run speaker diarization separately (for example with `pyannote.audio` directly), but without word-level alignment the speaker labels attach to whole segments, so temporal precision drops. If the error persists, check whether audio loading for transcription or diarization also invokes FFmpeg for your file format.
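One way to apply this workaround without maintaining two code paths is to attempt alignment and fall back to the unaligned result when the FFmpeg error appears. This is a sketch under stated assumptions: `align_or_fallback` is a hypothetical helper, `align_fn` is any callable you supply that wraps your alignment call, and matching on the word "ffmpeg" in the message assumes the error text shown in the stack trace above.

```python
import logging

logger = logging.getLogger("transcribe")


def align_or_fallback(result, align_fn):
    """Try forced alignment; keep segment-level timestamps if FFmpeg is missing.

    result   -- dict with a "segments" list, as returned by transcription
    align_fn -- callable taking the segments and returning the aligned result
    """
    try:
        return align_fn(result["segments"])
    except RuntimeError as exc:
        if "ffmpeg" not in str(exc).lower():
            raise  # unrelated failure: do not hide it
        logger.warning("Skipping forced alignment: %s", exc)
        return result
```

With the variables from the code above, `align_fn` could be `lambda segs: whisperx.align(segs, align_model, metadata, audio_file, device='cuda')`. The fallback result has segment-level timestamps only, so downstream code should not assume word-level entries exist.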
Prevention
1. Run an FFmpeg dependency check in CI/CD before deploying WhisperX.
2. Pin the FFmpeg version in your Dockerfile or environment.yml (for example `ffmpeg=4.4.2` or a later release) so every environment runs the same build.
3. If you do not own the infrastructure, consider OpenAI's hosted Whisper API (`client.audio.transcriptions.create()`) instead: audio decoding happens server-side, so no local FFmpeg is required.
4. Test transcription with diarization in your target deployment environment (local, Docker, Lambda) before moving to production.
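The first two points can be combined into a small check suitable for a CI step. The sketch below assumes the first line of `ffmpeg -version` has the usual release form `ffmpeg version 4.4.2-... Copyright ...`; nightly or git builds that use other version strings are reported as unparseable (`None`). The function names and the default minimum of `(4, 4, 2)` are choices made for this example.

```python
import re
import shutil
import subprocess

# Matches release banners such as "ffmpeg version 4.4.2-0ubuntu..." or "ffmpeg version n6.0".
_VERSION_RE = re.compile(r"ffmpeg version n?(\d+)\.(\d+)(?:\.(\d+))?")


def parse_ffmpeg_version(banner):
    """Extract (major, minor, patch) from `ffmpeg -version` output, or None."""
    match = _VERSION_RE.search(banner)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def ffmpeg_meets_minimum(minimum=(4, 4, 2), binary="ffmpeg"):
    """Return True if the binary is on PATH and at least the minimum version."""
    if shutil.which(binary) is None:
        return False
    completed = subprocess.run(
        [binary, "-version"], capture_output=True, text=True
    )
    version = parse_ffmpeg_version(completed.stdout)
    return version is not None and version >= minimum
```

In a CI job, a failing `ffmpeg_meets_minimum()` can abort the pipeline before any WhisperX code is deployed.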