
Whisper accuracy benchmark

Quick answer
The accuracy of Whisper is commonly benchmarked using metrics like Word Error Rate (WER) and Character Error Rate (CER). Use standard datasets such as LibriSpeech or Common Voice to evaluate transcription quality by comparing Whisper outputs against ground truth transcripts.
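WER counts word-level substitutions (S), deletions (D), and insertions (I) against the number of reference words (N): WER = (S + D + I) / N. As a rough illustration of what jiwer computes (not its actual implementation), a minimal word-level edit-distance version looks like this:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference word count."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(r)][len(h)] / len(r)

print(word_error_rate("this is a test", "this was a test today"))  # 0.5
```

Here one substitution ("is" → "was") and one insertion ("today") over four reference words gives 2/4 = 0.5. In practice, use jiwer rather than hand-rolling this.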

Prerequisites

  • Python 3.8+
  • OpenAI API key (the free tier works)
  • pip install "openai>=1.0"
  • pip install jiwer

Setup

Install the required Python packages and set your OpenAI API key as an environment variable.

bash
pip install openai jiwer
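Then export your API key so the script can read it from the environment (the key value below is a placeholder; substitute your own):

```bash
export OPENAI_API_KEY="sk-your-key-here"  # placeholder value
```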

Step by step

This example shows how to transcribe an audio file with Whisper via the OpenAI API and compute the WER against a reference transcript using the jiwer library.

python
import os
from openai import OpenAI
from jiwer import wer

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Path to your audio file
audio_path = "audio_sample.mp3"

# Reference transcript for accuracy benchmarking
reference_text = "This is the ground truth transcript of the audio."

# Transcribe audio using Whisper API
with open(audio_path, "rb") as audio_file:
    transcript_response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

transcribed_text = transcript_response.text
print("Transcribed text:", transcribed_text)

# Calculate Word Error Rate (WER)
error_rate = wer(reference_text, transcribed_text)
print(f"Word Error Rate (WER): {error_rate:.3f}")
output
Transcribed text: This is the ground truth transcript of the audio.
Word Error Rate (WER): 0.000
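A WER of exactly 0.000 is rare in practice: jiwer compares strings verbatim by default, so differences in casing and punctuation count as errors even when every word was recognized correctly. A common preprocessing step is to normalize both texts before scoring; a minimal sketch using only the standard library:

```python
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace before scoring."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

print(normalize("This is a Test."))  # this is a test
```

Passing normalize(reference_text) and normalize(transcribed_text) into wer() scores the content of the transcription rather than its formatting.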

Common variations

You can also benchmark Whisper asynchronously, or run it fully offline with the local openai-whisper package. For finer-grained evaluation, use Character Error Rate (CER), which scores character-level rather than word-level edits.

python
import whisper
from jiwer import cer

# Load local Whisper model
model = whisper.load_model("base")

# Transcribe local audio file
result = model.transcribe("audio_sample.mp3")
transcribed_text = result["text"].strip()  # local models often emit a leading space

reference_text = "This is the ground truth transcript of the audio."

# Calculate Character Error Rate (CER)
error_rate = cer(reference_text, transcribed_text)
print(f"Character Error Rate (CER): {error_rate:.3f}")
output
Character Error Rate (CER): 0.045
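When benchmarking over a dataset rather than a single clip, decide how to aggregate: averaging per-file WERs weights a 5-second clip the same as a 5-minute one, while a corpus-level (word-weighted) WER does not. With hypothetical per-file results stored as (wer, reference_word_count) pairs:

```python
# Hypothetical per-file results: (WER, number of reference words)
results = [(0.10, 120), (0.25, 40)]

# Macro average: every file counts equally
mean_wer = sum(w for w, _ in results) / len(results)

# Micro (corpus-level) average: every reference word counts equally
corpus_wer = sum(w * n for w, n in results) / sum(n for _, n in results)

print(f"mean WER:   {mean_wer:.4f}")    # 0.1750
print(f"corpus WER: {corpus_wer:.4f}")  # 0.1375
```

Published benchmarks typically report the corpus-level number, so state which one you use.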

Troubleshooting

  • If you get API authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • Ensure audio files are in supported formats (mp3, wav, m4a) and under 25MB for API calls.
  • For inconsistent transcriptions, try different Whisper model sizes or local inference for better control.
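For the 25MB limit, it can help to check file sizes before uploading; a small helper sketch (the limit value reflects the API constraint mentioned above):

```python
import os

API_LIMIT_BYTES = 25 * 1024 * 1024  # 25MB upload limit for API transcription

def within_api_limit(path: str) -> bool:
    """Return True if the audio file is small enough to send to the API."""
    return os.path.getsize(path) <= API_LIMIT_BYTES
```

Files over the limit need to be compressed or split into chunks before transcription.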

Key takeaways

  • Use WER and CER metrics to benchmark Whisper transcription accuracy.
  • Leverage standard datasets with ground truth transcripts for reliable evaluation.
  • OpenAI's whisper-1 model via API offers easy transcription with competitive accuracy.
  • Local openai-whisper models enable offline benchmarking and customization.
  • Always verify environment setup and audio format compatibility to avoid errors.
Verified 2026-04 · whisper-1