How-to · Beginner · 3 min read

How to get word-level timestamps with Whisper

Quick answer
Use OpenAI's Whisper API with response_format set to "verbose_json" and timestamp_granularities set to ["word"] in the transcription request. The response then includes a words array with start and end times for each word. Use the openai Python SDK to call client.audio.transcriptions.create() with these parameters.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (the audio endpoint is pay-per-use)
  • pip install "openai>=1.0"

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

bash
pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"

Step by step

This example demonstrates how to transcribe an audio file with word-level timestamps using the OpenAI Whisper API.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Open audio file
with open("audio.mp3", "rb") as audio_file:
    # Request transcription with word-level timestamps
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )

# The response includes a 'words' field with timing info
for word_info in transcription.words:
    print(f"{word_info.word}: start={word_info.start}, end={word_info.end}")

output
Hello: start=0.0, end=0.5
world: start=0.5, end=1.0
this: start=1.1, end=1.3
is: start=1.3, end=1.4
a: start=1.4, end=1.5
test: start=1.5, end=2.0
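
With word timings in hand, you can map a playback position back to the word being spoken. A minimal sketch using the sample output above; plain (word, start, end) tuples stand in for the SDK's word objects, which expose the same fields:

```python
from bisect import bisect_right

# (word, start, end) tuples matching the sample output above
words = [
    ("Hello", 0.0, 0.5),
    ("world", 0.5, 1.0),
    ("this", 1.1, 1.3),
    ("is", 1.3, 1.4),
    ("a", 1.4, 1.5),
    ("test", 1.5, 2.0),
]

def word_at(t, words):
    """Return the word spoken at time t, or None during silence."""
    starts = [w[1] for w in words]
    i = bisect_right(starts, t) - 1   # last word starting at or before t
    if i >= 0 and t < words[i][2]:    # still inside that word's span
        return words[i][0]
    return None

print(word_at(0.7, words))   # world
print(word_at(1.05, words))  # None (gap between "world" and "this")
```

Binary search over the start times keeps lookups fast even for long transcripts; note that gaps between words (silence) correctly return None.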

Common variations

  • Use the AsyncOpenAI client and await client.audio.transcriptions.create() for asynchronous transcription.
  • Pass timestamp_granularities=["segment"] (or ["word", "segment"]) to get segment-level timing instead of, or alongside, word-level timing.
  • Word-level timestamps are only returned in the complete verbose_json response; they are not available as partial results.
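
The async variation above can be sketched with the AsyncOpenAI client; the parameters mirror the synchronous example (the SDK import is deferred into the function so the sketch can be read without triggering it at import time):

```python
import asyncio

async def transcribe_with_words(path):
    """Transcribe an audio file asynchronously with word-level timestamps."""
    # AsyncOpenAI mirrors the sync client; every method call is awaitable.
    from openai import AsyncOpenAI  # deferred import, see lead-in
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        return await client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",
            timestamp_granularities=["word"],
        )

if __name__ == "__main__":
    transcription = asyncio.run(transcribe_with_words("audio.mp3"))
    for w in transcription.words:
        print(f"{w.word}: start={w.start}, end={w.end}")
```

This is useful when transcribing many files concurrently, e.g. with asyncio.gather() over several transcribe_with_words() calls.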

Troubleshooting

  • If you get an error that timestamp_granularities is not supported, make sure response_format is set to "verbose_json" and that you are using a recent version of the OpenAI SDK.
  • Check your audio file format and size; Whisper supports common formats like mp3, wav, and mp4 up to 25MB.
  • For incomplete or missing timestamps, verify that both response_format="verbose_json" and timestamp_granularities=["word"] are passed.
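
The size and format constraints above can be checked before uploading. A small sketch; the 25 MB limit comes from the bullet above, and the extension list is an assumption based on the formats Whisper commonly accepts, so adjust it to the current API documentation:

```python
import os

# Assumed set of accepted extensions; check OpenAI's docs for the current list.
SUPPORTED_EXTS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB API upload limit

def check_audio_file(path) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTS:
        problems.append(f"unsupported extension {ext!r}")
    if not os.path.exists(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_BYTES:
        problems.append("file exceeds the 25 MB limit")
    return problems
```

Running this pre-flight check before calling the API turns a vague server-side rejection into an actionable local error message.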

Key Takeaways

  • Set response_format="verbose_json" and timestamp_granularities=["word"] in client.audio.transcriptions.create() to get word-level timing.
  • Use the official OpenAI Python SDK with your API key from environment variables.
  • Ensure audio files are supported formats and within size limits for Whisper API.
  • Async transcription via the AsyncOpenAI client is available, but word-level timestamps still arrive only in the complete response.
  • Keep your OpenAI SDK updated to access the latest Whisper features.
Verified 2026-04 · whisper-1