How-to · Beginner · 3 min read

How to get word-level timestamps with Whisper

Quick answer
Use OpenAI's Whisper API with response_format set to "verbose_json" and timestamp_granularities set to ["word"] in the transcription request. The response then includes a words array with start and end times for each word. Use the openai Python SDK to call client.audio.transcriptions.create() with these parameters.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (the audio endpoint is pay-per-use)
  • pip install "openai>=1.0"

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

bash
pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"

Step by step

This example demonstrates how to transcribe an audio file with word-level timestamps using the OpenAI Whisper API.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Open audio file
with open("audio.mp3", "rb") as audio_file:
    # Request transcription with word-level timestamps
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )

# The response includes a 'words' field with timing info
for word_info in transcription.words:
    print(f"{word_info.word}: start={word_info.start}, end={word_info.end}")

output
Hello: start=0.0, end=0.5
world: start=0.5, end=1.0
this: start=1.1, end=1.3
is: start=1.3, end=1.4
a: start=1.4, end=1.5
test: start=1.5, end=2.0
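
With word timings in hand, you can map a playback position back to the word being spoken. A minimal sketch using the sample output above; plain (word, start, end) tuples stand in for the SDK's word objects, which expose the same fields:

```python
from bisect import bisect_right

# (word, start, end) tuples matching the sample output above
words = [
    ("Hello", 0.0, 0.5),
    ("world", 0.5, 1.0),
    ("this", 1.1, 1.3),
    ("is", 1.3, 1.4),
    ("a", 1.4, 1.5),
    ("test", 1.5, 2.0),
]

def word_at(t, words):
    """Return the word spoken at time t, or None during silence."""
    starts = [w[1] for w in words]
    i = bisect_right(starts, t) - 1   # last word starting at or before t
    if i >= 0 and t < words[i][2]:    # still inside that word's span
        return words[i][0]
    return None

print(word_at(0.7, words))   # world
print(word_at(1.05, words))  # None (gap between "world" and "this")
```

Binary search over the start times keeps lookups fast even for long transcripts; note that gaps between words (silence) correctly return None.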

Common variations

  • Use the AsyncOpenAI client and await client.audio.transcriptions.create() for asynchronous transcription.
  • Pass timestamp_granularities=["segment"] (or ["word", "segment"]) to get segment-level timing instead of, or alongside, word-level timing.
  • Word-level timestamps are only returned in the complete verbose_json response; they are not available as partial results.
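
The async variation above can be sketched with the AsyncOpenAI client; the parameters mirror the synchronous example (the SDK import is deferred into the function so the sketch can be read without triggering it at import time):

```python
import asyncio

async def transcribe_with_words(path):
    """Transcribe an audio file asynchronously with word-level timestamps."""
    # AsyncOpenAI mirrors the sync client; every method call is awaitable.
    from openai import AsyncOpenAI  # deferred import, see lead-in
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        return await client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",
            timestamp_granularities=["word"],
        )

if __name__ == "__main__":
    transcription = asyncio.run(transcribe_with_words("audio.mp3"))
    for w in transcription.words:
        print(f"{w.word}: start={w.start}, end={w.end}")
```

This is useful when transcribing many files concurrently, e.g. with asyncio.gather() over several transcribe_with_words() calls.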

Troubleshooting

  • If you get an error that timestamp_granularities is not supported, make sure response_format is set to "verbose_json" and that you are using a recent version of the OpenAI SDK.
  • Check your audio file format and size; Whisper supports common formats like mp3, wav, and mp4 up to 25MB.
  • For incomplete or missing timestamps, verify that both response_format="verbose_json" and timestamp_granularities=["word"] are passed.
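
The size and format constraints above can be checked before uploading. A small sketch; the 25 MB limit comes from the bullet above, and the extension list is an assumption based on the formats Whisper commonly accepts, so adjust it to the current API documentation:

```python
import os

# Assumed set of accepted extensions; check OpenAI's docs for the current list.
SUPPORTED_EXTS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB API upload limit

def check_audio_file(path) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTS:
        problems.append(f"unsupported extension {ext!r}")
    if not os.path.exists(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_BYTES:
        problems.append("file exceeds the 25 MB limit")
    return problems
```

Running this pre-flight check before calling the API turns a vague server-side rejection into an actionable local error message.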

Key Takeaways

  • Set response_format="verbose_json" and timestamp_granularities=["word"] in client.audio.transcriptions.create() to get word-level timing.
  • Use the official OpenAI Python SDK with your API key from environment variables.
  • Ensure audio files are supported formats and within size limits for Whisper API.
  • Async transcription via the AsyncOpenAI client is available, but word-level timestamps still arrive only in the complete response.
  • Keep your OpenAI SDK updated to access the latest Whisper features.
Verified 2026-04 · whisper-1