How-to · Beginner · 3 min read

How to transcribe video files with Whisper

Quick answer
Use the OpenAI Whisper API to transcribe video files by sending them to client.audio.transcriptions.create with model whisper-1. The API accepts common video container formats such as mp4 directly and transcribes their audio track, so you usually don't need a separate audio-extraction step.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0

Setup

Install the OpenAI Python SDK and set your API key as an environment variable.

  • Run pip install openai to install the SDK.
  • Set your API key in your shell: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows).
bash
pip install openai
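Authentication errors are easier to debug if the script fails fast when the key is missing. The helper below is a minimal sketch (the name get_api_key is ours, not part of the SDK):

```python
import os

def get_api_key() -> str:
    """Return the OpenAI API key from the environment, failing fast if unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    return key
```

Calling this once at startup gives a clear error message instead of a failed API request later.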

Step by step

The Whisper API accepts supported video containers such as video.mp4 directly and transcribes their audio track, so no separate extraction step is needed: open the video file in binary mode and pass it to the transcription endpoint. Note that uploads are limited to 25 MB; larger files must be compressed or split first.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

video_path = "video.mp4"

with open(video_path, "rb") as video_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=video_file
    )

print("Transcription:", transcript.text)
output
Transcription: This is the transcribed text from the video audio.

Common variations

  • Async usage: Use the AsyncOpenAI client and await the same transcriptions.create call for non-blocking transcription.
  • Different models: Currently, whisper-1 is the standard model for transcription.
  • Local transcription: Use the openai-whisper Python package for offline transcription without API calls.
python
import asyncio
import os
from openai import AsyncOpenAI

async def transcribe_video_async(video_path: str) -> str:
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # File I/O stays synchronous here; only the API request is awaited.
    with open(video_path, "rb") as video_file:
        transcript = await client.audio.transcriptions.create(
            model="whisper-1",
            file=video_file
        )
    print("Async transcription:", transcript.text)
    return transcript.text

# asyncio.run(transcribe_video_async("video.mp4"))

Troubleshooting

  • If you get an error about an unsupported file format, ensure your file uses a supported format such as mp4, mpeg, wav, or webm. Formats like mkv are not accepted; re-encode or extract the audio first.
  • If transcription is inaccurate, try extracting audio separately with tools like ffmpeg and send the audio file instead.
  • Check your API key and environment variable if you get authentication errors.
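When you do need to extract the audio first, ffmpeg can do it from Python. The sketch below assumes ffmpeg is installed and on your PATH; the helper names are our own:

```python
import subprocess

def ffmpeg_extract_cmd(video_path: str, audio_path: str = "audio.mp3") -> list[str]:
    """Build an ffmpeg command: -vn drops the video stream, -q:a 2 sets VBR audio quality."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-q:a", "2", audio_path]

def extract_audio(video_path: str, audio_path: str = "audio.mp3") -> str:
    """Run ffmpeg and return the path of the extracted audio file."""
    subprocess.run(ffmpeg_extract_cmd(video_path, audio_path), check=True)
    return audio_path
```

The extracted mp3 is typically much smaller than the source video, which also helps stay under the 25 MB upload limit.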

Key Takeaways

  • Use OpenAI's whisper-1 model via client.audio.transcriptions.create to transcribe video files directly.
  • Open video files in binary mode and pass them to the API; no need to extract audio manually unless accuracy issues arise.
  • For asynchronous workflows, use the AsyncOpenAI client and await the same transcriptions.create call.
  • Ensure your video format is supported and your API key is correctly set in environment variables.
  • Local transcription is possible with the openai-whisper package if you prefer offline processing.
Verified 2026-04 · whisper-1