How-to · Beginner · 3 min read

How to transcribe video files with Whisper

Quick answer
Use the OpenAI Whisper API to transcribe video files by sending them to client.audio.transcriptions.create with model whisper-1. The API accepts common video container formats such as mp4 directly and transcribes their audio track, so you usually don't need a separate audio-extraction step.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0

Setup

Install the OpenAI Python SDK and set your API key as an environment variable.

  • Run pip install openai to install the SDK.
  • Set your API key in your shell: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows).
bash
pip install openai
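Authentication errors are easier to debug if the script fails fast when the key is missing. The helper below is a minimal sketch (the name get_api_key is ours, not part of the SDK):

```python
import os

def get_api_key() -> str:
    """Return the OpenAI API key from the environment, failing fast if unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    return key
```

Calling this once at startup gives a clear error message instead of a failed API request later.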

Step by step

The Whisper API accepts supported video containers such as video.mp4 directly and transcribes their audio track, so no separate extraction step is needed: open the video file in binary mode and pass it to the transcription endpoint. Note that uploads are limited to 25 MB; larger files must be compressed or split first.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

video_path = "video.mp4"

with open(video_path, "rb") as video_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=video_file
    )

print("Transcription:", transcript.text)
output
Transcription: This is the transcribed text from the video audio.

Common variations

  • Async usage: Use the AsyncOpenAI client and await the same transcriptions.create call for non-blocking transcription.
  • Different models: Currently, whisper-1 is the standard model for transcription.
  • Local transcription: Use the openai-whisper Python package for offline transcription without API calls.
python
import asyncio
import os
from openai import AsyncOpenAI

async def transcribe_video_async(video_path: str) -> str:
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # File I/O stays synchronous here; only the API request is awaited.
    with open(video_path, "rb") as video_file:
        transcript = await client.audio.transcriptions.create(
            model="whisper-1",
            file=video_file
        )
    print("Async transcription:", transcript.text)
    return transcript.text

# asyncio.run(transcribe_video_async("video.mp4"))

Troubleshooting

  • If you get an error about an unsupported file format, ensure your file uses a supported format such as mp4, mpeg, wav, or webm. Formats like mkv are not accepted; re-encode or extract the audio first.
  • If transcription is inaccurate, try extracting audio separately with tools like ffmpeg and send the audio file instead.
  • Check your API key and environment variable if you get authentication errors.
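When you do need to extract the audio first, ffmpeg can do it from Python. The sketch below assumes ffmpeg is installed and on your PATH; the helper names are our own:

```python
import subprocess

def ffmpeg_extract_cmd(video_path: str, audio_path: str = "audio.mp3") -> list[str]:
    """Build an ffmpeg command: -vn drops the video stream, -q:a 2 sets VBR audio quality."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-q:a", "2", audio_path]

def extract_audio(video_path: str, audio_path: str = "audio.mp3") -> str:
    """Run ffmpeg and return the path of the extracted audio file."""
    subprocess.run(ffmpeg_extract_cmd(video_path, audio_path), check=True)
    return audio_path
```

The extracted mp3 is typically much smaller than the source video, which also helps stay under the 25 MB upload limit.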

Key Takeaways

  • Use OpenAI's whisper-1 model via client.audio.transcriptions.create to transcribe video files directly.
  • Open video files in binary mode and pass them to the API; no need to extract audio manually unless accuracy issues arise.
  • For asynchronous workflows, use the AsyncOpenAI client and await the same transcriptions.create call.
  • Ensure your video format is supported and your API key is correctly set in environment variables.
  • Local transcription is possible with the openai-whisper package if you prefer offline processing.
Verified 2026-04 · whisper-1