How to build a subtitle generator with Whisper
Quick answer
Use OpenAI's Whisper API to transcribe audio files into text with timestamps, then format the output into subtitle files such as SRT. The process involves uploading audio, calling client.audio.transcriptions.create with model="whisper-1", and parsing the response for subtitle generation.
Prerequisites
- Python 3.8+
- An OpenAI API key
- pip install "openai>=1.0"
Setup
Install the OpenAI Python SDK and set your API key as an environment variable.
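Before making any API calls, it can help to confirm the key is actually visible to Python. This is a minimal, hypothetical check (the helper name is my own, not part of the SDK):

```python
import os


def api_key_configured() -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("OPENAI_API_KEY"))


if not api_key_configured():
    print("Set OPENAI_API_KEY before running the transcription script.")
```

Note that the OpenAI client reads OPENAI_API_KEY automatically, so passing api_key explicitly in the examples below is optional.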
```shell
pip install "openai>=1.0"
```
Step by step
This example shows how to transcribe an audio file using the Whisper API and generate a simple subtitle file in SRT format.
```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    ms = int(round((seconds - int(seconds)) * 1000))
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def transcribe_audio_to_srt(audio_path: str, srt_path: str):
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",
        )

    # verbose_json responses expose segment objects with
    # start/end/text attributes (not a plain dict)
    segments = transcript.segments or []

    # Write the SRT file: cue index, timestamp range, text, blank line
    with open(srt_path, "w", encoding="utf-8") as srt_file:
        for i, segment in enumerate(segments, start=1):
            text = segment.text.strip()
            srt_file.write(f"{i}\n")
            srt_file.write(
                f"{format_timestamp(segment.start)} --> {format_timestamp(segment.end)}\n"
            )
            srt_file.write(f"{text}\n\n")

    print(f"Subtitle file saved to {srt_path}")


if __name__ == "__main__":
    audio_file_path = "audio.mp3"  # Replace with your audio file path
    subtitle_file_path = "output.srt"
    transcribe_audio_to_srt(audio_file_path, subtitle_file_path)
```
Output
Subtitle file saved to output.srt
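The timestamp conversion used when writing the SRT file can be checked in isolation. This standalone sketch mirrors the format_timestamp helper from the script above:

```python
def format_timestamp(seconds: float) -> str:
    """Format a duration in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    ms = int(round((seconds - int(seconds)) * 1000))
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


print(format_timestamp(3661.5))  # → 01:01:01,500
```

Note that SRT uses a comma before the milliseconds, unlike WebVTT, which uses a period.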
Common variations
You can make asynchronous calls with asyncio and the AsyncOpenAI client, or set response_format="text" for a plain transcription without timestamps. For local, offline transcription, use the openai-whisper package instead of the API. Streaming is not supported for Whisper transcription.
```python
import asyncio
import os

from openai import AsyncOpenAI

# Async usage requires AsyncOpenAI; the synchronous client
# has no "acreate" method in openai>=1.0.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])


async def async_transcribe(audio_path: str):
    with open(audio_path, "rb") as audio_file:
        transcript = await client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",
        )
    print(transcript)


if __name__ == "__main__":
    asyncio.run(async_transcribe("audio.mp3"))
```
Output
{...JSON transcription with segments...}
Troubleshooting
- If you get a FileNotFoundError, verify the audio file path is correct.
- If the transcription is empty, check the audio format and ensure it is supported (mp3, wav, m4a, etc.).
- For API authentication errors, confirm your OPENAI_API_KEY environment variable is set correctly.
- If timestamps are missing, use response_format="verbose_json" to get detailed segments.
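When a generated file looks wrong, a structural check can narrow the problem down. This is a minimal, hypothetical validator (the function name and regex are my own, not part of any SRT library) that only checks cue numbering and the timestamp line:

```python
import re

# An SRT timestamp line: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMESTAMP_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$"
)


def validate_srt(srt_text: str) -> bool:
    """Check that each cue has an index line, a timestamp line, and text."""
    cues = [c for c in srt_text.strip().split("\n\n") if c]
    for expected_index, cue in enumerate(cues, start=1):
        lines = cue.split("\n")
        if len(lines) < 3:
            return False
        if lines[0] != str(expected_index):
            return False
        if not TIMESTAMP_LINE.match(lines[1]):
            return False
    return True


sample = (
    "1\n00:00:00,000 --> 00:00:02,500\nHello world\n\n"
    "2\n00:00:02,500 --> 00:00:05,000\nSecond line\n"
)
print(validate_srt(sample))  # True
```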
Key Takeaways
- Use OpenAI's Whisper API with model="whisper-1" for accurate audio transcription with timestamps.
- Format the detailed JSON segments into standard subtitle formats such as SRT.
- Set response_format="verbose_json" to get the timestamps needed for subtitles.
- Ensure your audio file path and format are correct to avoid errors.
- Async transcription is supported via the AsyncOpenAI client in the OpenAI Python SDK.