Whisper batch transcription
Quick answer
Use the OpenAI Whisper API via the
openai Python SDK to transcribe multiple audio files in batch: iterate over the files and call client.audio.transcriptions.create for each one. Automate the batch with a simple loop and collect the results programmatically.

Prerequisites

- Python 3.8+
- An OpenAI API key
- pip install "openai>=1.0"
Setup
Install the official openai Python package and set your OpenAI API key as an environment variable.
- Install the package:

  ```shell
  pip install "openai>=1.0"
  ```

- Set your API key in your shell:

  ```shell
  export OPENAI_API_KEY='your_api_key'   # Linux/macOS
  setx OPENAI_API_KEY "your_api_key"     # Windows
  ```
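Before running any transcription, it can help to confirm the key is actually visible to Python. A minimal sketch (the helper name `require_api_key` is just for illustration):

```python
import os

def require_api_key() -> str:
    """Return the OpenAI API key from the environment, or raise a clear error."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it and retry.")
    return key
```

Calling this at startup turns a confusing authentication failure later into an immediate, obvious error.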
Step by step
This example demonstrates batch transcription of multiple audio files using the OpenAI Whisper API. It loads each audio file, sends it to the whisper-1 model, and prints the transcribed text.
```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def batch_transcribe(file_paths):
    transcripts = {}
    for path in file_paths:
        with open(path, "rb") as audio_file:
            response = client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
            )
        transcripts[path] = response.text
    return transcripts

if __name__ == "__main__":
    audio_files = ["audio1.mp3", "audio2.wav", "audio3.m4a"]
    results = batch_transcribe(audio_files)
    for file, text in results.items():
        print(f"Transcription for {file}:\n{text}\n")
```

Output

```
Transcription for audio1.mp3:
Hello, this is the first audio file.

Transcription for audio2.wav:
This is the second audio file transcription.

Transcription for audio3.m4a:
Final audio file transcription text here.
```
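A common extension of this loop is persisting each transcript to disk next to its audio file. A minimal sketch (the helper name `save_transcripts` is just for illustration; it takes the same path-to-text dictionary the batch function returns):

```python
from pathlib import Path

def save_transcripts(transcripts):
    """Write each transcript to a .txt file next to its audio file."""
    written = []
    for audio_path, text in transcripts.items():
        # audio1.mp3 -> audio1.txt, keeping the same directory.
        out_path = Path(audio_path).with_suffix(".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(str(out_path))
    return written
```

This keeps results around if the script is interrupted, and lets you skip files whose .txt output already exists on a re-run.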
Common variations
You can adapt batch transcription for concurrent processing with asyncio to speed up large batches. You can also pass optional parameters such as language or prompt to improve accuracy. Common audio formats (mp3, wav, m4a, and others) are all supported.
```python
import asyncio
import os

from openai import AsyncOpenAI

# AsyncOpenAI exposes the same methods as OpenAI, but as awaitables;
# note there is no "acreate" in openai>=1.0 -- use the async client instead.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def transcribe_async(path):
    with open(path, "rb") as audio_file:
        response = await client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return path, response.text

async def batch_transcribe_async(file_paths):
    tasks = [transcribe_async(path) for path in file_paths]
    results = await asyncio.gather(*tasks)
    return dict(results)

if __name__ == "__main__":
    audio_files = ["audio1.mp3", "audio2.wav", "audio3.m4a"]
    results = asyncio.run(batch_transcribe_async(audio_files))
    for file, text in results.items():
        print(f"Async transcription for {file}:\n{text}\n")
```

Output

```
Async transcription for audio1.mp3:
Hello, this is the first audio file.

Async transcription for audio2.wav:
This is the second audio file transcription.

Async transcription for audio3.m4a:
Final audio file transcription text here.
```
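The language and prompt parameters mentioned above can be passed straight to create. A hedged sketch (the hint values and file name are placeholders; language takes an ISO-639-1 code, and prompt is free text that nudges the model toward expected names and jargon):

```python
import os

from openai import OpenAI

# Illustrative hints -- adjust for your own audio.
TRANSCRIBE_HINTS = {
    "language": "en",
    "prompt": "The speakers discuss the OpenAI Whisper API.",
}

def transcribe_with_hints(path, **hints):
    # Client is created inside the function so the module imports without a key.
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    with open(path, "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            **hints,
        )
    return response.text

if __name__ == "__main__":
    print(transcribe_with_hints("audio1.mp3", **TRANSCRIBE_HINTS))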
Troubleshooting
- FileNotFoundError: verify that the audio file paths are correct and accessible.
- BadRequestError (named InvalidRequestError in pre-1.0 SDKs): check that the audio format is supported and the file is not corrupted.
- Inaccurate transcription: specify the language parameter if known, or provide a prompt to guide the model.
- API rate limits: for large batches, add delays between requests or cap concurrency so parallel calls don't exhaust your quota.
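Rate-limit handling can be factored into a small generic retry helper. A sketch under stated assumptions (the helper name and backoff numbers are arbitrary; pass the SDK's openai.RateLimitError as retry_on when wrapping real API calls):

```python
import time

def with_retries(fn, *args, retries=3, base_delay=1.0,
                 retry_on=(Exception,), **kwargs):
    """Call fn, retrying with exponential backoff on the given exceptions."""
    for attempt in range(retries + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            # Sleep 1s, 2s, 4s, ... between attempts.
            time.sleep(base_delay * (2 ** attempt))
```

For example, `with_retries(batch_transcribe, audio_files, retry_on=(openai.RateLimitError,))` retries the whole batch, while wrapping individual calls retries only the file that was throttled.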
Key Takeaways
- Use the OpenAI Whisper API with client.audio.transcriptions.create to transcribe audio files one by one in a batch.
- Automate batch transcription by looping over the audio files and collecting results in a dictionary.
- For faster batch processing, use the AsyncOpenAI client with asyncio (there is no acreate in openai>=1.0).
- Specify optional parameters such as language to improve transcription accuracy.
- Handle common errors by verifying file paths and formats, and by managing API rate limits.