How to transcribe multiple audio files
Quick answer
Use the OpenAI Whisper API by iterating over your audio files and calling client.audio.transcriptions.create for each file. Automate this with Python by opening each audio file in a loop and collecting the transcriptions.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your OpenAI API key as an environment variable.
- Install package:

      pip install openai>=1.0

- Set environment variable:

      export OPENAI_API_KEY='your_api_key'   (Linux/macOS)
      setx OPENAI_API_KEY "your_api_key"     (Windows)
Step by step
This example demonstrates how to transcribe multiple audio files sequentially using the OpenAI Whisper API. It opens each file, sends it to the API, and prints the transcription.
    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def transcribe_files(file_paths):
        transcriptions = {}
        for path in file_paths:
            with open(path, "rb") as audio_file:
                transcript = client.audio.transcriptions.create(
                    model="whisper-1",
                    file=audio_file,
                )
                transcriptions[path] = transcript.text
        return transcriptions

    if __name__ == "__main__":
        audio_files = ["audio1.mp3", "audio2.wav", "audio3.m4a"]
        results = transcribe_files(audio_files)
        for file, text in results.items():
            print(f"Transcription for {file}:\n{text}\n")

Output:

    Transcription for audio1.mp3:
    Hello, this is the first audio file.

    Transcription for audio2.wav:
    This is the second audio file transcription.

    Transcription for audio3.m4a:
    Final audio file transcription text here.
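The Whisper endpoint rejects uploads larger than 25MB, so with a big batch it can help to screen out oversized files before the loop. A minimal sketch (the filter_uploadable helper and MAX_BYTES constant are illustrative names, not part of the SDK):

```python
import os

MAX_BYTES = 25 * 1024 * 1024  # Whisper API upload limit (25MB)

def filter_uploadable(paths):
    # Split paths into those small enough to upload and those over the limit.
    ok, too_big = [], []
    for p in paths:
        (ok if os.path.getsize(p) <= MAX_BYTES else too_big).append(p)
    return ok, too_big
```

Oversized files can then be reported, split, or compressed before transcription rather than failing mid-batch.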
Common variations
You can adapt the transcription process by:
- Using asynchronous calls with asyncio for parallel processing.
- Streaming partial transcriptions if supported by the API.
- Specifying different Whisper models if available.
- Handling different audio formats (mp3, wav, m4a, etc.) supported by Whisper.
The async variation uses the AsyncOpenAI client, which exposes awaitable versions of the same endpoints (the synchronous client has no async methods):

    import os
    import asyncio
    from openai import AsyncOpenAI

    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

    async def transcribe_file_async(path):
        with open(path, "rb") as audio_file:
            transcript = await client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
            )
        return path, transcript.text

    async def transcribe_files_async(file_paths):
        tasks = [transcribe_file_async(path) for path in file_paths]
        results = await asyncio.gather(*tasks)
        return dict(results)

    if __name__ == "__main__":
        audio_files = ["audio1.mp3", "audio2.wav", "audio3.m4a"]
        results = asyncio.run(transcribe_files_async(audio_files))
        for file, text in results.items():
            print(f"Async transcription for {file}:\n{text}\n")

Output:

    Async transcription for audio1.mp3:
    Hello, this is the first audio file.

    Async transcription for audio2.wav:
    This is the second audio file transcription.

    Async transcription for audio3.m4a:
    Final audio file transcription text here.
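Firing every request at once can trip rate limits when the batch is large. One option is to cap concurrency with asyncio.Semaphore. The sketch below uses a dummy coroutine in place of the real transcription call so it runs without the API; gather_limited and fake_transcribe are illustrative names, not SDK functions:

```python
import asyncio

async def gather_limited(coros, limit=3):
    # Run coroutines concurrently, allowing at most `limit` in flight at once.
    sem = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(bounded(c) for c in coros))

# Stand-in for transcribe_file_async; a real version would call the API.
async def fake_transcribe(path):
    await asyncio.sleep(0.01)
    return path, f"transcript of {path}"

if __name__ == "__main__":
    paths = [f"audio{i}.mp3" for i in range(10)]
    pairs = asyncio.run(gather_limited([fake_transcribe(p) for p in paths], limit=3))
    print(len(dict(pairs)))
```

Swapping fake_transcribe for the real transcribe_file_async gives parallel transcription with at most three requests outstanding at any moment.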
Troubleshooting
- If you get BadRequestError (named InvalidRequestError in pre-1.0 SDKs), verify your file format; an invalid API key raises AuthenticationError instead.
- For RateLimitError, add delays or batch your requests.
- If transcription is inaccurate, check audio quality and try different Whisper models.
- Ensure audio files are under 25MB for API upload limits.
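The rate-limit advice above can be packaged as a small retry helper with exponential backoff. The sketch is generic (with_retries is an illustrative name); in practice you would pass retry_on=(openai.RateLimitError,) around the transcription call:

```python
import time

def with_retries(fn, retries=3, base_delay=1.0, retry_on=(Exception,)):
    # Call fn(), retrying with exponential backoff on the given exceptions.
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Example wiring (assumes the `client` from the earlier snippet):
# text = with_retries(
#     lambda: client.audio.transcriptions.create(model="whisper-1", file=f),
#     retry_on=(openai.RateLimitError,),
# ).text
```

Backoff doubles the wait on each failure (1s, 2s, 4s by default), which usually gives the rate limiter time to reset.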
Key Takeaways
- Use a loop to send each audio file to client.audio.transcriptions.create for batch transcription.
- Async calls through the AsyncOpenAI client enable faster parallel transcription of multiple files.
- Ensure audio files are supported formats and under 25MB for the Whisper API.
- Handle API rate limits by batching or adding delays between requests.
- Set your OpenAI API key securely via environment variables for authentication.