How to send audio to Gemini API
Quick answer
To send audio to the Gemini API, use the OpenAI Python SDK v1 and call the audio.transcriptions or audio.speech_to_text endpoint with your audio file in binary format. Set the model to
gemini-1.5-flash or another Gemini audio-capable model and pass the audio data as a file stream in the request.PREREQUISITES
Python 3.8+OpenAI API key (free tier works)pip install openai>=1.0
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable.
- Run
pip install openai>=1.0to install the SDK. - Export your API key in your shell:
export OPENAI_API_KEY='your_api_key_here'.
pip install openai>=1.0 Step by step
This example demonstrates sending a local audio file to Gemini's speech-to-text endpoint using the OpenAI Python SDK v1. It reads an audio file, sends it for transcription, and prints the text result.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Path to your audio file (wav, mp3, etc.)
audio_file_path = "path/to/audio.wav"
with open(audio_file_path, "rb") as audio_file:
response = client.audio.transcriptions.create(
model="gemini-1.5-flash",
file=audio_file,
response_format="text"
)
print("Transcription:", response.text) output
Transcription: Hello, this is a test audio transcription using Gemini API.
Common variations
- Use
client.audio.translations.createto translate audio to English. - Switch models to
gemini-2.0-flashfor improved accuracy. - Use async calls with
asyncioandawaitfor non-blocking audio processing.
import os
import asyncio
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
async def transcribe_async():
async with open("path/to/audio.wav", "rb") as audio_file:
response = await client.audio.transcriptions.acreate(
model="gemini-2.0-flash",
file=audio_file,
response_format="text"
)
print("Async transcription:", response.text)
asyncio.run(transcribe_async()) output
Async transcription: This is an asynchronous transcription example with Gemini API.
Troubleshooting
- If you get a
FileNotFoundError, verify the audio file path is correct. - If the API returns an error about the model, confirm you are using a valid Gemini audio model like
gemini-1.5-flash. - For authentication errors, ensure your
OPENAI_API_KEYenvironment variable is set properly.
Key Takeaways
- Use the OpenAI Python SDK v1 with
client.audio.transcriptions.createto send audio to Gemini API. - Pass audio files as binary streams and specify a Gemini audio-capable model like
gemini-1.5-flash. - Async calls and translation endpoints provide flexible audio processing options.
- Always set your API key in
os.environ["OPENAI_API_KEY"]for secure authentication. - Check file paths and model names carefully to avoid common errors.