How to beginner · 3 min read

How to send audio to Gemini API

Quick answer

To send audio to the Gemini API, use the OpenAI Python SDK v1 and call the audio.transcriptions or audio.speech_to_text endpoint with your audio file in binary format. Set the model to gemini-1.5-flash or another Gemini audio-capable model and pass the audio data as a file stream in the request.

PREREQUISITES

Python 3.8+
OpenAI API key (free tier works)
pip install openai>=1.0

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

Run pip install openai>=1.0 to install the SDK.
Export your API key in your shell: export OPENAI_API_KEY='your_api_key_here'.

bash

pip install openai>=1.0

Step by step

This example demonstrates sending a local audio file to Gemini's speech-to-text endpoint using the OpenAI Python SDK v1. It reads an audio file, sends it for transcription, and prints the text result.

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Path to your audio file (wav, mp3, etc.)
audio_file_path = "path/to/audio.wav"

with open(audio_file_path, "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="gemini-1.5-flash",
        file=audio_file,
        response_format="text"
    )

print("Transcription:", response.text)

output

Transcription: Hello, this is a test audio transcription using Gemini API.

Common variations

Use client.audio.translations.create to translate audio to English.
Switch models to gemini-2.0-flash for improved accuracy.
Use async calls with asyncio and await for non-blocking audio processing.

python

import os
import asyncio
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def transcribe_async():
    async with open("path/to/audio.wav", "rb") as audio_file:
        response = await client.audio.transcriptions.acreate(
            model="gemini-2.0-flash",
            file=audio_file,
            response_format="text"
        )
        print("Async transcription:", response.text)

asyncio.run(transcribe_async())

output

Async transcription: This is an asynchronous transcription example with Gemini API.

Troubleshooting

If you get a FileNotFoundError, verify the audio file path is correct.
If the API returns an error about the model, confirm you are using a valid Gemini audio model like gemini-1.5-flash.
For authentication errors, ensure your OPENAI_API_KEY environment variable is set properly.

✅

Key Takeaways

Use the OpenAI Python SDK v1 with client.audio.transcriptions.create to send audio to Gemini API.
Pass audio files as binary streams and specify a Gemini audio-capable model like gemini-1.5-flash.
Async calls and translation endpoints provide flexible audio processing options.
Always set your API key in os.environ["OPENAI_API_KEY"] for secure authentication.
Check file paths and model names carefully to avoid common errors.

Verified 2026-04 · gemini-1.5-flash, gemini-2.0-flash

Verify ↗