How to beginner · 3 min read

How to send audio to Gemini API

Quick answer
To send audio to the Gemini API, use the OpenAI Python SDK v1 and call the audio.transcriptions or audio.speech_to_text endpoint with your audio file in binary format. Set the model to gemini-1.5-flash or another Gemini audio-capable model and pass the audio data as a file stream in the request.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

  • Run pip install openai>=1.0 to install the SDK.
  • Export your API key in your shell: export OPENAI_API_KEY='your_api_key_here'.
bash
pip install openai>=1.0

Step by step

This example demonstrates sending a local audio file to Gemini's speech-to-text endpoint using the OpenAI Python SDK v1. It reads an audio file, sends it for transcription, and prints the text result.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Path to your audio file (wav, mp3, etc.)
audio_file_path = "path/to/audio.wav"

with open(audio_file_path, "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="gemini-1.5-flash",
        file=audio_file,
        response_format="text"
    )

print("Transcription:", response.text)
output
Transcription: Hello, this is a test audio transcription using Gemini API.

Common variations

  • Use client.audio.translations.create to translate audio to English.
  • Switch models to gemini-2.0-flash for improved accuracy.
  • Use async calls with asyncio and await for non-blocking audio processing.
python
import os
import asyncio
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def transcribe_async():
    async with open("path/to/audio.wav", "rb") as audio_file:
        response = await client.audio.transcriptions.acreate(
            model="gemini-2.0-flash",
            file=audio_file,
            response_format="text"
        )
        print("Async transcription:", response.text)

asyncio.run(transcribe_async())
output
Async transcription: This is an asynchronous transcription example with Gemini API.

Troubleshooting

  • If you get a FileNotFoundError, verify the audio file path is correct.
  • If the API returns an error about the model, confirm you are using a valid Gemini audio model like gemini-1.5-flash.
  • For authentication errors, ensure your OPENAI_API_KEY environment variable is set properly.

Key Takeaways

  • Use the OpenAI Python SDK v1 with client.audio.transcriptions.create to send audio to Gemini API.
  • Pass audio files as binary streams and specify a Gemini audio-capable model like gemini-1.5-flash.
  • Async calls and translation endpoints provide flexible audio processing options.
  • Always set your API key in os.environ["OPENAI_API_KEY"] for secure authentication.
  • Check file paths and model names carefully to avoid common errors.
Verified 2026-04 · gemini-1.5-flash, gemini-2.0-flash
Verify ↗