How to · Intermediate · 3 min read

How to generate AI voiceover for videos

Quick answer

Generate an AI voiceover by first writing the script with a text model such as gpt-4o, then converting that text to speech with a TTS API such as ElevenLabs or Google Cloud Text-to-Speech. Finally, combine the audio with your video using a standard video editor or a tool such as FFmpeg.

PREREQUISITES

Python 3.8+
OpenAI API key with access to gpt-4o
pip install "openai>=1.0" requests
API key for a TTS service like ElevenLabs or Google Cloud Text-to-Speech

Setup

Install required Python packages and set environment variables for API keys.

Install the openai and requests libraries.
Export your API keys in your shell environment.

bash

pip install "openai>=1.0" requests
export OPENAI_API_KEY='your_openai_key'
export ELEVENLABS_API_KEY='your_elevenlabs_key'
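
Before running the example, it can help to confirm that both keys are actually visible to Python. The following is a minimal sketch; the helper name missing_env_vars is hypothetical and not part of any SDK.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")

def missing_env_vars(names=REQUIRED_KEYS, environ=None):
    """Return the names that are unset or blank in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]

# Report which keys, if any, still need to be exported.
missing = missing_env_vars()
print("Missing keys:", ", ".join(missing) if missing else "none")
```

If the list is not empty, export the named variables in the same shell session that runs the script.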

Step by step

This example generates a voiceover script with gpt-4o and converts it to speech with the ElevenLabs TTS API.

python

import os
from openai import OpenAI
import requests

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

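def generate_script(prompt: str) -> str:
    """Ask the chat model for a voiceover script and return it as plain text."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    # message.content is optional and may be None when the model returns no text.
    return (response.choices[0].message.content or "").strip()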

# ElevenLabs TTS endpoint and headers
ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
ELEVENLABS_VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Example voice ID
ELEVENLABS_URL = f"https://api.elevenlabs.io/v1/text-to-speech/{ELEVENLABS_VOICE_ID}"

headers = {
    "xi-api-key": ELEVENLABS_API_KEY,
    "Content-Type": "application/json"
}

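def text_to_speech(text: str, filename: str) -> None:
    """Send text to the ElevenLabs endpoint and write the returned audio bytes to filename."""
    data = {
        "text": text,
        "voice_settings": {"stability": 0.75, "similarity_boost": 0.75}
    }
    # A timeout keeps the script from hanging indefinitely on a stalled connection.
    response = requests.post(ELEVENLABS_URL, headers=headers, json=data, timeout=60)
    # Raise on 4xx/5xx so an error body is never saved as if it were audio.
    response.raise_for_status()
    with open(filename, "wb") as f:
        f.write(response.content)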

if __name__ == "__main__":
    prompt = "Write a 60-second engaging voiceover script for a video about AI voice generation."
    script = generate_script(prompt)
    print("Generated script:\n", script)

    audio_file = "voiceover.mp3"
    text_to_speech(script, audio_file)
    print(f"Audio saved to {audio_file}")

output

Generated script:
Welcome to the future of AI voice generation! In this video, we'll explore how cutting-edge technology transforms text into natural, expressive speech, making content creation faster and more engaging. Stay tuned to discover the magic behind AI voices and how you can use them today.
Audio saved to voiceover.mp3
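
The example above stops at the audio file. One way to attach voiceover.mp3 to a video is to call FFmpeg from Python, sketched below under a few assumptions: the ffmpeg binary is installed and on your PATH, and the file names input.mp4 and output.mp4 are placeholders. The helper names build_mux_command and add_voiceover are hypothetical.

```python
import shutil
import subprocess

def build_mux_command(video_path: str, audio_path: str, output_path: str) -> list:
    """Build an ffmpeg command that pairs the video stream with the voiceover audio."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,        # input 0: the video
        "-i", audio_path,        # input 1: the generated voiceover
        "-map", "0:v:0",         # take the first video stream from input 0
        "-map", "1:a:0",         # take the first audio stream from input 1
        "-c:v", "copy",          # copy the video stream without re-encoding
        "-c:a", "aac",           # encode the audio as AAC
        "-shortest",             # stop when the shorter stream ends
        output_path,
    ]

def add_voiceover(video_path: str, audio_path: str, output_path: str) -> None:
    """Run ffmpeg to write a copy of the video that uses the voiceover as its audio."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_mux_command(video_path, audio_path, output_path), check=True)

# Example call (placeholder file names):
# add_voiceover("input.mp4", "voiceover.mp3", "output.mp4")
```

This replaces the original audio track rather than mixing with it. Keeping background sound would need an audio filter, which is outside this sketch.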

Common variations

You can swap in another TTS provider, such as Google Cloud Text-to-Speech or Amazon Polly, by replacing the text_to_speech function with calls to that provider's API. For asynchronous or streaming audio generation, use the async or streaming methods of the provider's SDK where they are available. To change the tone or length of the script, adjust the prompt or model parameters such as temperature and max_tokens.
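
As an illustration of adjusting the prompt, the sketch below builds a prompt from a topic, a target duration, and a tone. The helper name build_script_prompt is hypothetical, and the default of 150 words per minute is an assumed speaking rate that you should tune to the voice you choose.

```python
def build_script_prompt(topic: str, seconds: int = 60, tone: str = "engaging",
                        words_per_minute: int = 150) -> str:
    """Compose a prompt that asks for a script of roughly the right spoken length."""
    # Assumed speaking rate; adjust after listening to the generated audio.
    target_words = round(seconds * words_per_minute / 60)
    return (
        f"Write a {seconds}-second {tone} voiceover script for a video about {topic}. "
        f"Aim for about {target_words} words. "
        "Return only the words to be spoken, with no stage directions or headings."
    )

print(build_script_prompt("AI voice generation", seconds=30, tone="calm"))
```

The returned string can be passed to generate_script in place of the hard-coded prompt. Asking for spoken words only helps keep headings and stage directions out of the audio.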

Troubleshooting

If you get authentication errors, verify that OPENAI_API_KEY and ELEVENLABS_API_KEY are set correctly in your environment.
If the audio file is empty or corrupted, check the TTS API response status and your quota limits.
For slow responses, consider setting a lower max_tokens on the script request or using a smaller model.
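
To catch an empty or non-audio response before it is written to disk, a small check like the one below may help. It is a sketch, not part of any SDK: the helper name check_audio_payload is hypothetical, and it assumes the TTS endpoint answers with HTTP 200 and an audio/* Content-Type on success, which you should confirm against your provider's documentation.

```python
def check_audio_payload(status_code: int, content_type: str, body: bytes) -> None:
    """Raise ValueError if an HTTP response does not look like usable audio."""
    if status_code != 200:
        raise ValueError(f"TTS request failed with HTTP {status_code}")
    if not content_type.lower().startswith("audio/"):
        # Error details often arrive as JSON or text rather than audio bytes.
        raise ValueError(f"Expected audio, got Content-Type {content_type!r}")
    if not body:
        raise ValueError("TTS response body is empty")

# Example with stand-in values. In the main script you would pass
# response.status_code, response.headers.get("Content-Type", ""), response.content.
check_audio_payload(200, "audio/mpeg", b"\xff\xfb\x90")
```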

Key Takeaways

Use gpt-4o to generate natural voiceover scripts tailored to your video content.
Convert text scripts to speech with APIs like ElevenLabs or Google TTS for high-quality audio output.
Always secure API keys in environment variables and handle API errors gracefully.
Adjust prompts and TTS voice parameters to match your desired tone and style.
Combine generated audio with video using standard editing tools or libraries like FFmpeg.

Verified 2026-04 · gpt-4o, ElevenLabs TTS, Google Cloud Text-to-Speech
