How to generate AI voiceover for videos
Quick answer
Generate AI voiceover for videos by first creating a script with a text model like gpt-4o, then converting the text to speech with a TTS API such as ElevenLabs or Google Text-to-Speech. Combine the audio with your video using standard video editing tools or libraries.
PREREQUISITES
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0" requests
- API key for a TTS service like ElevenLabs or Google Cloud Text-to-Speech
Setup
Install required Python packages and set environment variables for API keys.
- Install OpenAI and requests libraries:
pip install openai requests
- Export your API keys in your shell environment:
export OPENAI_API_KEY='your_openai_key'
export ELEVENLABS_API_KEY='your_elevenlabs_key'
Step by step
This example generates a video voiceover script using gpt-4o and converts it to speech with the ElevenLabs TTS API.
import os

import requests
from openai import OpenAI

# Initialize the OpenAI client with the key from the environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def generate_script(prompt: str) -> str:
    """Ask gpt-4o for a voiceover script and return the text."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content.strip()


# ElevenLabs TTS endpoint and headers
ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
ELEVENLABS_VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Example voice ID
ELEVENLABS_URL = f"https://api.elevenlabs.io/v1/text-to-speech/{ELEVENLABS_VOICE_ID}"
headers = {
    "xi-api-key": ELEVENLABS_API_KEY,
    "Content-Type": "application/json"
}


def text_to_speech(text: str, filename: str) -> None:
    """Send the text to ElevenLabs and write the returned audio bytes to filename."""
    data = {
        "text": text,
        "voice_settings": {"stability": 0.75, "similarity_boost": 0.75}
    }
    # A timeout keeps the script from hanging if the service is slow
    response = requests.post(ELEVENLABS_URL, headers=headers, json=data, timeout=60)
    response.raise_for_status()  # Stop on HTTP errors before writing the file
    with open(filename, "wb") as f:
        f.write(response.content)


if __name__ == "__main__":
    prompt = "Write a 60-second engaging voiceover script for a video about AI voice generation."
    script = generate_script(prompt)
    print("Generated script:\n", script)
    audio_file = "voiceover.mp3"
    text_to_speech(script, audio_file)
    print(f"Audio saved to {audio_file}")
output
Generated script:
 Welcome to the future of AI voice generation! In this video, we'll explore how cutting-edge technology transforms text into natural, expressive speech, making content creation faster and more engaging. Stay tuned to discover the magic behind AI voices and how you can use them today.
Audio saved to voiceover.mp3
Common variations
You can use other TTS providers like Google Cloud Text-to-Speech or Amazon Polly by changing the API calls. For asynchronous or streaming audio generation, use the respective SDKs' async methods. To generate scripts with different tones or lengths, adjust the prompt or model parameters.
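As a rough illustration of adjusting the prompt for tone and length, the sketch below builds a prompt from a topic, a tone, and a target duration. The helper name build_voiceover_prompt and the 150 words-per-minute speaking rate are assumptions made for this example, not part of any API; tune the rate to the voice you use.

```python
def build_voiceover_prompt(topic: str, tone: str = "engaging",
                           seconds: int = 60, words_per_minute: int = 150) -> str:
    """Return a prompt asking for a script sized to the target duration.

    Hypothetical helper: the 150 wpm default is an assumed rule of thumb
    for narration speed, not a value taken from any TTS provider.
    """
    # Convert the target duration into an approximate word budget
    word_budget = round(seconds * words_per_minute / 60)
    return (
        f"Write a {seconds}-second {tone} voiceover script for a video about {topic}. "
        f"Keep it to about {word_budget} words and return only the spoken text."
    )


if __name__ == "__main__":
    print(build_voiceover_prompt("AI voice generation", tone="calm", seconds=30))
```

The returned string can be passed to generate_script in place of the hard-coded prompt.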
Troubleshooting
- If you get authentication errors, verify your API keys are correctly set in environment variables.
- If the audio file is empty or corrupted, check the TTS API response status and quota limits.
- For slow responses, consider reducing max_tokens or using a smaller model.
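The first two checks above can be sketched as small helpers that run before and after the TTS call. The function names are hypothetical, and the status-code mapping follows generic HTTP conventions (401/403 for authentication, 429 for rate limits); a given provider may signal quota problems differently, so treat this as a starting point.

```python
import os
from typing import List, Mapping

# Environment variables used by the example in this article
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required API keys that are unset or blank."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


def audio_problem(status_code: int, content_type: str, body: bytes) -> str:
    """Describe what looks wrong with a TTS response, or return '' if it looks fine.

    Assumption: generic HTTP semantics; check your provider's docs for
    its actual error codes.
    """
    if status_code in (401, 403):
        return "authentication failed: check the API key"
    if status_code == 429:
        return "rate or quota limit reached"
    if status_code != 200:
        return f"unexpected HTTP status {status_code}"
    if not content_type.startswith("audio/"):
        return f"expected audio, got {content_type or 'no content type'}"
    if not body:
        return "response body is empty"
    return ""


if __name__ == "__main__":
    print("Missing keys:", missing_keys())
```

With requests, the three arguments would come from response.status_code, response.headers.get("Content-Type", ""), and response.content.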
Key Takeaways
- Use gpt-4o to generate natural voiceover scripts tailored to your video content.
- Convert text scripts to speech with APIs like ElevenLabs or Google TTS for high-quality audio output.
- Always secure API keys in environment variables and handle API errors gracefully.
- Adjust prompts and TTS voice parameters to match your desired tone and style.
- Combine generated audio with video using standard editing tools or libraries like FFmpeg.
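For the last step, one possible way to combine the generated audio with a video is to call FFmpeg from Python. This is a minimal sketch that assumes the ffmpeg binary is installed and on your PATH; the file names and the helper names build_mux_command and mux are placeholders for illustration.

```python
import subprocess
from typing import List


def build_mux_command(video: str, audio: str, output: str) -> List[str]:
    """Build an FFmpeg command that replaces the video's audio with the voiceover."""
    return [
        "ffmpeg", "-y",        # Overwrite the output file if it exists
        "-i", video,           # Input 0: the source video
        "-i", audio,           # Input 1: the generated voiceover
        "-map", "0:v:0",       # Take the first video stream from input 0
        "-map", "1:a:0",       # Take the first audio stream from input 1
        "-c:v", "copy",        # Copy the video stream without re-encoding
        "-c:a", "aac",         # Encode the audio as AAC for MP4 containers
        "-shortest",           # Stop at the end of the shorter stream
        output,
    ]


def mux(video: str, audio: str, output: str) -> None:
    """Run FFmpeg; raises CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(build_mux_command(video, audio, output), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, since FFmpeg may not be installed
    print(" ".join(build_mux_command("input.mp4", "voiceover.mp3", "final.mp4")))
```

Copying the video stream keeps the merge fast; drop -shortest if you want the output to keep the video's full length when the voiceover ends early.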