How to translate entire documents with AI
Quick answer
Use the OpenAI Python SDK to read your document, split it into manageable chunks, and send each chunk to a translation model such as gpt-4o. Combine the translated chunks to reconstruct the full document.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your API key as an environment variable for secure access.
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
This example reads a text document, splits it into chunks, translates each chunk using gpt-4o, and combines the results.
import os
from openai import OpenAI

# Initialize client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Split text into chunks (word count is used as a rough proxy for tokens)
def chunk_text(text, max_tokens=1000):
    words = text.split()
    chunks = []
    current_chunk = []
    current_len = 0
    for word in words:
        current_chunk.append(word)
        current_len += 1
        if current_len >= max_tokens:
            chunks.append(" ".join(current_chunk))
            current_chunk = []
            current_len = 0
    if current_chunk:
        chunks.append(" ".join(current_chunk))
    return chunks

# Load your document
with open("document.txt", "r", encoding="utf-8") as f:
    text = f.read()

chunks = chunk_text(text, max_tokens=500)  # Adjust chunk size as needed

translated_chunks = []
for i, chunk in enumerate(chunks):
    prompt = f"Translate the following text to French:\n\n{chunk}"
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    translated_text = response.choices[0].message.content
    translated_chunks.append(translated_text)
    print(f"Chunk {i+1}/{len(chunks)} translated.")

# Combine translated chunks
full_translation = "\n\n".join(translated_chunks)

# Save to file
with open("translated_document.txt", "w", encoding="utf-8") as f:
    f.write(full_translation)

print("Translation complete. Saved to translated_document.txt")
output
Chunk 1/3 translated.
Chunk 2/3 translated.
Chunk 3/3 translated.
Translation complete. Saved to translated_document.txt
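The chunk_text function above splits on a raw word count, which can cut a sentence or paragraph in half and degrade translation quality at chunk boundaries. One alternative is to pack whole paragraphs into each chunk instead; a minimal sketch (the chunk_by_paragraph name and the word budget are illustrative, and word count remains only a rough proxy for tokens):

```python
def chunk_by_paragraph(text, max_words=500):
    """Pack whole paragraphs into chunks of at most max_words words.

    A paragraph longer than max_words becomes its own chunk, so no
    paragraph is ever split in the middle.
    """
    chunks = []
    current = []        # paragraphs in the chunk being built
    current_words = 0
    for para in text.split("\n\n"):
        n = len(para.split())
        # Flush the current chunk before this paragraph would overflow it
        if current and current_words + n > max_words:
            chunks.append("\n\n".join(current))
            current = []
            current_words = 0
        current.append(para)
        current_words += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Example: three short paragraphs packed under a tight 6-word budget
doc = "One two three.\n\nFour five six.\n\nSeven eight nine ten."
print(chunk_by_paragraph(doc, max_words=6))
# → ['One two three.\n\nFour five six.', 'Seven eight nine ten.']
```

Because chunk boundaries now fall between paragraphs, the model never sees half a sentence, which tends to produce more coherent translations at the seams.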
Common variations
- Use asynchronous calls with asyncio and the AsyncOpenAI client's chat.completions.create for faster batch translation.
- Switch to other models such as claude-3-5-sonnet-20241022 or gemini-2.5-pro (via their own SDKs) for different translation styles or languages.
- Implement streaming to process large documents chunk-by-chunk with partial outputs.
import asyncio
import os
from openai import AsyncOpenAI

# The async client is required here: the synchronous OpenAI client's
# methods are not awaitable.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def translate_chunk(chunk):
    prompt = f"Translate the following text to Spanish:\n\n{chunk}"
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    with open("document.txt", "r", encoding="utf-8") as f:
        text = f.read()
    chunks = text.split("\n\n")  # Simple split by paragraphs
    tasks = [translate_chunk(chunk) for chunk in chunks]
    translated_chunks = await asyncio.gather(*tasks)
    full_translation = "\n\n".join(translated_chunks)
    with open("translated_document_async.txt", "w", encoding="utf-8") as f:
        f.write(full_translation)
    print("Async translation complete.")

if __name__ == "__main__":
    asyncio.run(main())
output
Async translation complete.
Troubleshooting
- If you hit token limits, reduce chunk size or use models with larger context windows.
- For rate limit errors, add retry logic with exponential backoff.
- Ensure your document encoding is UTF-8 to avoid decoding errors.
- If translations are inaccurate, try adding more detailed instructions in the prompt.
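The retry advice above can be implemented as a small wrapper around the API call. A minimal sketch of exponential backoff (the with_backoff name, delay values, and the broad except clause are illustrative; with the OpenAI SDK you would typically catch openai.RateLimitError specifically):

```python
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on failure with exponentially growing delays.

    Delays follow base_delay * 2**attempt: 1s, 2s, 4s, ... by default.
    The sleep parameter is injectable so the logic is easy to test.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)

# Usage sketch: wrap the translation call from the main example
# translated = with_backoff(lambda: client.chat.completions.create(...))
```

Doubling the delay after each failure gives the rate limiter time to reset while keeping the first retry fast; the final re-raise ensures genuine errors are not silently swallowed.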
Key Takeaways
- Split large documents into smaller chunks to avoid token limits during translation.
- Use the OpenAI Python SDK with gpt-4o or other advanced models for high-quality translations.
- Async calls can speed up batch document translation significantly.
- Adjust prompts to specify target language and style for better translation accuracy.
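Pinning down target language, style, and formatting in one reusable helper keeps prompts consistent across chunks. A minimal sketch (the build_translation_prompt name and the exact instruction wording are illustrative, not a canonical prompt):

```python
def build_translation_prompt(text, target_lang, style=None):
    """Compose a translation prompt that fixes language, style, and formatting."""
    lines = [f"Translate the following text to {target_lang}."]
    if style:
        lines.append(f"Use a {style} style.")
    # Asking the model to preserve structure helps chunks rejoin cleanly
    lines.append("Preserve paragraph breaks and any markup exactly.")
    return "\n".join(lines) + f"\n\n{text}"

print(build_translation_prompt("Hello, world.", "French", style="formal"))
```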