How-to · beginner · 3 min read

How to translate entire documents with AI

Quick answer
Use the OpenAI Python SDK to read your document, split it into manageable chunks, and send each chunk to a model such as gpt-4o for translation. Then join the translated chunks to reconstruct the full document.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install openai>=1.0

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

This example reads a text document, splits it into chunks, translates each chunk using gpt-4o, and combines the results.

python
import os
from openai import OpenAI

# Initialize client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Split text into chunks of at most max_tokens words.
# (Word count is only a rough proxy for model tokens, but it keeps
# this example dependency-free.)
def chunk_text(text, max_tokens=1000):
    words = text.split()
    chunks = []
    current_chunk = []
    for word in words:
        current_chunk.append(word)
        if len(current_chunk) >= max_tokens:
            chunks.append(" ".join(current_chunk))
            current_chunk = []
    if current_chunk:
        chunks.append(" ".join(current_chunk))
    return chunks

# Load your document
with open("document.txt", "r", encoding="utf-8") as f:
    text = f.read()

chunks = chunk_text(text, max_tokens=500)  # Adjust chunk size as needed

translated_chunks = []
for i, chunk in enumerate(chunks):
    prompt = f"Translate the following text to French:\n\n{chunk}"
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    translated_text = response.choices[0].message.content
    translated_chunks.append(translated_text)
    print(f"Chunk {i+1}/{len(chunks)} translated.")

# Combine translated chunks
full_translation = "\n\n".join(translated_chunks)

# Save to file
with open("translated_document.txt", "w", encoding="utf-8") as f:
    f.write(full_translation)

print("Translation complete. Saved to translated_document.txt")
output
Chunk 1/3 translated.
Chunk 2/3 translated.
Chunk 3/3 translated.
Translation complete. Saved to translated_document.txt
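The word-based splitter above can cut a document mid-sentence, which hurts translation quality. One alternative, sketched below with hypothetical names (`chunk_by_paragraph`, `count`), is to split on paragraph boundaries and accept a pluggable counting function, so you could later swap in a real tokenizer (for example tiktoken's `encode`) instead of the word-count approximation used here:

```python
# Group whole paragraphs into chunks so sentences stay intact.
# `count` measures chunk size; the default counts words as a rough
# stand-in for tokens.
def chunk_by_paragraph(text, max_units=500, count=lambda s: len(s.split())):
    chunks, current, current_len = [], [], 0
    for para in text.split("\n\n"):
        units = count(para)
        # Flush the current chunk if adding this paragraph would overflow it
        if current and current_len + units > max_units:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += units
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A single paragraph longer than `max_units` still becomes one oversized chunk, so pair this with a word-level fallback if your documents contain very long paragraphs.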

Common variations

  • Use asynchronous calls with asyncio and the AsyncOpenAI client for faster batch translation.
  • Switch to other models like claude-3-5-sonnet-20241022 or gemini-2.5-pro for different translation styles or languages.
  • Implement streaming to process large documents chunk-by-chunk with partial outputs.
python
import asyncio
import os
from openai import AsyncOpenAI

# The async client is required when awaiting API calls
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def translate_chunk(chunk):
    prompt = f"Translate the following text to Spanish:\n\n{chunk}"
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    with open("document.txt", "r", encoding="utf-8") as f:
        text = f.read()

    chunks = text.split("\n\n")  # Simple split by paragraphs

    tasks = [translate_chunk(chunk) for chunk in chunks]
    translated_chunks = await asyncio.gather(*tasks)

    full_translation = "\n\n".join(translated_chunks)

    with open("translated_document_async.txt", "w", encoding="utf-8") as f:
        f.write(full_translation)

    print("Async translation complete.")

if __name__ == "__main__":
    asyncio.run(main())
output
Async translation complete.
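One caveat with `asyncio.gather` as used above: it fires every request at once, which can trip rate limits on large documents. A common fix is to cap in-flight requests with a semaphore; `gather_limited` below is a hypothetical helper sketching that pattern:

```python
import asyncio

async def gather_limited(coros, limit=5):
    """Run coroutines concurrently, but with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with sem:
            return await coro

    # gather preserves input order, so translated chunks stay aligned
    return await asyncio.gather(*(bounded(c) for c in coros))
```

In the async example, you would replace `asyncio.gather(*tasks)` with `gather_limited(tasks, limit=5)` and tune the limit to your rate tier.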

Troubleshooting

  • If you hit token limits, reduce chunk size or use models with larger context windows.
  • For rate limit errors, add retry logic with exponential backoff.
  • Ensure your document encoding is UTF-8 to avoid decoding errors.
  • If translations are inaccurate, try adding more detailed instructions in the prompt.
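The retry-with-backoff suggestion above can be sketched as a small wrapper. `with_retries` is a hypothetical helper; in real code you would catch the SDK's specific exceptions (recent openai releases expose `openai.RateLimitError`) rather than bare `Exception`:

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # narrow this to retryable errors in practice
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delay doubles each attempt; jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

Usage would look like `with_retries(lambda: client.chat.completions.create(...))`, wrapping each translation call in the loop.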

Key Takeaways

  • Split large documents into smaller chunks to avoid token limits during translation.
  • Use the OpenAI Python SDK with gpt-4o or other advanced models for high-quality translations.
  • Async calls can speed up batch document translation significantly.
  • Adjust prompts to specify target language and style for better translation accuracy.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022, gemini-2.5-pro