OpenAIError
openai.OpenAIError (file upload size limit exceeded)
Stack trace
openai.OpenAIError: File upload failed: The file size exceeds the maximum allowed limit of 100MB for fine-tuning uploads.
Why it happens
OpenAI's fine-tuning API enforces a strict maximum file size limit (typically 100MB) for training data uploads. When the uploaded file exceeds this limit, the API rejects the request with an error. This prevents excessive resource usage and ensures stable service.
Detection
Check the file size before uploading. Read the file's byte size in your code, then log a warning or raise an error if it exceeds the documented limit, so the oversized request never reaches the API.
Causes & fixes
Cause: The training data JSONL file exceeds OpenAI's 100MB upload size limit.
Fix: Split the training data into multiple smaller JSONL files, each under 100MB, and upload them separately, or reduce the dataset size.
Cause: An incorrect file format or extra metadata inflates the file size beyond the limit.
Fix: Ensure the file is clean JSONL with no extraneous fields or formatting, and strip unused data and unnecessary whitespace before upload.
Cause: The file contains embedded images or binary data that increase its size.
Fix: Remove any non-text content from the fine-tuning file; only plain JSONL text data is supported.
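One way to clean a JSONL file is to re-serialize every record compactly and drop blank lines. The sketch below assumes each non-blank line is a standalone JSON object; the function name `compact_jsonl` and both path parameters are hypothetical, chosen only for illustration. It uses the Python standard library only.

```python
import json


def compact_jsonl(src_path, dst_path):
    """Rewrite a JSONL file without blank lines or padding whitespace.

    Returns the number of records written. (Hypothetical helper.)
    """
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line_number, line in enumerate(src, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"Line {line_number} is not valid JSON: {exc}") from exc
            # separators=(",", ":") drops the spaces json.dumps inserts by default
            dst.write(json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n")
            written += 1
    return written
```

How much this saves depends on how the file was generated. Files that are already compact will shrink very little, so splitting may still be needed.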
Code: broken vs fixed
# Broken: uploads without checking the file size first
from openai import OpenAI

client = OpenAI()

# The API rejects this request if the file is over the size limit
response = client.files.create(file=open('large_training_data.jsonl', 'rb'), purpose='fine-tune')

# Fixed: validate the file size before uploading
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

file_path = 'large_training_data.jsonl'
file_size = os.path.getsize(file_path)
max_size = 100 * 1024 * 1024  # 100MB

if file_size > max_size:
    raise ValueError(f'File size {file_size} bytes exceeds the 100MB limit for fine-tuning uploads.')

# Use a context manager so the file handle is closed after the upload
with open(file_path, 'rb') as f:
    response = client.files.create(file=f, purpose='fine-tune')
print('File uploaded successfully:', response.id)
Workaround
If you cannot reduce file size immediately, split the dataset into smaller chunks manually and upload each chunk separately for fine-tuning.
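The splitting step can also be scripted. The following is a minimal sketch, assuming one JSON record per line and the 100MB figure quoted in this article (confirm it against the current documentation). The function name `split_jsonl`, the `_partNNN` naming scheme, and the `MAX_BYTES` constant are hypothetical. Lines are never broken across chunks.

```python
import os

# Limit quoted in this article; treat it as an assumption and verify it.
MAX_BYTES = 100 * 1024 * 1024


def split_jsonl(src_path, out_dir, max_bytes=MAX_BYTES):
    """Split a JSONL file into chunk files of at most max_bytes each.

    Returns the list of chunk paths in order. (Hypothetical helper.)
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    chunk_paths = []
    out = None
    current_size = 0
    try:
        with open(src_path, "rb") as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                if not line.endswith(b"\n"):
                    line += b"\n"  # last line may lack a newline
                if len(line) > max_bytes:
                    raise ValueError("A single record is larger than max_bytes")
                # Start a new chunk if none is open or this line would overflow it
                if out is None or current_size + len(line) > max_bytes:
                    if out is not None:
                        out.close()
                    path = os.path.join(
                        out_dir, f"{base}_part{len(chunk_paths) + 1:03d}.jsonl"
                    )
                    out = open(path, "wb")
                    chunk_paths.append(path)
                    current_size = 0
                out.write(line)
                current_size += len(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Each returned path can then be size-checked and uploaded on its own. Working in bytes rather than characters matters here, because the limit applies to the file's size on disk.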
Prevention
Implement automated file size validation in your data pipeline before upload and use dataset chunking strategies to keep files under the limit consistently.
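As one possible shape for such a validation step, the sketch below scans a directory and reports every JSONL file over the limit, so a pipeline or CI job can fail early. The function name `find_oversized_files`, the `.jsonl` suffix filter, and the 100MB default are assumptions for illustration, not part of any OpenAI tooling.

```python
import os

# Limit quoted in this article; treat it as an assumption and verify it.
MAX_BYTES = 100 * 1024 * 1024


def find_oversized_files(data_dir, max_bytes=MAX_BYTES, suffix=".jsonl"):
    """Return (path, size_in_bytes) for each matching file larger than max_bytes."""
    oversized = []
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            if not name.endswith(suffix):
                continue  # ignore files that are not training data
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > max_bytes:
                oversized.append((path, size))
    return oversized
```

A pipeline step could call this on the output directory and exit with a non-zero status when the returned list is not empty, which stops oversized files before any upload is attempted.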