How-to · Intermediate · 3 min read

How much data do you need to fine-tune an LLM?

Quick answer
The amount of data needed to fine-tune a large language model (LLM) depends on the model size and task complexity, but typically ranges from a few thousand to hundreds of thousands of labeled examples. Smaller fine-tuning datasets (e.g., 1k–10k examples) can work for domain adaptation or instruction tuning, while larger datasets (100k+) improve generalization but increase cost and compute.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the OpenAI Python SDK and set your API key as an environment variable to fine-tune models using the OpenAI API.

bash
pip install openai
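The SDK reads the key from the OPENAI_API_KEY environment variable; the value below is a placeholder for your real key:

```shell
export OPENAI_API_KEY="your-key-here"
```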

Step by step

This example shows how to prepare a small dataset and start a fine-tuning job with the OpenAI API. Current chat models expect training examples in the chat messages format rather than legacy prompt-completion pairs. The dataset here is deliberately tiny for demonstration; the API requires at least 10 examples before it will accept a job.

python
import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example fine-tuning dataset in the chat format the fine-tuning API expects:
# each example ends with the assistant reply the model should learn to produce.
training_data = [
    {"messages": [
        {"role": "user", "content": "Translate English to French: 'Hello, how are you?'"},
        {"role": "assistant", "content": "Bonjour, comment ça va ?"}
    ]},
    {"messages": [
        {"role": "user", "content": "Translate English to French: 'Good night'"},
        {"role": "assistant", "content": "Bonne nuit"}
    ]}
]

# Save training data as JSONL: one JSON object per line
with open("fine_tune_data.jsonl", "w") as f:
    for entry in training_data:
        f.write(json.dumps(entry) + "\n")

# Upload the training file
upload_response = client.files.create(
    file=open("fine_tune_data.jsonl", "rb"),
    purpose="fine-tune"
)
file_id = upload_response.id

# Create the fine-tuning job (fine-tunable models are dated snapshots,
# e.g. gpt-4o-2024-08-06)
fine_tune_response = client.fine_tuning.jobs.create(
    training_file=file_id,
    model="gpt-4o-2024-08-06"
)

print("Fine-tune job created:", fine_tune_response.id)
output
Fine-tune job created: ftjob-abc123xyz

Common variations

You can fine-tune with different dataset sizes depending on your goal:

  • Small datasets (1k–10k examples): Good for domain adaptation or instruction tuning.
  • Medium datasets (10k–100k examples): Improve task-specific accuracy and generalization.
  • Large datasets (100k+ examples): Needed for training new capabilities or large-scale custom models.

Fine-tuning jobs run asynchronously, so you can poll job status while training proceeds. You can also fine-tune smaller models such as gpt-4o-mini for faster, cheaper runs on limited data.

Dataset size      | Use case                              | Effect
1k–10k examples   | Domain adaptation, instruction tuning | Quick adaptation, lower cost
10k–100k examples | Task-specific fine-tuning             | Better accuracy and generalization
100k+ examples    | Large-scale custom models             | New capabilities, higher cost
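Before picking a tier, it helps to know how large your dataset actually is. The sketch below counts examples and roughly estimates tokens in a chat-format JSONL file; the four-characters-per-token ratio is a crude heuristic for English text, not a real tokenizer:

```python
import json

def dataset_stats(path: str) -> dict:
    """Count examples and roughly estimate tokens in a chat-format JSONL file."""
    n_examples = 0
    n_chars = 0
    with open(path) as f:
        for line in f:
            example = json.loads(line)
            n_examples += 1
            for message in example["messages"]:
                n_chars += len(message["content"])
    # Rough heuristic: ~4 characters per token for English text
    return {"examples": n_examples, "approx_tokens": n_chars // 4}
```

Running it on the file from the step-by-step section above would confirm you are well below the sizes in the table.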

Troubleshooting

If your fine-tuned model underperforms, check whether your dataset is too small or unrepresentative of the target task. Ensure your training data is clean and formatted as JSONL, with one chat-format example per line. If you hit API rate limits or errors, verify your API key and usage quotas.
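A quick local check like the following (a sketch, not OpenAI's official validator) can catch malformed lines before you upload:

```python
import json

def validate_jsonl(path: str) -> list:
    """Return (line_number, problem) pairs for a chat-format JSONL file."""
    problems = []
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            try:
                example = json.loads(line)
            except json.JSONDecodeError:
                problems.append((i, "not valid JSON"))
                continue
            messages = example.get("messages")
            if not isinstance(messages, list) or not messages:
                problems.append((i, "missing 'messages' list"))
            elif messages[-1].get("role") != "assistant":
                problems.append((i, "last message should be the assistant reply"))
    return problems
```

An empty result means every line parsed and ends with an assistant reply; anything else points you at the exact line to fix.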

Key Takeaways

  • Fine-tuning data size depends on model size and task complexity, ranging from 1k to 100k+ examples.
  • Smaller datasets enable quick domain adaptation; larger datasets improve generalization but cost more.
  • Always format fine-tuning data as JSONL with one chat-format (messages) example per line for the OpenAI API.
  • Use smaller models like gpt-4o-mini for cost-effective fine-tuning on limited data.
  • Monitor API usage and clean your dataset to avoid common fine-tuning errors.
Verified 2026-04 · gpt-4o, gpt-4o-mini