How-to · Intermediate · 3 min read

How much data do you need to fine-tune an LLM?

Quick answer
The amount of data needed to fine-tune a large language model (LLM) depends on the model size and task complexity, but typically ranges from a few thousand to hundreds of thousands of labeled examples. Smaller fine-tuning datasets (e.g., 1k–10k examples) can work for domain adaptation or instruction tuning, while larger datasets (100k+) improve generalization but increase cost and compute.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the OpenAI Python SDK and set your API key as an environment variable to fine-tune models using the OpenAI API.

bash
pip install openai
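The SDK reads the key from the OPENAI_API_KEY environment variable; the value below is a placeholder for your real key:

```shell
export OPENAI_API_KEY="your-key-here"
```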

Step by step

This example shows how to prepare a small dataset and start a fine-tuning job with the OpenAI API. Current chat models expect training examples in the chat messages format rather than legacy prompt-completion pairs. The dataset here is deliberately tiny for demonstration; the API requires at least 10 examples before it will accept a job.

python
import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example fine-tuning dataset in the chat format the fine-tuning API expects:
# each example ends with the assistant reply the model should learn to produce.
training_data = [
    {"messages": [
        {"role": "user", "content": "Translate English to French: 'Hello, how are you?'"},
        {"role": "assistant", "content": "Bonjour, comment ça va ?"}
    ]},
    {"messages": [
        {"role": "user", "content": "Translate English to French: 'Good night'"},
        {"role": "assistant", "content": "Bonne nuit"}
    ]}
]

# Save training data as JSONL: one JSON object per line
with open("fine_tune_data.jsonl", "w") as f:
    for entry in training_data:
        f.write(json.dumps(entry) + "\n")

# Upload the training file
upload_response = client.files.create(
    file=open("fine_tune_data.jsonl", "rb"),
    purpose="fine-tune"
)
file_id = upload_response.id

# Create the fine-tuning job (fine-tunable models are dated snapshots,
# e.g. gpt-4o-2024-08-06)
fine_tune_response = client.fine_tuning.jobs.create(
    training_file=file_id,
    model="gpt-4o-2024-08-06"
)

print("Fine-tune job created:", fine_tune_response.id)
output
Fine-tune job created: ftjob-abc123xyz

Common variations

You can fine-tune with different dataset sizes depending on your goal:

  • Small datasets (1k–10k examples): Good for domain adaptation or instruction tuning.
  • Medium datasets (10k–100k examples): Improve task-specific accuracy and generalization.
  • Large datasets (100k+ examples): Needed for training new capabilities or large-scale custom models.

Fine-tuning jobs run asynchronously, so you can poll job status while training proceeds. You can also fine-tune smaller models such as gpt-4o-mini for faster, cheaper runs on limited data.

Dataset size      | Use case                              | Effect
1k–10k examples   | Domain adaptation, instruction tuning | Quick adaptation, lower cost
10k–100k examples | Task-specific fine-tuning             | Better accuracy and generalization
100k+ examples    | Large-scale custom models             | New capabilities, higher cost
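Before picking a tier, it helps to know how large your dataset actually is. The sketch below counts examples and roughly estimates tokens in a chat-format JSONL file; the four-characters-per-token ratio is a crude heuristic for English text, not a real tokenizer:

```python
import json

def dataset_stats(path: str) -> dict:
    """Count examples and roughly estimate tokens in a chat-format JSONL file."""
    n_examples = 0
    n_chars = 0
    with open(path) as f:
        for line in f:
            example = json.loads(line)
            n_examples += 1
            for message in example["messages"]:
                n_chars += len(message["content"])
    # Rough heuristic: ~4 characters per token for English text
    return {"examples": n_examples, "approx_tokens": n_chars // 4}
```

Running it on the file from the step-by-step section above would confirm you are well below the sizes in the table.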

Troubleshooting

If your fine-tuned model underperforms, check whether your dataset is too small or unrepresentative of the target task. Ensure your training data is clean and formatted as JSONL, with one chat-format example per line. If you hit API rate limits or errors, verify your API key and usage quotas.
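A quick local check like the following (a sketch, not OpenAI's official validator) can catch malformed lines before you upload:

```python
import json

def validate_jsonl(path: str) -> list:
    """Return (line_number, problem) pairs for a chat-format JSONL file."""
    problems = []
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            try:
                example = json.loads(line)
            except json.JSONDecodeError:
                problems.append((i, "not valid JSON"))
                continue
            messages = example.get("messages")
            if not isinstance(messages, list) or not messages:
                problems.append((i, "missing 'messages' list"))
            elif messages[-1].get("role") != "assistant":
                problems.append((i, "last message should be the assistant reply"))
    return problems
```

An empty result means every line parsed and ends with an assistant reply; anything else points you at the exact line to fix.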

Key Takeaways

  • Fine-tuning data size depends on model size and task complexity, ranging from 1k to 100k+ examples.
  • Smaller datasets enable quick domain adaptation; larger datasets improve generalization but cost more.
  • Always format fine-tuning data as JSONL with one chat-format (messages) example per line for the OpenAI API.
  • Use smaller models like gpt-4o-mini for cost-effective fine-tuning on limited data.
  • Monitor API usage and clean your dataset to avoid common fine-tuning errors.
Verified 2026-04 · gpt-4o, gpt-4o-mini