How-to · Beginner · 3 min read

OpenAI fine-tuning data format (JSONL)

Quick answer
OpenAI fine-tuning data must be in JSONL format where each line is a JSON object with a messages array containing role and content fields. The messages array should include the full conversation context, typically starting with a system message, followed by user and assistant messages.
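Concretely, each line of the file is one complete conversation serialized as compact JSON. A minimal sketch using only the standard library:

```python
import json

# One training example: a full conversation as a single JSON object
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Translate 'Hello' to French."},
        {"role": "assistant", "content": "Bonjour"},
    ]
}

# Each JSONL line is the compact JSON serialization of one example
line = json.dumps(example)
print(line)
```

A file with many such lines, one per example, is a valid training file.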

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (note that fine-tuning jobs are billed per trained token)
  • pip install "openai>=1.0" (quote the spec so the shell does not treat >= as redirection)

Setup

Install the official openai Python package and set your API key as an environment variable for authentication.

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

Prepare your fine-tuning training file as a .jsonl file where each line is a JSON object with a messages key. The messages value is an array of message objects with role and content. Include the full conversation context for each example.

python
import json

# Example fine-tuning data with 2 training samples
# (real jobs require at least 10 examples; 2 are shown here for brevity)
training_data = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Translate 'Hello' to French."},
            {"role": "assistant", "content": "Bonjour"}
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "4"}
        ]
    }
]

# Write to JSONL file
with open("fine_tune_data.jsonl", "w", encoding="utf-8") as f:
    for entry in training_data:
        f.write(json.dumps(entry) + "\n")

print("Fine-tuning data saved to fine_tune_data.jsonl")
output
Fine-tuning data saved to fine_tune_data.jsonl
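Before uploading, it is worth reading the file back to confirm every line parses. This stdlib-only sketch writes a small stand-in file (so it runs on its own) and then validates it the same way you would validate fine_tune_data.jsonl:

```python
import json

# Stand-in sample file (substitute your own fine_tune_data.jsonl)
samples = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello!"}]},
]
with open("check_data.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s) + "\n")

# Read back: every non-empty line must parse as JSON with a "messages" list
with open("check_data.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f if line.strip()]

print(f"Parsed {len(examples)} training example(s)")
```

If any line raises `json.JSONDecodeError`, fix it before uploading; the API rejects the whole file otherwise.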

Common variations

You can fine-tune different models, such as gpt-4o-mini-2024-07-18. The messages array must always include the full context: system, user, and assistant roles. Async upload and job creation are supported via the OpenAI SDK. Streaming is not applicable to fine-tuning data upload.

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Upload training file
training_file = client.files.create(
    file=open("fine_tune_data.jsonl", "rb"),
    purpose="fine-tune"
)

# Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18"
)

print(f"Fine-tuning job created with ID: {job.id}")
output
Fine-tuning job created with ID: ftjob-abc123xyz

Troubleshooting

  • If you get a 400 Bad Request error, verify that each JSONL line is valid JSON and has a messages array with correct roles.
  • Ensure there are no trailing commas or other invalid JSON syntax in your .jsonl file.
  • Use UTF-8 encoding to avoid character errors.
  • Provide at least 10 training examples; files with fewer are rejected when the job starts.
  • Check your API key and permissions if the upload fails.
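The structural checks above can be automated with a small stdlib-only validator. This is a sketch: `validate_jsonl` is a hypothetical helper, not part of the OpenAI SDK.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_jsonl(lines):
    """Return a list of (line_number, problem) tuples; empty means the data looks valid."""
    problems = []
    for i, line in enumerate(lines, start=1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as e:
            problems.append((i, f"invalid JSON: {e}"))
            continue
        messages = obj.get("messages")
        if not isinstance(messages, list) or not messages:
            problems.append((i, "missing or empty 'messages' array"))
            continue
        for m in messages:
            if m.get("role") not in VALID_ROLES:
                problems.append((i, f"unexpected role: {m.get('role')!r}"))
            if not isinstance(m.get("content"), str):
                problems.append((i, "message 'content' must be a string"))
    return problems

good = '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}]}'
bad = '{"messages": [{"role": "bot", "content": "Hi"}]}'
print(validate_jsonl([good, bad]))
```

Running the validator before `client.files.create` turns an opaque 400 from the API into a precise line number and reason.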

Key Takeaways

  • Fine-tuning data must be a JSONL file with each line containing a 'messages' array of role/content objects.
  • Include full conversation context: system, user, and assistant messages for each training example.
  • Use the OpenAI SDK v1+ to upload the JSONL file and create fine-tuning jobs programmatically.
Verified 2026-04 · gpt-4o-mini-2024-07-18