How to fine-tune an LLM for the legal domain

Quick answer
Fine-tune a large language model (LLM) for the legal domain by preparing a domain-specific dataset in JSONL format with legal texts and instructions, then use the OpenAI API's fine-tuning endpoints to upload the data and create a fine-tuning job. Use a base model such as gpt-4o-mini, monitor the job until it completes, and then call the resulting fine-tuned model by the name the job returns.

Prerequisites

  • Python 3.8+
  • OpenAI API key (fine-tuning is billed usage, so check that your account has access and quota)
  • pip install "openai>=1.0"

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
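pip install "openai>=1.0"  # quoted so the shell does not treat > as a redirect
export OPENAI_API_KEY="your-api-key"  # placeholder; use your own key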
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

Prepare your legal domain dataset in JSONL format where each entry contains messages with system, user, and assistant roles reflecting legal context. Upload the dataset, create a fine-tuning job with a base model like gpt-4o-mini, and poll the job status until the fine-tuned model is ready.
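
To make the record shape concrete, here is a minimal sketch that writes chat-format training examples to legal_finetune_data.jsonl, one JSON object per line. The system prompt, the two sample question/answer pairs, and the helper names to_record and write_jsonl are illustrative assumptions, not part of any real dataset; a real training set needs far more examples than this.

```python
import json

# Hypothetical system prompt and examples, for illustration only
SYSTEM_PROMPT = "You are a legal assistant. Explain legal concepts clearly."

examples = [
    (
        "What is a force majeure clause?",
        "A force majeure clause excuses a party from performing when "
        "extraordinary events beyond its control prevent performance.",
    ),
    (
        "What does 'indemnify' mean in a contract?",
        "To indemnify is to agree to compensate another party for "
        "specified losses or liabilities.",
    ),
]


def to_record(question, answer):
    """Build one chat-format training record with system, user, and assistant roles."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


def write_jsonl(path, pairs):
    """Write one JSON object per line and return the number of records written."""
    records = [to_record(q, a) for q, a in pairs]
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


count = write_jsonl("legal_finetune_data.jsonl", examples)
print(f"Wrote {count} records")
```

The file name matches the one the upload step opens, so the two pieces can be run in sequence once the examples are replaced with real data.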

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Step 1: Upload training file
with open("legal_finetune_data.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")

print(f"Uploaded file ID: {training_file.id}")

# Step 2: Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini"
)

print(f"Fine-tuning job ID: {job.id}")

# Step 3: Poll job status (simplified example)
import time

# "cancelled" is also a terminal state; without it the loop could run forever
while True:
    status = client.fine_tuning.jobs.retrieve(job.id)
    print(f"Status: {status.status}")
    if status.status in ["succeeded", "failed", "cancelled"]:
        break
    time.sleep(30)

if status.status != "succeeded":
    # fine_tuned_model is only set on success, so stop before Step 4
    raise SystemExit(f"Fine-tuning ended with status: {status.status}")

print(f"Fine-tuned model: {status.fine_tuned_model}")

# Step 4: Use the fine-tuned model
response = client.chat.completions.create(
    model=status.fine_tuned_model,
    messages=[{"role": "user", "content": "Explain contract termination clauses."}]
)
print(response.choices[0].message.content)
output
Uploaded file ID: file-abc123xyz
Fine-tuning job ID: ftjob-xyz789abc
Status: running
Status: running
Status: succeeded
Fine-tuned model: ft:gpt-4o-mini-2024-07-18:my-org::abc123
[Legal domain explanation about contract termination clauses]
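
The polling loop above is deliberately simplified. A slightly more defensive sketch adds a timeout and accepts the retrieval function as a parameter so it can be exercised without network access. The helper name wait_for_job, its parameters, and the default interval and timeout values are assumptions chosen for illustration.

```python
import time

# Terminal job states: once a job reaches one of these, polling can stop
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Poll retrieve(job_id) until the job is terminal or the timeout passes.

    `retrieve` is any callable that returns an object with a `.status`
    attribute, for example client.fine_tuning.jobs.retrieve.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = retrieve(job_id)
        if job.status in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job {job_id} still {job.status!r} after {timeout} seconds"
            )
        sleep(interval)
```

With the client from the main example, this could be called as wait_for_job(client.fine_tuning.jobs.retrieve, job.id); the caller then checks the returned job's status before using fine_tuned_model.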

Common variations

  • Use async Python with asyncio and await for non-blocking fine-tuning job polling.
  • Choose different base models like gpt-4o for larger capacity or gpt-4o-mini for cost efficiency.
  • Incorporate validation and test datasets to evaluate fine-tuned model performance on legal tasks.

python
import asyncio
import os

from openai import AsyncOpenAI

# AsyncOpenAI returns awaitables, so API calls do not block the event loop
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def poll_job(job_id):
    while True:
        status = await client.fine_tuning.jobs.retrieve(job_id)
        print(f"Status: {status.status}")
        if status.status in ["succeeded", "failed", "cancelled"]:
            return status
        await asyncio.sleep(30)

async def main():
    # Assume the training file was uploaded as before; replace the placeholder file ID
    job = await client.fine_tuning.jobs.create(training_file="file-abc123xyz", model="gpt-4o")
    print(f"Job ID: {job.id}")
    final_status = await poll_job(job.id)
    if final_status.status == "succeeded":
        print(f"Fine-tuned model: {final_status.fine_tuned_model}")
    else:
        print(f"Fine-tuning ended with status: {final_status.status}")

asyncio.run(main())
output
Job ID: ftjob-xyz789abc
Status: running
Status: running
Status: succeeded
Fine-tuned model: ft:gpt-4o-2024-08-06:my-org::abc123
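
For the validation-dataset variation listed above, one possible approach is to hold out a fraction of the JSONL lines before uploading. The sketch below shuffles deterministically and splits the lines; the helper name split_records, the 20% default, and the placeholder records are assumptions for illustration.

```python
import random


def split_records(lines, validation_fraction=0.2, seed=42):
    """Shuffle JSONL lines deterministically and return (train, validation)."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation record even for very small datasets
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Stand-in data: 10 placeholder lines instead of real training records
lines = [f'{{"id": {i}}}' for i in range(10)]
train, validation = split_records(lines)
print(len(train), len(validation))
```

Each split would then be written to its own file and uploaded with client.files.create(..., purpose="fine-tune"); the validation file's ID is passed as the validation_file argument of client.fine_tuning.jobs.create alongside training_file.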

Troubleshooting

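  • If the file upload fails, check that the file is valid JSONL (one JSON object per line) and that each record has a correctly structured messages list.
  • If the fine-tuning job fails, inspect the job's error field and its event log (for example via client.fine_tuning.jobs.list_events, or in the dashboard) for data quality issues or quota limits.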
  • Ensure your API key has fine-tuning permissions and sufficient quota.
  • Use smaller datasets initially to validate the pipeline before scaling.
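
Many upload and job failures come down to malformed records, so a local check before uploading can save a round trip. The sketch below checks each line for valid JSON and a plausible messages structure. The specific rules (allowed roles, string content, final assistant message) are an assumed, simplified subset for this article's chat format, not the API's official validation logic, and validate_jsonl_lines is a hypothetical helper name.

```python
import json

# Roles used by the chat-format records in this article
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_jsonl_lines(lines):
    """Return a list of (line_number, problem) tuples; empty means no problems found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict) or message.get("role") not in ALLOWED_ROLES:
                problems.append((number, "message with unexpected role"))
                break
            if not isinstance(message.get("content"), str):
                problems.append((number, "message content is not a string"))
                break
        else:
            # Only reached when every message passed the checks above
            if messages[-1]["role"] != "assistant":
                problems.append((number, "last message is not from the assistant"))
    return problems
```

A typical use is to read the training file, call validate_jsonl_lines on its lines, and print any reported line numbers before running the upload step.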

Key Takeaways

  • Prepare a high-quality, domain-specific JSONL dataset with legal conversations for fine-tuning.
  • Use the OpenAI API's fine_tuning.jobs endpoints to upload data, create jobs, and monitor progress.
  • Select a base model balancing cost and capability, such as gpt-4o-mini for legal tasks.
  • Validate and test your fine-tuned model on legal queries before production use.
  • Troubleshoot by verifying data format, API permissions, and monitoring job status carefully.
Verified 2026-04 · gpt-4o-mini, gpt-4o