How to fine-tune an LLM for the legal domain
Quick answer
Fine-tune a large language model (LLM) for the legal domain by preparing a domain-specific dataset in JSONL format, where each line is a chat example pairing a legal instruction or question with the desired answer. Upload the file with the OpenAI Files API, create a fine-tuning job on a base model such as gpt-4o-mini, and poll the job until it completes. The resulting fine-tuned model is then called by name through the API like any other model.
PREREQUISITES
- Python 3.8+
- OpenAI API key with fine-tuning access (training is billed per token)
- pip install "openai>=1.0"
Setup
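Install the openai Python package and store your API key in the OPENAI_API_KEY environment variable instead of hard-coding it in your scripts. Quote the version specifier so the shell does not treat > as a redirect.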
pip install "openai>=1.0"
Example output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
Prepare your legal dataset as a JSONL file in which each line is one JSON object with a messages list of system, user, and assistant turns drawn from legal tasks. Then upload the file, create a fine-tuning job on a base model such as gpt-4o-mini, and poll the job status until the fine-tuned model is ready.
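The dataset-preparation step can be sketched with the standard library alone. This is a minimal sketch, assuming your source material is already a list of reviewed question/answer pairs; the system prompt, the two example pairs, and the helper names `to_chat_record` and `write_jsonl` are hypothetical placeholders, not real training data. Each output line is one JSON object whose messages follow the system, user, assistant order described above.

```python
import json
from pathlib import Path

# Hypothetical system prompt; adapt it to your jurisdiction and use case
SYSTEM_PROMPT = "You are a legal assistant. Answer precisely and note when the answer depends on jurisdiction."

# Hypothetical placeholder pairs; replace them with reviewed examples from your own corpus
EXAMPLES = [
    ("What is a force majeure clause?",
     "A force majeure clause excuses performance when specified events beyond the parties' control occur."),
    ("What does it mean to indemnify someone in a contract?",
     "To indemnify is to agree to compensate another party for specified losses or liabilities."),
]


def to_chat_record(question: str, answer: str) -> dict:
    """Build one chat-format training record with system, user, and assistant turns."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


def write_jsonl(pairs, path: Path) -> int:
    """Write one JSON object per line and return the number of records written."""
    with path.open("w", encoding="utf-8") as f:
        for question, answer in pairs:
            f.write(json.dumps(to_chat_record(question, answer), ensure_ascii=False) + "\n")
    return len(pairs)


if __name__ == "__main__":
    count = write_jsonl(EXAMPLES, Path("legal_finetune_data.jsonl"))
    print(f"Wrote {count} records")
```

Two pairs only illustrate the shape; a real training file needs far more examples, and the fine-tuning API rejects files with fewer than 10.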
import os
import sys
import time

from openai import OpenAI

# The key is read from the environment rather than hard-coded
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Step 1: Upload the JSONL training file
with open("legal_finetune_data.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")
print(f"Uploaded file ID: {training_file.id}")

# Step 2: Create the fine-tuning job on a dated base-model snapshot
# (check OpenAI's fine-tuning docs for the snapshots currently supported)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
    suffix="legal",  # included in the fine-tuned model's name
)
print(f"Fine-tuning job ID: {job.id}")

# Step 3: Poll until the job reaches a terminal state (simplified, no timeout)
while True:
    status = client.fine_tuning.jobs.retrieve(job.id)
    print(f"Status: {status.status}")
    if status.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(30)

if status.status != "succeeded":
    # Stop here: a failed or cancelled job has no fine-tuned model to call
    sys.exit(f"Fine-tuning {status.status}: {status.error}")
print(f"Fine-tuned model: {status.fine_tuned_model}")

# Step 4: Use the fine-tuned model by name
response = client.chat.completions.create(
    model=status.fine_tuned_model,
    messages=[{"role": "user", "content": "Explain contract termination clauses."}],
)
print(response.choices[0].message.content)
Example output (IDs and the model name are illustrative):
Uploaded file ID: file-abc123xyz
Fine-tuning job ID: ftjob-xyz789abc
Status: running
Status: running
Status: succeeded
Fine-tuned model: ft:gpt-4o-mini-2024-07-18:your-org:legal:abc123
[Legal domain explanation about contract termination clauses]
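The polling loop in Step 3 never exits if a job stalls. A hedged variant adds an overall deadline. To keep it testable without network access, it takes a hypothetical `fetch_status` callable that returns the current status string, plus injectable `sleep` and `clock` functions; the 30-second interval and four-hour timeout are arbitrary assumptions, not API limits.

```python
import time
from typing import Callable

# Job states after which polling should stop
TERMINAL_STATES = frozenset({"succeeded", "failed", "cancelled"})


def wait_for_job(
    fetch_status: Callable[[], str],
    interval: float = 30.0,
    timeout: float = 4 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until a terminal state is reached or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)
```

With the client from the main example, `wait_for_job(lambda: client.fine_tuning.jobs.retrieve(job.id).status)` reproduces Step 3 with a cap; retrieve the job once more afterwards to read `fine_tuned_model`. Injecting `sleep` and `clock` is a design choice that lets the timeout path be exercised instantly in tests.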
Common variations
- Use the async client (AsyncOpenAI) with asyncio and await for non-blocking job polling.
- Choose a different base model: gpt-4o for more capacity, or gpt-4o-mini for lower cost.
- Add validation and test datasets to evaluate the fine-tuned model's performance on legal tasks.
import asyncio
import os

from openai import AsyncOpenAI

# The async client makes API calls awaitable, so polling does not block the event loop
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

TERMINAL_STATES = ("succeeded", "failed", "cancelled")


async def poll_job(job_id: str):
    while True:
        status = await client.fine_tuning.jobs.retrieve(job_id)
        print(f"Status: {status.status}")
        if status.status in TERMINAL_STATES:
            return status
        await asyncio.sleep(30)


async def main():
    # Assumes the training file was uploaded as in Step 1; replace the placeholder file ID
    job = await client.fine_tuning.jobs.create(
        training_file="file-abc123xyz",
        model="gpt-4o-2024-08-06",
    )
    print(f"Job ID: {job.id}")
    final_status = await poll_job(job.id)
    if final_status.status == "succeeded":
        print(f"Fine-tuned model: {final_status.fine_tuned_model}")
    else:
        print(f"Fine-tuning {final_status.status}: {final_status.error}")


asyncio.run(main())
Example output (IDs and the model name are illustrative):
Job ID: ftjob-xyz789abc
Status: running
Status: running
Status: succeeded
Fine-tuned model: ft:gpt-4o-2024-08-06:your-org::abc123
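For the validation and test datasets mentioned in the list above, one option is to shuffle the chat records with a fixed seed and write three JSONL files. This is a sketch under assumptions: the 10%/10% fractions, the seed, and the output names legal_train/legal_val/legal_test are hypothetical. The validation file can be uploaded like the training file and passed to `client.fine_tuning.jobs.create` through its `validation_file` parameter; keep the test file out of training entirely and use it only to evaluate the finished model.

```python
import json
import random
from pathlib import Path


def split_records(records, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle a copy of records with a fixed seed and split into train/validation/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def split_jsonl(src, out_dir=".", **kwargs):
    """Split a JSONL dataset into hypothetical legal_train/legal_val/legal_test files."""
    with Path(src).open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    train, val, test = split_records(records, **kwargs)
    for name, part in (("legal_train", train), ("legal_val", val), ("legal_test", test)):
        write_jsonl(part, Path(out_dir) / f"{name}.jsonl")
    return len(train), len(val), len(test)
```

For example, `split_jsonl("legal_finetune_data.jsonl")` writes the three files to the current directory. The fixed seed makes the split reproducible, so reruns evaluate against the same held-out examples.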
Troubleshooting
- If the file upload fails, check that the file is valid JSONL and that every line has the expected messages structure.
- If the fine-tuning job fails, read the job's error field and its event log (client.fine_tuning.jobs.list_events or the dashboard) for data-quality problems or quota limits.
- Ensure your API key has fine-tuning permissions and sufficient quota.
- Start with a small dataset (the API requires at least 10 examples) to validate the pipeline before scaling up.
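A local pre-flight check can catch common format problems before upload. This is a hedged, simplified sketch: it checks that each non-blank line parses as JSON, has a non-empty messages list, uses only system/user/assistant roles with string content, contains at least one assistant turn, and that the file holds at least 10 examples. The API's own validation is stricter (for example, it also handles tool-calling examples this sketch rejects) and remains authoritative.

```python
import json
from pathlib import Path

ALLOWED_ROLES = {"system", "user", "assistant"}
MIN_EXAMPLES = 10  # minimum number of training examples the fine-tuning API accepts


def validate_lines(lines):
    """Return a list of human-readable problems found in JSONL lines (empty if none)."""
    problems = []
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        count += 1
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {lineno}: missing or empty 'messages' list")
            continue
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"line {lineno}: message {i} has an invalid role")
            elif not isinstance(msg.get("content"), str):
                problems.append(f"line {lineno}: message {i} content is not a string")
        if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            problems.append(f"line {lineno}: no assistant message to learn from")
    if count < MIN_EXAMPLES:
        problems.append(f"only {count} examples; at least {MIN_EXAMPLES} are required")
    return problems


def validate_file(path):
    """Read a JSONL file and validate its lines."""
    with Path(path).open(encoding="utf-8") as f:
        return validate_lines(f)
```

Running `validate_file("legal_finetune_data.jsonl")` before Step 1 and printing any returned problems turns a failed upload or job into a readable list of line-level errors.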
Key Takeaways
- Prepare a high-quality, domain-specific JSONL dataset with legal conversations for fine-tuning.
- Upload data with the Files API (client.files.create), then create and monitor jobs with the fine_tuning.jobs endpoints.
- Select a base model balancing cost and capability, such as gpt-4o-mini for legal tasks.
- Validate and test your fine-tuned model on legal queries before production use.
- Troubleshoot by verifying the data format and API permissions, and by monitoring job status and errors.