What is catastrophic forgetting in fine-tuning
Catastrophic forgetting in fine-tuning is the phenomenon where a model loses previously learned knowledge after being trained on new data. It occurs because the model's parameters adjust to the new task, overwriting earlier information without retaining it.
How it works
Catastrophic forgetting happens during fine-tuning when a model updates its weights to fit new data, but these updates overwrite the representations learned from earlier tasks. Imagine a student who learns French first, then switches to only studying Spanish; without review, they might forget French entirely. Similarly, the model's parameters shift to optimize for the new task, causing it to 'forget' the old one.
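The weight-overwriting dynamic can be seen in a minimal sketch: a single trainable weight and two hypothetical scalar targets standing in for tasks A and B (a stand-in for a real network, not an implementation of one). After training on task B, the loss on task A blows up even though it was near zero before.

```python
# Toy illustration: one weight, two "tasks" (different target values for it).
# Sequentially training on task B overwrites what was learned for task A.

def loss(w, target):
    return (w - target) ** 2

def train(w, target, steps=100, lr=0.1):
    # Plain gradient descent on the squared error for one target.
    for _ in range(steps):
        grad = 2 * (w - target)
        w -= lr * grad
    return w

w = train(0.0, target=3.0)       # learn task A
loss_a_after_a = loss(w, 3.0)    # near zero: task A learned
w = train(w, target=-2.0)        # learn task B, no safeguards
loss_a_after_b = loss(w, 3.0)    # large: task A forgotten
print(loss_a_after_a, loss_a_after_b)
```

Nothing in the second training phase references task A, so nothing stops the weight from drifting to wherever task B needs it; the same pressure acts on every parameter of a real network.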
Concrete example
Suppose you fine-tune a language model first on a dataset about medical text, then fine-tune it again on legal documents without preserving the medical knowledge. The model may perform well on legal text but poorly on medical queries, showing catastrophic forgetting.
```python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Query the medical domain (fine-tuning simulated by prompt engineering here)
medical_prompt = "Explain symptoms of diabetes."
response_medical = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": medical_prompt}]
)
print("Medical domain response:", response_medical.choices[0].message.content)

# Query the legal domain (fine-tuning simulated by prompt engineering here)
legal_prompt = "Explain contract breach consequences."
response_legal = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": legal_prompt}]
)
print("Legal domain response:", response_legal.choices[0].message.content)

# After fine-tuning on legal data, medical knowledge may degrade (catastrophic forgetting)
```

Example output:

```
Medical domain response: Diabetes symptoms include increased thirst, frequent urination, and fatigue.
Legal domain response: Breach of contract can lead to damages, specific performance, or contract termination.
```
When to use it
Use fine-tuning carefully when you want to adapt a model to a new domain but still retain previous knowledge. Avoid naive sequential fine-tuning if you need multi-domain expertise. Instead, use techniques like continual learning, rehearsal, or parameter-efficient fine-tuning to mitigate catastrophic forgetting.
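Rehearsal can be illustrated with a toy single-weight model (hypothetical scalar targets standing in for old and new task data): interleaving old-task examples with the new ones keeps the weight at a compromise between both tasks instead of letting the new task overwrite the old one entirely.

```python
def loss(w, target):
    return (w - target) ** 2

def train(w, targets, steps=200, lr=0.1):
    # Cycle through the targets; a single-element list is plain
    # sequential fine-tuning, two elements is rehearsal.
    for i in range(steps):
        t = targets[i % len(targets)]
        w -= lr * 2 * (w - t)
    return w

w_pretrained = train(0.0, [3.0])                 # learn old task A
w_naive = train(w_pretrained, [-2.0])            # task B only: A is forgotten
w_rehearsal = train(w_pretrained, [3.0, -2.0])   # mix old A data back in

print(loss(w_naive, 3.0), loss(w_rehearsal, 3.0))
```

The rehearsal weight ends up between the two targets, so it is no longer optimal for either task alone, but its loss stays moderate on both, whereas the naively fine-tuned weight is catastrophically bad on task A.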
Key terms
| Term | Definition |
|---|---|
| Catastrophic forgetting | Loss of previously learned knowledge when a model is fine-tuned on new data. |
| Fine-tuning | Training a pre-trained model further on a specific dataset to adapt it to a new task. |
| Continual learning | Techniques to train models on new tasks without forgetting old ones. |
| Rehearsal | Method of mixing old data with new data during fine-tuning to prevent forgetting. |
| Parameter-efficient fine-tuning | Fine-tuning only a subset of model parameters to preserve prior knowledge. |
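One parameter-efficient pattern can be sketched with a hypothetical one-weight model: freeze the pre-trained weight and train only a small adapter term added to it, so the original behaviour can be restored exactly by dropping the adapter. (Real adapter methods such as LoRA apply this idea to low-rank matrices inside each layer; this is only the scalar analogue.)

```python
# The base weight is frozen; only the adapter delta is updated,
# so the pre-trained task's knowledge is never overwritten.

def predict(base_w, delta, x):
    return (base_w + delta) * x

base_w = 3.0   # "pre-trained" weight, tuned for task A (y = 3x)
delta = 0.0    # trainable adapter, starts at zero

# Adapt to task B (y = 1x) by taking gradient steps on delta only.
for _ in range(200):
    x, y = 1.0, 1.0
    err = predict(base_w, delta, x) - y
    delta -= 0.1 * 2 * err * x

print(predict(base_w, delta, 1.0))   # task B behaviour
print(predict(base_w, 0.0, 1.0))     # task A behaviour, recovered exactly
```

Because `base_w` never changes, switching between the two domains is just a matter of attaching or detaching the adapter.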
Key takeaways
- Catastrophic forgetting occurs when fine-tuning overwrites a model's prior knowledge.
- Sequential fine-tuning without safeguards leads to degraded performance on earlier tasks.
- Use continual learning or parameter-efficient methods to prevent forgetting.
- Testing on original tasks after fine-tuning reveals if catastrophic forgetting happened.
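The last point can be operationalized as a simple regression check: keep a held-out set for every previous domain and re-score it after each fine-tuning run. This is only a sketch; `evaluate` and `forgetting_report` are hypothetical helpers, and the plain functions below stand in for a real model and eval harness.

```python
def evaluate(model, dataset):
    # Hypothetical scorer: fraction of (input, answer) pairs the model gets right.
    return sum(model(x) == y for x, y in dataset) / len(dataset)

def forgetting_report(model, heldout_by_domain, baseline=None, threshold=0.1):
    """Score every previous domain; flag drops larger than `threshold`."""
    scores = {d: evaluate(model, ds) for d, ds in heldout_by_domain.items()}
    if baseline is None:
        return scores, []
    regressions = [d for d, s in scores.items() if baseline[d] - s > threshold]
    return scores, regressions

heldout = {"medical": [(1, 1), (2, 2)]}
before, _ = forgetting_report(lambda x: x, heldout)   # before new fine-tuning
after, regressions = forgetting_report(lambda x: 0, heldout, baseline=before)
print(regressions)  # -> ['medical']: forgetting detected on the medical set
```

Running this after every fine-tuning job turns catastrophic forgetting from a silent failure into a visible regression.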