Critical severity intermediate · Fix: 5-15 min

RuntimeError

torch.cuda.runtime.RuntimeError

What this error means
PEFT fine-tuning fails with a RuntimeError when gradient checkpointing is enabled but incompatible with the model or training setup.

Stack trace

traceback
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate values of the first backward are already freed. Specify retain_graph=True when calling backward the first time.
QUICK FIX
Disable gradient checkpointing in PEFT config or ensure backward() calls use retain_graph=True during fine-tuning.

Why it happens

This error occurs because gradient checkpointing saves memory by freeing intermediate activations during the forward pass. If the backward pass is called multiple times without retain_graph=True, or if the model's forward method is incompatible with checkpointing, PyTorch raises this RuntimeError.

Detection

Monitor training logs for RuntimeError mentioning backward graph retention or checkpointing issues; wrap training steps in try/except to log detailed error messages.

Causes & fixes

1

Gradient checkpointing enabled but model's forward method returns multiple outputs or uses unsupported operations.

✓ Fix

Modify the model's forward method to return a single tensor or tuple compatible with checkpointing, or disable gradient checkpointing.

2

Calling backward multiple times without retain_graph=True when using gradient checkpointing.

✓ Fix

Add retain_graph=True to the first backward() call or restructure training loop to avoid multiple backward passes per forward.

3

Using an older PyTorch or PEFT version with known checkpointing bugs.

✓ Fix

Upgrade PyTorch and PEFT to the latest stable versions where checkpointing issues are fixed.

Code: broken vs fixed

Broken - triggers the error
python
from peft import get_peft_model
import torch

model = get_peft_model(base_model, peft_config)
model.gradient_checkpointing_enable()  # enables checkpointing

optimizer = torch.optim.Adam(model.parameters())

outputs = model(inputs)
loss = loss_fn(outputs, labels)
loss.backward()  # RuntimeError here
optimizer.step()
Fixed - works correctly
python
import os
from peft import get_peft_model
import torch

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

model = get_peft_model(base_model, peft_config)
# Disabled gradient checkpointing to fix RuntimeError
# model.gradient_checkpointing_enable()

optimizer = torch.optim.Adam(model.parameters())

outputs = model(inputs)
loss = loss_fn(outputs, labels)
loss.backward(retain_graph=True)  # Added retain_graph=True to fix error
optimizer.step()

print('Fine-tuning step completed without checkpointing error')
Disabled gradient checkpointing or added retain_graph=True to backward() to prevent RuntimeError caused by freed intermediate activations during multiple backward passes.

Workaround

Wrap the backward call in try/except RuntimeError, and if caught, retry backward with retain_graph=True or disable gradient checkpointing temporarily.

Prevention

Use compatible model architectures with gradient checkpointing, test backward passes carefully, and keep PEFT and PyTorch updated to latest versions with checkpointing fixes.

Python 3.9+ · peft >=0.4.0 · tested on 0.4.0
Verified 2026-04
Verify ↗

Community Notes

No notes yetBe the first to share a version-specific fix or tip.