Memory leak detection
Why this matters
Memory leaks in PyTorch cause out-of-memory crashes in production, especially on GPUs with limited VRAM. A training run that works fine for 10 epochs can fail without warning at epoch 50, once unused tensors have accumulated. Detecting the leak early lets you fix it before deployment.
Explanation
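A memory leak in PyTorch occurs when tensors stay allocated in GPU or CPU memory even though your code no longer needs them. PyTorch releases a tensor's memory as soon as the last reference to it goes away (on CUDA, the memory returns to PyTorch's caching allocator rather than to the driver), so a leak almost always means some reference is still alive. Either the tensor is attached to a computational graph you are still holding, for example through a stored loss, or it is kept by an accidental reference such as a list, a dict, or a closure. Tensors caught in reference cycles are only released when Python's cyclic garbage collector runs, which is why the examples call gc.collect() before measuring.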
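A minimal CPU-only sketch of how references decide a tensor's lifetime. A weak reference (from Python's standard `weakref` module) observes a tensor without keeping it alive, so it goes dead as soon as the last strong reference disappears and stays alive while a list still holds the tensor. The helper name `python_object_freed` is hypothetical, made up for this illustration.

```python
import weakref

import torch


def python_object_freed(keep_in_list: bool) -> bool:
    """Return True if the tensor is gone once our own name for it is deleted.

    Hypothetical helper for illustration only.
    """
    holder = []
    t = torch.randn(1000, 1000)   # ~4 MB of float32 data
    ref = weakref.ref(t)          # observes t without keeping it alive
    if keep_in_list:
        holder.append(t)          # the "accidental" extra reference
    del t                         # drop the name we created
    return ref() is None          # dead weakref means the tensor was freed


print(python_object_freed(keep_in_list=False))  # True: nothing else refers to it
print(python_object_freed(keep_in_list=True))   # False: the list still holds it
```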
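PyTorch provides torch.cuda.memory_allocated(), torch.cuda.max_memory_allocated(), and torch.cuda.reset_peak_memory_stats() to track CUDA memory over time. By measuring memory before and after a code block, you can isolate which operations are leaking. These functions cover CUDA memory only. Python's tracemalloc module is of limited help on the CPU side: it traces allocations made through Python's own allocator, and PyTorch allocates tensor storage outside it, so tensor data generally does not show up in its statistics.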
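Measuring before and after a block can be wrapped in a small context manager. This is a sketch built only on the three calls named above. The name `cuda_memory_delta` and its dictionary keys are hypothetical, and on a machine without CUDA it simply records nothing.

```python
from contextlib import contextmanager

import torch


@contextmanager
def cuda_memory_delta(device=None):
    """Record net and peak CUDA allocation for the enclosed block.

    Hypothetical helper. Fills the yielded dict with 'delta_bytes' (still
    allocated after the block, relative to the start) and 'peak_extra_bytes'
    (highest allocation reached inside the block, relative to the start).
    Without CUDA the dict stays empty.
    """
    stats = {}
    if not torch.cuda.is_available():
        yield stats
        return
    if device is None:
        device = torch.device("cuda")
    torch.cuda.reset_peak_memory_stats(device)
    before = torch.cuda.memory_allocated(device)
    try:
        yield stats
    finally:
        stats["delta_bytes"] = torch.cuda.memory_allocated(device) - before
        stats["peak_extra_bytes"] = torch.cuda.max_memory_allocated(device) - before


device = "cuda" if torch.cuda.is_available() else "cpu"
kept = []
with cuda_memory_delta() as stats:
    for _ in range(3):
        # The randn temporary is freed after the multiply; only the product is kept.
        kept.append(torch.randn(1000, 1000, device=device) * 2)
print(stats)
```

A non-zero `delta_bytes` here is expected, because `kept` deliberately outlives the block. `peak_extra_bytes` is larger because it also counts the temporaries. Since `reset_peak_memory_stats()` resets the device-wide peak counter, nesting two of these blocks would make the outer peak reading unreliable.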
Use this when you notice memory usage climbing during training even though the batch size stays constant, or when a loop that should be memory-flat consumes more on each iteration. The most common causes are accidentally holding references to intermediate tensors in a list and keeping the computational graph alive, for example with retain_graph=True.
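When memory climbs and you suspect a forgotten list, counting live tensor objects is a cheap check that also works without CUDA. This is a common debugging pattern built on Python's `gc.get_objects()`, not an official PyTorch API, and the helper name `count_live_tensors` is hypothetical. It counts Python tensor objects, not bytes, so treat it as a hint rather than an exact measurement.

```python
import gc

import torch


def count_live_tensors() -> int:
    """Count tensor objects currently tracked by Python's garbage collector."""
    gc.collect()  # clear unreachable cycles first so they are not counted
    count = 0
    for obj in gc.get_objects():
        try:
            if isinstance(obj, torch.Tensor):
                count += 1
        except Exception:
            # Some objects (e.g. dead weak proxies) raise on inspection; skip them.
            continue
    return count


baseline = count_live_tensors()
accumulated = []
for _ in range(5):
    t = torch.randn(100, 100)
    accumulated.append(t @ t.T)  # the list keeps every result alive
del t
print(count_live_tensors() - baseline)  # grows with len(accumulated)
accumulated.clear()
print(count_live_tensors() - baseline)  # back down once the list is emptied
```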
Analogy
It's like a sink with the tap left running and the drain partly blocked: the water level (memory) rises steadily even though you're not deliberately filling anything. You need to measure the level before and after to prove the leak exists, then trace where the water is coming from.
Code
```python
import gc

import torch


def detect_memory_leak():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    if device.type == 'cuda':
        torch.cuda.reset_peak_memory_stats(device)
        torch.cuda.empty_cache()
    print(f"Device: {device}")
    print()

    # Scenario 1: no leak (proper cleanup)
    print("=== Scenario 1: No leak (proper cleanup) ===")
    if device.type == 'cuda':
        mem_before = torch.cuda.memory_allocated(device)
    for _ in range(5):
        t = torch.randn(1000, 1000, device=device)
        result = t @ t.T
        # Clean up explicitly so nothing outlives the iteration
        del t
        del result
    gc.collect()
    if device.type == 'cuda':
        torch.cuda.empty_cache()  # returns cached blocks; does not change memory_allocated()
        mem_after = torch.cuda.memory_allocated(device)
        leak_1 = mem_after - mem_before
        print(f"Memory change: {leak_1 / 1024 / 1024:.2f} MB")
    else:
        print("(CPU mode: using gc to measure)")
    print()

    # Scenario 2: leak (accumulating references)
    print("=== Scenario 2: Memory leak (accumulating tensors) ===")
    if device.type == 'cuda':
        torch.cuda.reset_peak_memory_stats(device)
        mem_before = torch.cuda.memory_allocated(device)
    accumulated = []  # Accidentally holding references
    for _ in range(5):
        t = torch.randn(1000, 1000, device=device)
        result = t @ t.T
        accumulated.append(result)  # Leak! The list keeps every result alive
    del t, result  # Release loop variables so only the list's contents remain
    gc.collect()
    if device.type == 'cuda':
        mem_after = torch.cuda.memory_allocated(device)
        leak_2 = mem_after - mem_before
        print(f"Memory change: {leak_2 / 1024 / 1024:.2f} MB")
        print(f"Accumulated {len(accumulated)} tensors in list")
    else:
        print(f"(CPU mode: accumulated {len(accumulated)} tensors in list)")
    print()

    # Scenario 3: detach before storing so each graph can be freed
    print("=== Scenario 3: Breaking graph leak ===")
    if device.type == 'cuda':
        torch.cuda.reset_peak_memory_stats(device)
        mem_before = torch.cuda.memory_allocated(device)
    graph_refs = []
    for _ in range(5):
        x = torch.randn(1000, 1000, device=device, requires_grad=True)
        loss = (x ** 2).sum()
        # Detach so the stored scalar does not keep the graph (and x) alive
        graph_refs.append(loss.detach())
    del x, loss  # Release the last iteration's tensors before measuring
    gc.collect()
    if device.type == 'cuda':
        mem_after = torch.cuda.memory_allocated(device)
        leak_3 = mem_after - mem_before
        print(f"Memory change: {leak_3 / 1024 / 1024:.2f} MB")
        print("Used .detach() to break graph")
    else:
        print("(CPU mode: used .detach() to break graph)")

    # Peak usage since the last reset_peak_memory_stats() call (scenario 3)
    if device.type == 'cuda':
        peak = torch.cuda.max_memory_allocated(device)
        print()
        print(f"Peak memory allocated: {peak / 1024 / 1024:.2f} MB")


detect_memory_leak()
```

Output (CPU run):

```text
Device: cpu

=== Scenario 1: No leak (proper cleanup) ===
(CPU mode: using gc to measure)

=== Scenario 2: Memory leak (accumulating tensors) ===
(CPU mode: accumulated 5 tensors in list)

=== Scenario 3: Breaking graph leak ===
(CPU mode: used .detach() to break graph)
```
What just happened?
The code ran three scenarios: (1) tensors deleted at the end of each iteration, (2) results appended to a list and never released, and (3) losses stored with .detach(), so that each iteration's graph, along with the input x it saved, could be freed. On a CUDA device, scenario 2 shows memory growing by roughly 4 MB for every stored 1000 × 1000 float32 result, while scenarios 1 and 3 stay essentially flat once the loop variables are released. On CPU, the code only prints the scenario labels, because torch.cuda.memory_allocated() tracks CUDA memory; you need a CUDA device to see actual byte counts.
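To know what growth to expect on a GPU, size each tensor from its shape and dtype: `element_size()` gives bytes per element and `nelement()` gives the element count. The arithmetic below uses the 1000 × 1000 float32 tensors from the scenarios and the same MB convention (dividing by 1024 twice). The CUDA caching allocator may round each allocation up slightly, so treat measured numbers as approximate.

```python
import torch

t = torch.randn(1000, 1000)                        # float32 by default
bytes_per_tensor = t.element_size() * t.nelement()
print(bytes_per_tensor)                            # 4000000 (4 bytes x 1,000,000 elements)
print(f"{bytes_per_tensor / 1024 / 1024:.2f} MB per tensor")               # 3.81 MB per tensor
print(f"{5 * bytes_per_tensor / 1024 / 1024:.2f} MB for five stored results")  # 19.07 MB ...
```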
Common gotcha
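The biggest mistake is confusing peak memory allocation with a leak. A tensor allocated in iteration 1 and properly deleted in iteration 1 doesn't cause a leak, but max_memory_allocated() still counts it toward the peak. To catch a real leak, compare memory_allocated() before and after the loop (or at the same point in every iteration) and look for growth that doesn't go away. Don't read memory_reserved() as a leak signal either: the gap between memory_reserved() and memory_allocated() is memory the caching allocator keeps for reuse, not memory held by live tensors, which is also why nvidia-smi usually shows more than memory_allocated(). Likewise, torch.cuda.empty_cache() only returns those cached, unused blocks to the driver; it cannot free memory held by live tensors and does not change memory_allocated().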
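Taking one reading at the same point of every iteration, after a short warm-up, separates one-time allocations from genuine accumulation better than a single peak number. This is a minimal sketch: `memory_trend`, `grows_every_iteration`, and the `measure` argument are hypothetical names. On CUDA you would pass `torch.cuda.memory_allocated` as the measure, and the guarded usage at the bottom assumes a CUDA device.

```python
from typing import Callable, List

import torch


def memory_trend(step: Callable[[], None],
                 measure: Callable[[], int],
                 iters: int = 10,
                 warmup: int = 2) -> List[int]:
    """Run `step` repeatedly and return one `measure()` reading per iteration.

    Warm-up iterations run first and are not recorded, so one-time costs
    (lazy initialization, caches filling up) are not mistaken for growth.
    """
    for _ in range(warmup):
        step()
    readings = []
    for _ in range(iters):
        step()
        readings.append(measure())
    return readings


def grows_every_iteration(readings: List[int]) -> bool:
    """True if each reading is strictly larger than the one before it."""
    return all(later > earlier for earlier, later in zip(readings, readings[1:]))


if torch.cuda.is_available():
    history = []

    def leaky_step():
        x = torch.randn(256, 256, device="cuda", requires_grad=True)
        history.append((x ** 2).sum())  # stored loss keeps its graph, and x, alive

    print(grows_every_iteration(memory_trend(leaky_step, torch.cuda.memory_allocated)))
```

Strict growth on every iteration is a deliberately simple criterion. Real training loops can fluctuate, so you may prefer comparing the first and last readings against a tolerance.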
Error recovery
RuntimeError: CUDA out of memory
AttributeError: 'NoneType' has no attribute '_*'
Memory climbs but no obvious tensor storage
Experienced dev note
In real production training, the sneakiest leaks come from two places. One is validation loops that aren't wrapped in torch.no_grad(), where you accidentally keep a list of loss tensors, each holding its computational graph. The other is logging hooks that capture intermediate activations. The pattern is this: if you're appending to a list or dict in a loop, and the stored values are tensors with requires_grad=True or are part of a graph, you're almost certainly leaking. Use .item() to extract Python scalars, or call .detach() before storing. Also, PyTorch's built-in profiler (torch.profiler) is good at showing where memory peaks, but slow accumulation across iterations is easier to spot with memory_allocated(), which remains your main debugging tool.
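This shows the pattern in a validation-style loop on CPU. Storing the raw loss keeps a `grad_fn`, and through it the graph and its saved tensors, attached to every list entry. `.item()` stores a plain Python float, and `.detach()` stores a tensor with no graph. The tiny `nn.Linear` model is a hypothetical stand-in for a real network.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)   # hypothetical stand-in model
loss_fn = nn.MSELoss()

leaky, as_floats, detached = [], [], []
for _ in range(3):
    inputs, targets = torch.randn(4, 10), torch.randn(4, 1)
    loss = loss_fn(model(inputs), targets)  # no torch.no_grad(): a graph is built
    leaky.append(loss)                      # keeps the whole graph alive
    as_floats.append(loss.item())           # Python float, no graph
    detached.append(loss.detach())          # tensor, cut off from the graph

print(leaky[0].grad_fn is not None)   # True
print(type(as_floats[0]).__name__)    # float
print(detached[0].grad_fn is None)    # True

# In a real validation loop, wrap the work so no graph is built at all.
with torch.no_grad():
    val_loss = loss_fn(model(torch.randn(4, 10)), torch.randn(4, 1))
print(val_loss.requires_grad)         # False
```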
Check your understanding
You have a training loop that runs for 100 epochs, and you notice GPU memory climbs from 6GB at epoch 1 to 11GB at epoch 50, even though your batch size is fixed. The memory stabilizes at 11GB for epochs 50–100. Is this a memory leak, and what's the evidence?
Show answer hint
A real leak keeps climbing every epoch. Memory that stabilizes suggests the initial climb was legitimate (model initialization, caching, graph building) and that nothing leaks afterward. Check whether the growth is bounded or continuous: keep training, or run the same loop again, and see whether memory climbs past 11GB, say toward 16GB. If it does, it's leaking; if it stays at 11GB, it's not. If the numbers come from nvidia-smi, they reflect memory reserved by the caching allocator, so also compare torch.cuda.memory_allocated() at the same point in each epoch.
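One way to make the hint concrete is to log a reading such as `torch.cuda.memory_allocated()` once per epoch and test whether the last few readings are still rising. The function name `still_growing`, the 10-epoch window, and the 1% relative tolerance are arbitrary choices for illustration. The readings are made-up numbers shaped like the question's scenario, not measurements. Because the tolerance is relative, the units (GB here, bytes in practice) don't matter.

```python
from typing import Sequence


def still_growing(readings: Sequence[float], window: int = 10,
                  tolerance: float = 0.01) -> bool:
    """True if memory rose by more than `tolerance` (relative) over the last `window` epochs."""
    if len(readings) < window + 1:
        raise ValueError("need more readings than the window size")
    start, end = readings[-window - 1], readings[-1]
    return (end - start) > tolerance * start


# Illustrative per-epoch readings in GB (not real data).
plateau = [6 + 5 * (min(epoch, 50) - 1) / 49 for epoch in range(1, 101)]  # 6 GB at epoch 1, 11 GB from epoch 50 on
leaking = [6 + 0.1 * epoch for epoch in range(1, 101)]                    # keeps climbing

print(still_growing(plateau))  # False
print(still_growing(leaking))  # True
```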