Code Intermediate medium · 7 min

Memory leak detection

What you will learn
Track GPU and CPU memory allocations to find where your model is holding onto unused tensors.

Why this matters

Memory leaks in PyTorch cause out-of-memory crashes in production, especially on limited GPU VRAM. A training run that works fine for 10 epochs can crash with an out-of-memory error at epoch 50 once unused tensors have accumulated. Detecting the leak early lets you fix it before deployment.

Skip if: You don't need explicit memory tracking for small research scripts or single-batch inference. You also shouldn't obsess over every tensor allocation: focus on loops and long-running processes where tensors persist across iterations.

Explanation

A memory leak in PyTorch occurs when tensors stay allocated in GPU or CPU memory after your code has finished using them. A tensor's memory is released only when its last reference goes away, so a tensor that is still attached to the computational graph or held by an accidental reference (a list, a closure, a logging hook) stays alive even though you never read it again.

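PyTorch provides torch.cuda.memory_allocated(), torch.cuda.max_memory_allocated(), and torch.cuda.reset_peak_memory_stats() to track GPU memory over time. By measuring memory before and after a code block, you can isolate which operations are leaking. On the CPU side, Python's tracemalloc module traces allocations made through Python's own allocator, so it can reveal growing Python containers, but tensor storage is allocated by PyTorch's native allocator and generally does not show up in its reports; watching the process's resident memory is the more reliable signal there.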
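The before/after measurement described above can be wrapped in a small context manager. This is a minimal sketch, not an official PyTorch utility: the name track_cuda_memory and the stats dictionary are made up for illustration, and on a machine without CUDA the block still runs but reports zeros.

```python
import contextlib

import torch


@contextlib.contextmanager
def track_cuda_memory(label, device=None):
    """Hypothetical helper: report the change in allocated CUDA memory
    across the enclosed block."""
    stats = {"delta_mb": 0.0, "peak_mb": 0.0}
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize(device)              # finish pending kernels first
        torch.cuda.reset_peak_memory_stats(device)  # peak now refers to this block
        before = torch.cuda.memory_allocated(device)
    try:
        yield stats
    finally:
        if use_cuda:
            torch.cuda.synchronize(device)
            mb = 1024 ** 2
            stats["delta_mb"] = (torch.cuda.memory_allocated(device) - before) / mb
            stats["peak_mb"] = (torch.cuda.max_memory_allocated(device) - before) / mb
        print(f"{label}: delta {stats['delta_mb']:.2f} MB, "
              f"peak {stats['peak_mb']:.2f} MB above start")


device = "cuda" if torch.cuda.is_available() else "cpu"
kept = []
with track_cuda_memory("appending results") as stats:
    for _ in range(3):
        kept.append(torch.randn(256, 256, device=device))  # references survive the block
```

A positive delta means tensors created inside the block are still alive when it exits. A large peak with a delta near zero points to temporary usage rather than a leak.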

Use this when you notice memory usage climbing during training despite batch size staying constant, or when a loop that should be memory-flat consumes more each iteration. The most common cause is accidentally holding references to intermediate tensors in a list or keeping the computational graph alive with retain_graph=True.
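When memory climbs and you suspect a stray reference, one common debugging approach is to ask Python's garbage collector which tensors are still reachable. The sketch below assumes that approach; live_tensor_summary is a hypothetical helper name, and the odd (7, 13) shape is only there to make the demo tensors easy to spot.

```python
import gc

import torch


def live_tensor_summary():
    """Count tensors still reachable by the garbage collector,
    grouped by (device, shape)."""
    gc.collect()
    counts = {}
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                key = (str(obj.device), tuple(obj.shape))
                counts[key] = counts.get(key, 0) + 1
        except Exception:
            # Some objects raise while being inspected; skip them.
            continue
    return counts


key = ("cpu", (7, 13))
before = live_tensor_summary().get(key, 0)
held = [torch.zeros(7, 13) for _ in range(5)]  # simulated accidental references
after = live_tensor_summary().get(key, 0)
print(f"tensors of shape (7, 13): {before} -> {after}")
```

Calling the function at the same point in two different iterations and comparing the counts shows which shapes are accumulating, which usually narrows the search to one list or hook.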

Analogy

Like leaving the tap running in your sink: the water level (memory) rises steadily even though you're not actively filling anything. You need to measure the level before and after to prove the leak exists, then trace where the water is going.

Code

python
import torch
import gc

def detect_memory_leak():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    
    if device.type == 'cuda':
        torch.cuda.reset_peak_memory_stats(device)
        torch.cuda.empty_cache()
    
    print(f"Device: {device}")
    print()
    
    # Scenario 1: No leak — proper cleanup
    print("=== Scenario 1: No leak (proper cleanup) ===")
    if device.type == 'cuda':
        mem_before = torch.cuda.memory_allocated(device)
    
    for i in range(5):
        t = torch.randn(1000, 1000, device=device)
        result = t @ t.T
        # Clean up explicitly
        del t
        del result
    
    gc.collect()
    if device.type == 'cuda':
        torch.cuda.empty_cache()
        mem_after = torch.cuda.memory_allocated(device)
        leak_1 = mem_after - mem_before
        print(f"Memory change: {leak_1 / 1024 / 1024:.2f} MB")
    else:
        print("(CPU mode: using gc to measure)")
    
    print()
    
    # Scenario 2: Leak — accumulating references
    print("=== Scenario 2: Memory leak (accumulating tensors) ===")
    if device.type == 'cuda':
        torch.cuda.reset_peak_memory_stats(device)
        mem_before = torch.cuda.memory_allocated(device)
    
    accumulated = []  # Accidentally holding references
    for i in range(5):
        t = torch.randn(1000, 1000, device=device)
        result = t @ t.T
        accumulated.append(result)  # Leak! Holding onto tensor
    
    gc.collect()
    if device.type == 'cuda':
        mem_after = torch.cuda.memory_allocated(device)
        leak_2 = mem_after - mem_before
        print(f"Memory change: {leak_2 / 1024 / 1024:.2f} MB")
        print(f"Accumulated {len(accumulated)} tensors in list")
    else:
        print(f"(CPU mode: accumulated {len(accumulated)} tensors in list)")
    
    print()
    
    # Scenario 3: Using detach to break graph
    print("=== Scenario 3: Breaking graph leak ===")
    if device.type == 'cuda':
        torch.cuda.reset_peak_memory_stats(device)
        mem_before = torch.cuda.memory_allocated(device)
    
    graph_refs = []
    for i in range(5):
        x = torch.randn(1000, 1000, device=device, requires_grad=True)
        loss = (x ** 2).sum()
        # Detach to avoid keeping computational graph alive
        graph_refs.append(loss.detach())
    # Drop the last x and loss so they don't count toward the measurement
    del x, loss
    
    gc.collect()
    if device.type == 'cuda':
        mem_after = torch.cuda.memory_allocated(device)
        leak_3 = mem_after - mem_before
        print(f"Memory change: {leak_3 / 1024 / 1024:.2f} MB")
        print("Used .detach() to break graph")
    else:
        print("(CPU mode: used .detach() to break graph)")
    
    # Show peak usage since the last reset_peak_memory_stats() call (scenario 3 only)
    if device.type == 'cuda':
        peak = torch.cuda.max_memory_allocated(device)
        print()
        print(f"Peak memory allocated: {peak / 1024 / 1024:.2f} MB")

detect_memory_leak()
Output
Device: cpu

=== Scenario 1: No leak (proper cleanup) ===
(CPU mode: using gc to measure)

=== Scenario 2: Memory leak (accumulating tensors) ===
(CPU mode: accumulated 5 tensors in list)

=== Scenario 3: Breaking graph leak ===
(CPU mode: used .detach() to break graph)

What just happened?

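The code ran three scenarios: (1) deleting tensors at the end of each iteration, (2) accumulating tensor references in a list, and (3) storing only detached losses so the computational graph and its inputs can be freed. On a CUDA device you'd see memory grow in scenario 2, by about 4 MB for each retained 1000x1000 float32 tensor, while scenarios 1 and 3 stay close to flat (a tensor that is still in scope when you measure, such as the last loop variable, still counts). The output above comes from a CPU run, where the code exercises the same patterns but prints no byte counts because the torch.cuda counters only track CUDA memory.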

Common gotcha

The biggest mistake is confusing peak memory allocation with a current memory leak. A tensor that is allocated and properly deleted in iteration 1 doesn't cause a leak, but max_memory_allocated() still counts it. To catch real leaks, compare memory_allocated() before and after a loop. Don't use memory_reserved() for this: it also includes blocks the caching allocator is holding for reuse, so memory_reserved() - memory_allocated() tells you how much is cached, not how much is leaked. Also, torch.cuda.empty_cache() never frees memory held by live tensors; it only returns cached, unused blocks to the driver.
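To see the difference between current, cached, and peak memory side by side, a small report function helps. This is a sketch under the assumption that a CUDA device is available; cuda_memory_report is a hypothetical name, and without CUDA it simply returns an empty dictionary.

```python
import torch


def cuda_memory_report(device=None):
    """Hypothetical helper: current, reserved, cached-but-unused, and peak
    CUDA memory in MB. Returns {} when no CUDA device is available."""
    if not torch.cuda.is_available():
        return {}
    mb = 1024 ** 2
    allocated = torch.cuda.memory_allocated(device)
    reserved = torch.cuda.memory_reserved(device)
    return {
        "allocated_mb": allocated / mb,                   # held by live tensors
        "reserved_mb": reserved / mb,                     # held by the caching allocator
        "cached_unused_mb": (reserved - allocated) / mb,  # reusable cache, not a leak
        "peak_allocated_mb": torch.cuda.max_memory_allocated(device) / mb,
    }


if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()
    t = torch.randn(1000, 1000, device="cuda")
    del t
    # The peak still includes the deleted tensor; allocated no longer does.
    print(cuda_memory_report())
    torch.cuda.empty_cache()
    # Reserved memory can shrink here; allocated memory is unaffected.
    print(cuda_memory_report())
else:
    print("No CUDA device; report is empty:", cuda_memory_report())

report = cuda_memory_report()
```

Only allocated_mb growing from one iteration to the next indicates a leak. A high peak or a large cached_unused_mb on its own does not.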

Error recovery

RuntimeError: CUDA out of memory
Memory leak during training loop. Add memory tracking inside the loop: measure before/after each iteration. If memory grows each iteration, check for accumulated lists, use .detach() on intermediate tensors, or call del on large tensors after use.
AttributeError: 'NoneType' has no attribute '_*'
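Happens when a variable that used to hold a tensor has been set to None (a common cleanup idiom) and is then used later. If you removed the name with del instead, Python raises NameError or UnboundLocalError. Either way, the cause is cleanup that runs before the tensor is actually done being used. Solution: release a tensor only after you're certain it won't be referenced again, or use with torch.no_grad(): for inference so no graph is built and there is less to clean up by hand.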
Memory climbs but no obvious tensor storage
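Likely causes: cuDNN workspaces and algorithm caches, blocks held by the caching allocator (these show up in memory_reserved() and nvidia-smi but not in memory_allocated()), or a computation graph that is being kept alive. Solution: add .detach() to break the graph for intermediate results, use the torch.no_grad() context for inference, and check whether you're calling .backward(retain_graph=True) in a loop without needing the graph afterwards.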
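The per-iteration tracking suggested for the out-of-memory case can be sketched as below. The helper names (allocated_mb, per_step_deltas, leaky_step) are illustrative, the warmup of two steps is an arbitrary assumption, and on a machine without CUDA every delta is reported as zero.

```python
import torch


def allocated_mb():
    """Current torch.cuda.memory_allocated() in MB, or 0.0 without CUDA."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 1024 ** 2


def per_step_deltas(step_fn, num_steps, warmup=2):
    """Run step_fn(step) repeatedly and record the change in allocated
    memory for each step after the warmup steps."""
    deltas = []
    previous = allocated_mb()
    for step in range(num_steps):
        step_fn(step)
        current = allocated_mb()
        if step >= warmup:  # early steps often allocate legitimately
            deltas.append(current - previous)
        previous = current
    return deltas


device = "cuda" if torch.cuda.is_available() else "cpu"
history = []


def leaky_step(step):
    x = torch.randn(512, 512, device=device, requires_grad=True)
    loss = (x ** 2).sum()
    history.append(loss)  # the bug: loss keeps its graph, and with it x, alive


deltas = per_step_deltas(leaky_step, num_steps=6)
print([f"{d:+.2f} MB" for d in deltas])
```

Deltas that stay positive step after step point to something being retained. Swapping history.append(loss) for history.append(loss.item()) should bring them back to zero.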

Experienced dev note

In real production training, the sneakiest leaks come from validation loops where you accidentally keep a list of loss values (each holding a computational graph) or from logging hooks that capture intermediate activations. The pattern is: if you're appending to a list or dict in a loop, and those values are tensors that require grad or are part of a graph, you're almost certainly leaking. Use .item() to extract scalars, or .detach() before storing. Also, torch.profiler (PyTorch's built-in profiler) is good at showing which operators allocate memory and where the peaks are, but slow accumulation across iterations is easier to see by comparing memory_allocated() between iterations.
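The loss-logging pattern is easy to check on a toy model. In this sketch the model, criterion, and the two log lists are placeholders invented for illustration; it runs on CPU because the point is what each list holds, not how many bytes it costs.

```python
import torch

# Hypothetical stand-ins for a real model and loss function.
model = torch.nn.Linear(8, 1)
criterion = torch.nn.MSELoss()

leaky_log = []  # stores tensors that still carry their graph
safe_log = []   # stores plain Python floats

for _ in range(3):
    inputs = torch.randn(4, 8)
    targets = torch.randn(4, 1)
    loss = criterion(model(inputs), targets)
    leaky_log.append(loss)        # keeps the graph and its saved tensors alive
    safe_log.append(loss.item())  # copies the scalar out; the graph can be freed

print("leaky entry has a graph:", leaky_log[0].grad_fn is not None)
print("safe entry type:", type(safe_log[0]).__name__)
print("detached entry has a graph:", loss.detach().grad_fn is not None)
```

If you need to keep the value as a tensor, for example to stack it later, loss.detach() drops the graph reference in the same way.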

Check your understanding

You have a training loop that runs for 100 epochs, and you notice GPU memory climbs from 6GB at epoch 1 to 11GB at epoch 50, even though your batch size is fixed. The memory stabilizes at 11GB for epochs 50–100. Is this a memory leak, and what's the evidence?

Answer hint

A real leak would keep climbing every epoch. Memory that stabilizes suggests the earlier climb was bounded (model initialization, caching, graph building) rather than a leak. Check whether the growth stops after an initial stretch or continues indefinitely. The evidence is running the same loop again: if memory climbs from 11GB to 16GB, it's leaking; if it stays at 11GB, it's not.
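The "does it keep climbing?" check can be automated on logged readings. The sketch below uses synthetic numbers shaped like the exercise (a climb from 6GB to 11GB, then flat); the function name, the 10-epoch window, and the 0.1GB tolerance are arbitrary assumptions to tune for your own runs.

```python
def classify_memory_trend(readings_gb, window=10, tolerance_gb=0.1):
    """Compare the mean of the last `window` readings with the mean of the
    `window` readings before them."""
    if len(readings_gb) < 2 * window:
        return "not enough data"
    recent = readings_gb[-window:]
    earlier = readings_gb[-2 * window:-window]
    growth = sum(recent) / window - sum(earlier) / window
    return "still growing" if growth > tolerance_gb else "plateaued"


# Synthetic per-epoch readings: linear climb over epochs 1-50, flat for 51-100.
climb = [6.0 + 5.0 * (epoch - 1) / 49 for epoch in range(1, 51)]
readings = climb + [11.0] * 50

print("after 50 epochs:", classify_memory_trend(readings[:50]))
print("after 100 epochs:", classify_memory_trend(readings))
```

Looking only at the first 50 epochs, the series is indistinguishable from a leak, which is why the verdict depends on what happens afterwards.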

VERSION

PyTorch 2.11.x (March 2026) uses torch.cuda.memory_allocated() and torch.cuda.max_memory_allocated(); these APIs have been stable since 1.3.0. However, PyTorch 2.0+ changed torch.cuda._get_device_index() (a private helper) and modified cuDNN cache behavior, so numbers measured on versions before 2.0 may differ slightly because of different cache management. Note that torch.cuda.empty_cache() does not change what memory_allocated() reports; call it before a critical measurement only when you are reading memory_reserved() or nvidia-smi.

NEXT

Profiling GPU kernels with torch.profiler to identify which operations consume the most memory and time: moving from leak detection to optimization.
