Code Advanced hard · 8 min

Debugging compiled models

What you will learn

Use torch.compile with debugging utilities and fallback strategies to diagnose compilation failures and performance regressions in optimized models.

Why this matters

torch.compile can silently fall back to eager execution or produce incorrect results. In production, you won't notice until latency fails to improve or outputs diverge. Debugging techniques catch these problems before deployment.

Skip if: your model is already working correctly in production and you're not seeing performance issues. Also skip compilation itself if your model relies on custom CUDA kernels or highly dynamic control flow; these often require manual optimization instead.

Explanation

What it is: torch.compile optimizes models by tracing and converting them to fused kernels, but failures are often silent. Debugging compiled models means capturing compilation logs, comparing eager vs. compiled outputs, and identifying which operations caused fallback or correctness issues.

How it works: PyTorch provides several debugging layers: graph-break logging through the TORCH_LOGS environment variable (for example TORCH_LOGS=graph_breaks) or torch._logging.set_logs (early 2.0 releases used torch._dynamo.config.log_level instead), torch.compile(fullgraph=True) to fail loudly instead of falling back, and torch.compile(backend='eager') to test the tracing process without optimization. You can also wrap compilation in output validation to catch numerical drift early.
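One way to see graph breaks directly is to pass torch.compile a tiny custom backend that records every graph it receives. The sketch below is illustrative only: counting_backend and fn_with_break are hypothetical names, and torch._dynamo.graph_break() is used here purely to force a break for the demonstration.

```python
import torch
import torch._dynamo

captured_graphs = []  # one entry per FX graph that Dynamo hands to the backend


def counting_backend(gm, example_inputs):
    # Hypothetical debug backend: record the captured graph, then run it unoptimized.
    captured_graphs.append(gm)
    return gm.forward


def fn_with_break(x):
    x = torch.sin(x)
    torch._dynamo.graph_break()  # deliberately split the trace here
    return torch.cos(x)


torch._dynamo.reset()  # clear any cached compilations from earlier runs
compiled_fn = torch.compile(fn_with_break, backend=counting_backend)

x = torch.randn(8)
out = compiled_fn(x)

# More than one captured graph means the function was split by a graph break.
print(f"Graphs captured: {len(captured_graphs)}")
for i, gm in enumerate(captured_graphs):
    ops = [str(n.target) for n in gm.graph.nodes if n.op == "call_function"]
    print(f"  graph {i}: {ops}")
```

Because the backend simply returns gm.forward, no optimization happens; the point is only to count and inspect what was traced.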

When to use: Enable debugging during development and before committing a compiled model to production. Use fullgraph=True in CI/CD to catch regressions. Turn the verbose debugging options off in production after validation, but keep output comparison tests running on a sample of data.
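A CI check along these lines could look like the sketch below. It assumes that any exception raised under fullgraph=True signals a graph break (the exact exception type varies across PyTorch versions), and traces_as_single_graph, CleanModel, and BreakingModel are hypothetical names used for illustration.

```python
import torch
import torch._dynamo
import torch.nn as nn


def traces_as_single_graph(module, example_input):
    """Return True if `module` traces under fullgraph=True, else False.

    backend="eager" skips code generation, so this exercises tracing only.
    """
    torch._dynamo.reset()
    strict = torch.compile(module, backend="eager", fullgraph=True)
    try:
        with torch.no_grad():
            strict(example_input)
    except Exception as exc:  # exception type differs between versions
        print(f"{type(module).__name__}: {type(exc).__name__}")
        return False
    return True


class CleanModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x):
        return torch.relu(self.linear(x))


class BreakingModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x):
        torch._dynamo.graph_break()  # stand-in for an untraceable operation
        return torch.relu(self.linear(x))


example = torch.randn(4, 10)
print("clean:", traces_as_single_graph(CleanModel(), example))
print("breaking:", traces_as_single_graph(BreakingModel(), example))
```

In a test suite, the return value would typically feed an assert so that a newly introduced graph break fails the build.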

Analogy

Compiling a model is like shipping code through an optimizing compiler. A C compiler can turn code that relies on undefined behavior into a binary that misbehaves only on edge cases, and you notice only when it crashes. Similarly, torch.compile may mishandle an unsupported op, and you see it only when outputs diverge.

Code

python

import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(10, 20)
        self.linear2 = nn.Linear(20, 5)
    
    def forward(self, x):
        x = self.linear1(x)
        x = torch.relu(x)
        x = self.linear2(x)
        return x

model = SimpleModel()
x = torch.randn(4, 10)

# Reference run in eager mode: this is the ground truth.
with torch.no_grad():
    eager_output = model(x)
    print(f"Eager output shape: {eager_output.shape}")
    print(f"Eager output mean: {eager_output.mean().item():.4f}")

# fullgraph=False lets unsupported code fall back to eager at graph breaks.
compiled_model = torch.compile(model, backend='inductor', fullgraph=False)

# The first call triggers compilation, so it is slower than later calls.
with torch.no_grad():
    compiled_output = compiled_model(x)
    print(f"Compiled output shape: {compiled_output.shape}")
    print(f"Compiled output mean: {compiled_output.mean().item():.4f}")

# Largest element-wise deviation between eager and compiled outputs.
max_diff = torch.max(torch.abs(eager_output - compiled_output)).item()
print(f"Max difference: {max_diff:.8f}")

if max_diff > 1e-4:
    print("WARNING: Numerical divergence detected!")
else:
    print("✓ Outputs match within tolerance")

# fullgraph=True raises an error on any graph break instead of falling back.
compiled_model_fullgraph = torch.compile(model, backend='inductor', fullgraph=True)
try:
    with torch.no_grad():
        test_output = compiled_model_fullgraph(x)
    print("✓ Fullgraph compilation succeeded (no graph breaks)")
except Exception as e:
    print(f"✗ Fullgraph failed: {type(e).__name__}: {str(e)[:100]}")

Output

Eager output shape: torch.Size([4, 5])
Eager output mean: 0.0543
Compiled output shape: torch.Size([4, 5])
Compiled output mean: 0.0543
Max difference: 0.00000000
✓ Outputs match within tolerance
✓ Fullgraph compilation succeeded (no graph breaks)

What just happened?

We created a simple model and compared its output in eager mode versus compiled mode. The eager pass established ground truth. The compiled pass ran through the inductor backend (torch.compile's default backend). We computed the maximum absolute difference between outputs and printed diagnostics. The fullgraph=True version confirmed that no graph breaks occurred, meaning the whole forward pass was captured as a single graph without falling back to eager execution. Because the script sets no random seed, the printed means will differ from run to run.

Common gotcha

The most dangerous failure mode: a compiled model produces subtly wrong outputs (e.g., from different random number handling or small numerical precision differences) but no exception is raised. You ship it, latency improves, but accuracy degrades silently. Always run output validation on a representative batch before deploying a compiled model. Use `torch.allclose(eager, compiled, atol=1e-4)` in your test suite; don't assume compilation is correct.
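A validation step of this kind might be sketched as follows. The helper name validate_compiled and the tolerances are assumptions for illustration; backend="eager" keeps the example light, and a real check would use the backend you actually deploy.

```python
import torch
import torch.nn as nn


def validate_compiled(eager_model, compiled_model, batches, atol=1e-4, rtol=1e-4):
    """Compare eager and compiled outputs batch by batch.

    Returns a list of (batch_index, first line of the mismatch message).
    """
    failures = []
    eager_model.eval()
    with torch.no_grad():
        for i, batch in enumerate(batches):
            expected = eager_model(batch)
            actual = compiled_model(batch)
            try:
                torch.testing.assert_close(actual, expected, atol=atol, rtol=rtol)
            except AssertionError as err:
                failures.append((i, str(err).splitlines()[0]))
    return failures


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))
compiled = torch.compile(model, backend="eager")  # swap in "inductor" for a real check
batches = [torch.randn(4, 10) for _ in range(3)]

failures = validate_compiled(model, compiled, batches)
print("failures:", failures)
```

torch.testing.assert_close reports how many elements mismatch and by how much, which is usually more informative in a test log than a bare True/False from torch.allclose.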

Error recovery

RuntimeError: Unsupported operation in compiled graph

This means torch.compile encountered an operation it cannot trace or lower; the exact exception type and message vary between PyTorch versions. Fix: Check your model for custom autograd functions, in-place operations on leaf tensors, or dynamic shapes. Use fullgraph=False to allow fallback, or refactor the operation to use standard PyTorch ops.

RuntimeError: Graph break in compiled model

When fullgraph=True, any unsupported op causes hard failure instead of fallback. Fix: Either switch to fullgraph=False to debug gradually, or identify the breaking operation (check torch._dynamo logs) and replace it with a compiled-friendly alternative.

AssertionError or shape mismatch in compiled output

Compilation sometimes fails on dynamic control flow (e.g., if statements depending on tensor values). Fix: Refactor conditional logic to use torch.where or masked operations instead of Python if statements. torch.compile(dynamic=True) addresses a different problem: it helps when input shapes change between calls, not when a branch depends on tensor data.
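As a minimal sketch of that refactor, the example below rewrites a data-dependent Python branch with torch.where. The function names branchy and branch_free are hypothetical, and backend="eager" is used so the example checks tracing only.

```python
import torch
import torch._dynamo


def branchy(x):
    # Python `if` on a tensor value: the branch depends on runtime data,
    # so it cannot be captured in a single traced graph.
    if x.sum() > 0:
        return x * 2
    return x - 1


def branch_free(x):
    # Both branches are computed; torch.where selects by the tensor condition.
    return torch.where(x.sum() > 0, x * 2, x - 1)


torch._dynamo.reset()
compiled = torch.compile(branch_free, backend="eager", fullgraph=True)

# Exercise both sides of the condition and compare against the original logic.
for x in (torch.ones(4), -torch.ones(4)):
    assert torch.equal(compiled(x), branchy(x))
print("branch_free matches branchy on both paths")
```

The trade-off is that torch.where evaluates both branches every time, so this works best when each branch is cheap and safe to compute for all inputs.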

Experienced dev note

The biggest surprise: torch.compile(backend='eager') exists and is *not* a no-op. It still traces the model and rebuilds the graph, which makes it useful for catching graph-level bugs without any backend optimization. Use it when you suspect the issue is in tracing, not in the backend. Also: fullgraph=False is more permissive, but it lets graph breaks pass unnoticed. Run your test suite with fullgraph=True in CI, but deploy with fullgraph=False if you want production robustness. Never assume compile speeds up your code without benchmarking; some operations are actually slower when compiled due to overhead.
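The idea of separating tracing problems from backend problems can be extended into a small bisection helper that tries backends in order of increasing optimization. This is a sketch, not an official workflow: first_divergent_backend is a hypothetical name, and "aot_eager" (tracing plus AOTAutograd, still no code generation) is an extra rung not covered above.

```python
import torch
import torch._dynamo
import torch.nn as nn


def first_divergent_backend(model, example_input, backends, atol=1e-4, rtol=1e-4):
    """Return the first backend whose output errors or diverges from eager, else None."""
    model.eval()
    with torch.no_grad():
        expected = model(example_input)
    for name in backends:
        torch._dynamo.reset()  # avoid reusing code compiled for a previous backend
        compiled = torch.compile(model, backend=name)
        try:
            with torch.no_grad():
                actual = compiled(example_input)
            torch.testing.assert_close(actual, expected, atol=atol, rtol=rtol)
        except Exception as exc:
            print(f"{name}: {type(exc).__name__}")
            return name
        print(f"{name}: ok")
    return None


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))
# Append "inductor" as the last rung when a working compiler toolchain is available.
culprit = first_divergent_backend(model, torch.randn(4, 10), ["eager", "aot_eager"])
print("first failing backend:", culprit)
```

If "eager" already fails, the problem is in tracing; if only the last rung fails, the backend's generated code is the more likely suspect.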

Check your understanding

If your compiled model produces outputs that match eager mode when you run a single batch, but accuracy degrades on a full training loop, what is the likely issue and how would you debug it?

Answer hint

The issue is likely a stateful operation (batch norm running stats, dropout masks) or a dynamic shape that compiles differently across batches. Debug by: (1) switching the model to eval mode, (2) passing batches of identical shapes, (3) comparing batch norm statistics between the eager and compiled models, (4) checking whether dropout is applied in eval mode (it shouldn't be).
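Step (3) can be sketched by running identical batches through an eager model and a compiled copy, then comparing the batch norm buffers. The model layout and variable names here are assumptions for illustration, and backend="eager" stands in for whichever backend you are debugging.

```python
import copy

import torch
import torch._dynamo
import torch.nn as nn

torch.manual_seed(0)
eager_model = nn.Sequential(nn.Linear(10, 8), nn.BatchNorm1d(8), nn.ReLU())
twin = copy.deepcopy(eager_model)  # identical weights and buffers

torch._dynamo.reset()
compiled_model = torch.compile(twin, backend="eager")  # swap in "inductor" when debugging for real

# Train mode so that batch norm updates its running statistics.
eager_model.train()
compiled_model.train()

batches = [torch.randn(16, 10) for _ in range(5)]  # identical shapes throughout
with torch.no_grad():
    for batch in batches:
        eager_model(batch)
        compiled_model(batch)

# Compare the state that accumulated across batches.
bn_eager, bn_compiled = eager_model[1], twin[1]
mean_gap = (bn_eager.running_mean - bn_compiled.running_mean).abs().max().item()
var_gap = (bn_eager.running_var - bn_compiled.running_var).abs().max().item()
print(f"running_mean gap: {mean_gap:.2e}, running_var gap: {var_gap:.2e}")
```

A gap that grows with the number of batches points at the stateful update rather than at a single forward pass.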

VERSION torch.compile was introduced in PyTorch 2.0.0 (March 2023). The fullgraph parameter and improved debugging tools are stable as of 2.6.x and remain unchanged in 2.11.x (March 2026). torch._dynamo.config APIs are internal and subject to change; for stable debugging, prefer torch.compile(backend=...) options and output comparison.

Profiling compiled models with torch.profiler to measure kernel-level performance gains and identify which operations benefited most from fusion.
