Code Advanced hard · 8 min

Verifying ONNX output

What you will learn
Compare PyTorch model outputs against ONNX runtime outputs to catch numerical divergence and operator incompatibilities before production.

Why this matters

ONNX export strips away PyTorch's dynamic features and relies on opset implementations that may differ numerically. Verification catches silent failures: your model runs, produces different answers, and you never know. This is especially critical in inference pipelines where ONNX is used for serving.

Skip if: You train and serve only in PyTorch, or your model is numerically insensitive (e.g., classification with large margins). If you export to ONNX for production inference, you must verify.

Explanation

What it is: ONNX verification compares the numerical outputs of your original PyTorch model against the exported ONNX model running under ONNX Runtime. Both receive identical inputs and should produce numerically close outputs, but they don't always.

How it works: You generate test inputs and run them through both the PyTorch model (in eval mode, no gradients) and the ONNX model (via onnxruntime.InferenceSession). You then compute numerical differences (absolute error, relative error, or allclose with tolerances). Small divergences typically come from differing kernel implementations: a different accumulation order in floating-point reductions, operator fusion and constant folding, or different approximations for transcendental functions. Larger divergences usually point to an operator whose semantics differ between opset versions or that was exported incorrectly.
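The comparison step described above can be sketched with NumPy alone. This is a minimal illustration, not part of any library: `compare_outputs` and its `eps` parameter are hypothetical names, and the two arrays stand in for the PyTorch and ONNX Runtime outputs.

```python
import numpy as np


def compare_outputs(reference, candidate, atol=1e-4, rtol=1e-3, eps=1e-8):
    """Summarize how far `candidate` drifts from `reference`.

    `reference` plays the role of the PyTorch output and `candidate` the
    ONNX Runtime output; both are plain NumPy arrays here.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")

    diff = np.abs(reference - candidate)
    return {
        "max_abs_error": float(diff.max()),
        # eps keeps the division finite when a reference value is exactly 0
        "max_rel_error": float((diff / (np.abs(reference) + eps)).max()),
        # np.allclose checks |a - b| <= atol + rtol * |b| elementwise
        "allclose": bool(np.allclose(candidate, reference, atol=atol, rtol=rtol)),
    }


report = compare_outputs([1.0, 2.0, 4.0], [1.0, 2.0, 4.002])
print(report)
```

Note that `np.allclose` combines both tolerances in a single inequality, so an element can pass even when its relative error alone exceeds `rtol`, as long as `atol` covers the difference.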

When to use it: Always before deploying an ONNX model to inference. Test with representative data and edge cases (extreme values, batch sizes different from training, variable sequence lengths if applicable). Misaligned outputs here will cause runtime surprises in production.
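A test suite covering the cases listed above might be assembled as follows. This is a sketch under assumptions: `make_test_inputs` is a hypothetical helper, the specific magnitudes (1e4, 1e-6) are arbitrary placeholders, and real verification should add samples from your actual data. Varying the batch size only works if the model was exported with a dynamic batch dimension.

```python
import numpy as np


def make_test_inputs(feature_dim, batch_sizes=(1, 2, 32), seed=0):
    """Build a small, named suite of float32 inputs covering edge cases."""
    rng = np.random.default_rng(seed)
    suite = {}

    # Random inputs at several batch sizes, including ones unlike training.
    for batch in batch_sizes:
        shape = (batch, feature_dim)
        suite[f"random_b{batch}"] = rng.standard_normal(shape).astype(np.float32)

    # Extreme values at the first batch size: all zeros, very large, very small.
    shape = (batch_sizes[0], feature_dim)
    suite["zeros"] = np.zeros(shape, dtype=np.float32)
    suite["large"] = np.full(shape, 1e4, dtype=np.float32)
    suite["tiny"] = np.full(shape, 1e-6, dtype=np.float32)
    return suite


suite = make_test_inputs(feature_dim=10)
for name, array in suite.items():
    print(name, array.shape, array.dtype)
```

Naming each case makes a failing report easier to act on than a bare index into a list.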

Analogy

It's like exporting a recipe from one language to another. The ingredients list translates, the steps translate, but if you don't cook it both ways and taste them side-by-side, you might serve something that looks right but tastes wrong.

Code

python
import torch
import torch.nn as nn
import onnx
import onnxruntime as ort
import numpy as np
from pathlib import Path

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 8)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(8, 3)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model = SimpleNet()
model.eval()

dummy_input = torch.randn(1, 10)
onnx_path = "model.onnx"

torch.onnx.export(
    model,
    dummy_input,
    onnx_path,
    input_names=["input"],
    output_names=["output"],
    opset_version=14,
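    do_constant_folding=True,
    # Mark the batch dimension as dynamic; without this the graph is fixed to
    # batch size 1 and ONNX Runtime rejects the batch-2 and batch-4 test inputs.
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}}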
)

sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

test_inputs = [
    torch.randn(1, 10),
    torch.randn(2, 10),
    torch.randn(4, 10)
]

max_abs_error = 0.0
max_rel_error = 0.0

with torch.no_grad():
    for i, test_input in enumerate(test_inputs):
        pytorch_output = model(test_input).numpy()
        
        onnx_input = {"input": test_input.numpy().astype(np.float32)}
        onnx_output = sess.run(None, onnx_input)[0]
        
        abs_error = np.abs(pytorch_output - onnx_output).max()
        rel_error = np.abs((pytorch_output - onnx_output) / (np.abs(pytorch_output) + 1e-8)).max()
        
        max_abs_error = max(max_abs_error, abs_error)
        max_rel_error = max(max_rel_error, rel_error)
        
        print(f"Test {i+1} | Shape: {test_input.shape} | Abs Error: {abs_error:.2e} | Rel Error: {rel_error:.2e}")
        
        if not np.allclose(pytorch_output, onnx_output, atol=1e-4, rtol=1e-3):
            print(f"  ⚠️  DIVERGENCE DETECTED at test {i+1}")
            print(f"      PyTorch sample: {pytorch_output[0, :3]}")
            print(f"      ONNX sample:    {onnx_output[0, :3]}")

print(f"\nMax absolute error across all tests: {max_abs_error:.2e}")
print(f"Max relative error across all tests: {max_rel_error:.2e}")

if max_abs_error < 1e-4 and max_rel_error < 1e-3:
    print("✓ Outputs verified: safe for production deployment")
else:
    print("✗ Outputs diverged — investigate opset or numerical issues")

Path(onnx_path).unlink()
print("\nONNX model cleaned up.")
Output
Test 1 | Shape: torch.Size([1, 10]) | Abs Error: 2.98e-06 | Rel Error: 1.45e-05
Test 2 | Shape: torch.Size([2, 10]) | Abs Error: 4.21e-06 | Rel Error: 2.11e-05
Test 3 | Shape: torch.Size([4, 10]) | Abs Error: 3.89e-06 | Rel Error: 1.98e-05

Max absolute error across all tests: 4.21e-06
Max relative error across all tests: 2.11e-05
✓ Outputs verified: safe for production deployment

ONNX model cleaned up.

What just happened?

We created a simple PyTorch model, exported it to ONNX with opset 14, then ran multiple test inputs through both the original PyTorch model and the ONNX Runtime version. For each test, we computed the absolute and relative errors between the outputs. The errors stayed below our thresholds (absolute 1e-4, relative 1e-3), so the model is cleared to deploy. The ONNX file was deleted at the end.

Common gotcha

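Feeding float64 arrays to a model exported as float32 is a common mistake. ONNX Runtime does not cast for you: sess.run() rejects the input with an INVALID_ARGUMENT error about an unexpected data type. NumPy defaults to float64, so arrays built directly in NumPy need an explicit .astype(np.float32). Always convert test inputs to the dtype the model was exported with, typically float32, and feed the same array to both runtimes so that any cast happens before the comparison rather than between the two runs. Also, exporting in `model.train()` mode produces a different ONNX model (dropout and batchnorm behave differently). Always call `model.eval()` before exporting.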
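One way to keep the dtype conversion explicit and visible is a small casting helper. This is an illustrative sketch: `to_model_dtype` is a hypothetical name, and float32 is assumed to be the exported dtype.

```python
import numpy as np


def to_model_dtype(array, expected=np.float32):
    """Cast `array` to the dtype the exported model expects, explicitly.

    Returns the cast array and the largest absolute change the cast
    introduced, so precision loss is visible instead of silent.
    """
    array = np.asarray(array)
    cast = array.astype(expected)
    # Compare in float64 so the measurement itself does not lose precision.
    cast_error = float(
        np.abs(array.astype(np.float64) - cast.astype(np.float64)).max()
    )
    return cast, cast_error


x64 = np.array([[0.1, 0.2, 0.3]], dtype=np.float64)  # NumPy's default dtype
x32, err = to_model_dtype(x64)
print(x32.dtype, err)
```

Feeding `x32` to both PyTorch (via `torch.from_numpy`) and ONNX Runtime ensures both runtimes see bit-identical inputs.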

Error recovery

RuntimeError: Could not find an implementation for Div(13) node with name
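The selected execution provider has no kernel for this operator at this opset version (here, Div at opset 13). Common causes are an input dtype the kernel doesn't support or an onnxruntime build older than the opset you exported with. Upgrade onnxruntime, or lower opset_version in torch.onnx.export() to one your runtime supports. Each onnxruntime release supports opsets only up to a fixed maximum.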
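AssertionError from np.testing.assert_allclose()
Note that np.allclose() itself only returns False; the AssertionError comes from np.testing.assert_allclose(). Either your tolerances are too tight or the outputs genuinely diverge. Start with atol=1e-4, rtol=1e-3 for float32 models. If you see divergence there, check: (1) model.eval() was called, (2) inference ran inside a no_grad() context, (3) input dtypes match (float32), (4) opset_version is recent enough (14+ recommended). If it still fails, the exported operator itself may behave differently; consult the ONNX operator docs.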
ValueError: input_name not found in onnx model
The key in the sess.run() input dict doesn't match the input name recorded at export. The keys must match the input_names passed to torch.onnx.export(); sess.get_inputs()[0].name shows the name the model actually expects.
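Input name and dtype problems can be caught before calling sess.run() by inspecting the session's declared inputs. The sketch below assumes only that `sess.get_inputs()` returns objects with `name` and `type` attributes, as ONNX Runtime's InferenceSession does; `check_feed` and the small type table are hypothetical, and the table is deliberately incomplete.

```python
import numpy as np

# Maps ONNX Runtime type strings to NumPy dtypes; deliberately incomplete.
_ORT_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
}


def check_feed(sess, feed):
    """Return a list of problems with `feed` before calling sess.run()."""
    problems = []
    expected = {arg.name: arg.type for arg in sess.get_inputs()}

    # Keys the model does not know about (typos, renamed inputs).
    for name in feed:
        if name not in expected:
            problems.append(
                f"unknown input {name!r}; model expects {sorted(expected)}"
            )

    # Inputs the model needs, and whether their dtypes line up.
    for name, ort_type in expected.items():
        if name not in feed:
            problems.append(f"missing input {name!r}")
            continue
        want = _ORT_TO_NUMPY.get(ort_type)
        got = np.asarray(feed[name]).dtype
        if want is not None and got != want:
            problems.append(f"input {name!r} is {got}, model expects {ort_type}")
    return problems
```

With a real session this would be called as `check_feed(sess, {"input": x})`; an empty list means the feed matches the model's declared inputs.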

Experienced dev note

The most expensive ONNX failure is the one you never catch. A model that silently produces 5% different outputs will break downstream ML pipelines, and you'll chase data issues for weeks. Always verify before merging. Relative error is also more honest than absolute error for regression tasks, where output magnitudes vary widely. And batch size matters, so test several. ONNX Runtime can apply different optimizations at different batch sizes, and you might pass verification at batch=1 but fail at batch=128.
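A batch-size sweep along these lines can be written against two plain callables, which keeps it independent of any particular runtime. This is a sketch: `sweep_batch_sizes`, `run_reference`, and `run_candidate` are hypothetical names, and the batch sizes are examples only.

```python
import numpy as np


def sweep_batch_sizes(run_reference, run_candidate, feature_dim,
                      batch_sizes=(1, 8, 128), atol=1e-4, rtol=1e-3, seed=0):
    """Compare two inference callables across several batch sizes.

    Returns {batch_size: None or error message}, so every size is
    reported rather than stopping at the first failure.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for batch in batch_sizes:
        x = rng.standard_normal((batch, feature_dim)).astype(np.float32)
        try:
            # assert_allclose(actual, desired) raises AssertionError on mismatch
            np.testing.assert_allclose(
                run_candidate(x), run_reference(x), atol=atol, rtol=rtol
            )
            results[batch] = None
        except AssertionError as exc:
            results[batch] = str(exc)
    return results
```

With real models, `run_reference` would wrap `model(torch.from_numpy(x)).numpy()` inside `torch.no_grad()`, and `run_candidate` would wrap `sess.run(None, {"input": x})[0]`.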

Check your understanding

If your verification passes with atol=1e-4 at batch size 1, could the same model fail at batch size 32 with a different provider (e.g., switching from CPU to CUDA)? What would you check first, and why?

Answer hint

A correct answer recognizes that (1) batch size can expose numerical issues in batchnorm or layer norm operators, (2) different execution providers (CUDA, TensorRT) use different implementations and have different precision, and (3) you must test with the actual provider you'll use in production: CPU verification is not sufficient if you're deploying on CUDA.
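To make sure verification runs on the production provider rather than silently falling back to CPU, the provider list can be checked up front. This is a sketch: `pick_providers` is a hypothetical helper, and the provider names are the identifiers ONNX Runtime uses for its CUDA and CPU backends.

```python
def pick_providers(preferred, available):
    """Keep the preferred providers that are actually available, in order.

    Raises instead of silently falling back, so a model destined for CUDA
    is never verified on CPU by accident.
    """
    if not preferred:
        raise ValueError("preferred must list at least one provider")
    if preferred[0] not in available:
        raise RuntimeError(
            f"production provider {preferred[0]!r} unavailable; found {list(available)}"
        )
    return [p for p in preferred if p in available]


providers = pick_providers(
    ["CUDAExecutionProvider", "CPUExecutionProvider"],
    ["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(providers)
```

In practice the second argument would come from `ort.get_available_providers()`, and the result would be passed as `providers=` to `ort.InferenceSession`.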

VERSION PyTorch 2.11.x uses torch.onnx.export() with ONNX opset 14+ recommended. In PyTorch < 2.0, opset versions were more volatile and operator support was inconsistent. Always pin opset_version explicitly rather than relying on defaults. Make sure your onnxruntime release supports the opset you pin; opset 14 has been supported since onnxruntime 1.8, so any recent release covers it.
NEXT

After verifying ONNX numerical correctness, the next step is profiling ONNX inference latency and memory usage to ensure the model actually runs faster than PyTorch: the export itself buys you nothing if deployment is slower.
