How-to · Beginner · 3 min read

How to export Hugging Face model to ONNX

Quick answer
Use the transformers library's ONNX export utilities to convert a Hugging Face model to ONNX format. The process involves loading the model and tokenizer, then exporting the model graph with transformers.onnx (or torch.onnx.export for lower-level control over PyTorch models). Validate the result with onnxruntime.

Prerequisites

  • Python 3.8+
  • pip install "transformers>=4.30.0" (quoted so the shell does not treat >= as a redirect)
  • pip install onnxruntime
  • pip install torch (for PyTorch models)

Setup

Install the required packages to export Hugging Face models to ONNX. You need transformers for model loading and export utilities, onnxruntime for ONNX model validation and inference, and torch if exporting a PyTorch model.

bash
pip install transformers onnxruntime torch
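
Optionally, confirm the environment is ready by importing the packages and printing their versions:

bash
python -c "import transformers, torch, onnxruntime; print(transformers.__version__, torch.__version__, onnxruntime.__version__)"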

Step by step

This example exports the bert-base-uncased model to ONNX using the transformers.onnx utilities. It loads the model and tokenizer, looks up the matching ONNX export config via FeaturesManager, and writes the model graph to bert-base-uncased.onnx. Note that export() generates its own dummy inputs from the tokenizer, so you don't need to build a sample batch yourself.

python
from pathlib import Path
from transformers import AutoModel, AutoTokenizer
from transformers.onnx import FeaturesManager, export

model_name = "bert-base-uncased"
output_path = Path("./bert-base-uncased.onnx")

# Load pretrained model and tokenizer
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Look up the ONNX config for this architecture; export() builds its
# own dummy inputs from the tokenizer, so no sample batch is needed
model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(
    model, feature="default"
)
onnx_config = model_onnx_config(model.config)

# Export the model graph to ONNX
onnx_inputs, onnx_outputs = export(
    preprocessor=tokenizer,
    model=model,
    config=onnx_config,
    opset=13,
    output=output_path,
)

print(f"Model exported to {output_path}")
output
Model exported to ./bert-base-uncased.onnx
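
The same export is also available as a one-line CLI, which picks the ONNX config for you and writes model.onnx into the target directory:

bash
python -m transformers.onnx --model=bert-base-uncased onnx/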

Common variations

You can export other Hugging Face models by changing model_name. For causal language models, load the checkpoint with AutoModelForCausalLM and request the "causal-lm" feature so the language-modeling head is kept in the exported graph. For TensorFlow models, use tf2onnx or the TensorFlow ONNX exporter. You can also pick a different opset version if your target runtime requires one.

python
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.onnx import FeaturesManager, export

model_name = "gpt2"
output_path = Path("./gpt2.onnx")

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# "causal-lm" keeps the language-modeling head in the exported graph
model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(
    model, feature="causal-lm"
)
onnx_config = model_onnx_config(model.config)

onnx_inputs, onnx_outputs = export(
    preprocessor=tokenizer,
    model=model,
    config=onnx_config,
    opset=13,
    output=output_path,
)

print(f"GPT-2 model exported to {output_path}")

print(f"GPT-2 model exported to {output_path}")
output
GPT-2 model exported to ./gpt2.onnx
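
Note that recent transformers releases deprecate transformers.onnx in favor of the Optimum exporter. If you have optimum installed (the command below assumes the optimum[exporters] extra), an equivalent command-line export is:

bash
pip install optimum[exporters]
optimum-cli export onnx --model gpt2 gpt2_onnx/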

Troubleshooting

  • If you get errors about unsupported ops, try increasing the opset version (e.g., to 14 or 15).
  • If the export fails on variable input shapes, specify dynamic_axes when calling torch.onnx.export directly (see the sketch after the validation snippet below).
  • Ensure your PyTorch and transformers versions are compatible and up to date.
  • Use onnxruntime.InferenceSession to validate the exported ONNX model, as shown below.
python
import onnxruntime as ort

# Loading the graph verifies that the exported file is a valid ONNX model;
# naming the provider explicitly avoids provider-selection errors on GPU builds
session = ort.InferenceSession("bert-base-uncased.onnx", providers=["CPUExecutionProvider"])
print("ONNX model loaded successfully.")
output
ONNX model loaded successfully.
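
For reference, here is a minimal torch.onnx.export sketch with dynamic_axes. It assumes a BERT-style model whose forward pass takes input_ids and attention_mask; the output file name is a placeholder.

python
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.eval()

# torch.onnx.export traces the model on a concrete example input
dummy = tokenizer("Dynamic axes example", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "bert-dynamic.onnx",  # placeholder output path
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state", "pooler_output"],
    # Mark batch and sequence dimensions as dynamic so the exported
    # graph accepts any batch size and sequence length
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=13,
)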

Key takeaways

  • Use transformers.onnx.export for straightforward Hugging Face model export to ONNX.
  • transformers.onnx builds dummy inputs from the tokenizer automatically; supply your own sample batch only when calling torch.onnx.export directly.
  • Validate exported ONNX models with onnxruntime to ensure correctness (see the parity check below).
  • Adjust opset versions and dynamic axes to fix export issues.
  • Switch model classes (e.g., AutoModelForCausalLM) depending on your model type.
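
As a final check, you can compare ONNX Runtime outputs against the original PyTorch model on the same input. This is a minimal sketch assuming the BERT export from the step-by-step section above:

python
import numpy as np
import onnxruntime as ort
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

inputs = tokenizer("Numerical parity check", return_tensors="pt")
with torch.no_grad():
    torch_out = model(**inputs).last_hidden_state.numpy()

session = ort.InferenceSession("bert-base-uncased.onnx", providers=["CPUExecutionProvider"])
onnx_out = session.run(None, {k: v.numpy() for k, v in inputs.items()})[0]

# A small tolerance absorbs float32 rounding differences between runtimes
print("Outputs match:", np.allclose(torch_out, onnx_out, atol=1e-4))
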
Verified 2026-04 · bert-base-uncased, gpt2