
ONNX mobile deployment guide

Quick answer
To deploy an ONNX model on mobile, convert your trained model to ONNX format, then use the ONNX Runtime Mobile SDK for efficient inference on Android or iOS. The workflow is: install the runtime, integrate it into your app, and run inference against the exported model.

Prerequisites

  • Python 3.8+
  • pip install onnx onnxruntime
  • Android Studio or Xcode for mobile app development
  • Basic knowledge of mobile app development

Set up ONNX Runtime Mobile

Install the ONNX Runtime Python package for model conversion and local testing. For mobile, integrate the ONNX Runtime Mobile SDK into your Android or iOS project.

For Android, use the AAR package or build from source. For iOS, use CocoaPods or build the framework manually.

bash
pip install onnx onnxruntime

Step-by-step model conversion and inference

Convert your PyTorch or TensorFlow model to ONNX format, then test inference locally with onnxruntime. After verifying, integrate the ONNX model into your mobile app using the ONNX Runtime Mobile API.

python
import torch
import onnx
import onnxruntime as ort

# Example: Export PyTorch model to ONNX
model = torch.nn.Linear(10, 5)
model.eval()
dummy_input = torch.randn(1, 10)
onnx_path = "model.onnx"
torch.onnx.export(model, dummy_input, onnx_path, opset_version=14)

# Verify ONNX model
onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model)

# Run inference with ONNX Runtime
session = ort.InferenceSession(onnx_path)
inputs = {session.get_inputs()[0].name: dummy_input.numpy()}
outputs = session.run(None, inputs)
print("ONNX Runtime output:", outputs[0])
output (example; actual values depend on the randomly initialized weights)
ONNX Runtime output: [[-0.123456 0.234567 -0.345678 0.456789 -0.567890]]

Integrate ONNX Runtime Mobile in Android/iOS

For Android, add the onnxruntime-mobile AAR to your build.gradle dependencies and load the model with OrtEnvironment and OrtSession. For iOS, install the ONNX Runtime pod via CocoaPods and load the model with ORTEnv and ORTSession, the Objective-C/Swift equivalents.

Use the runtime API to prepare inputs and run inference efficiently on-device.

Platform | Integration method                  | Key classes
Android  | Add the AAR to Gradle dependencies  | OrtEnvironment, OrtSession
iOS      | CocoaPods: pod 'onnxruntime-mobile' | ORTEnv, ORTSession
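As a concrete sketch of the integration step, the fragments below show where the dependency goes on each platform. The coordinates, pod name, and version here are illustrative, not authoritative: check the current package names in the ONNX Runtime release notes before pinning (newer releases publish the Android package as onnxruntime-android, and the iOS pods are split into C and Objective-C variants).

```
// build.gradle (Android app module) -- version is a placeholder, check Maven Central
dependencies {
    implementation 'com.microsoft.onnxruntime:onnxruntime-mobile:1.14.0'
}
```

```
# Podfile (iOS) -- pod name for the Objective-C/Swift API
pod 'onnxruntime-mobile-objc'
```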

Common variations and troubleshooting

  • Use quantized ONNX models to reduce size and improve latency on mobile.
  • Check model opset compatibility with ONNX Runtime Mobile.
  • If inference fails, verify input shapes and data types match the model.
  • Use the onnxruntime.tools utilities (for example, convert_onnx_models_to_ort) to prepare and optimize models for mobile deployment.

Key Takeaways

  • Convert your model to ONNX format with opset 14+ for mobile compatibility.
  • Use ONNX Runtime Mobile SDK for efficient on-device inference on Android and iOS.
  • Optimize models with quantization and runtime tools to reduce latency and size.
  • Verify input/output shapes and types to avoid runtime errors.
  • Integrate ONNX Runtime Mobile via AAR for Android and CocoaPods for iOS.
Verified 2026-04 · onnxruntime-mobile