
ONNX mobile deployment guide

Quick answer
To deploy an ONNX model on mobile, convert your trained model to ONNX format, then use the ONNX Runtime Mobile SDK for efficient inference on Android or iOS. The workflow is: install the runtime, integrate it into your app, and run inference against the exported model.

Prerequisites

  • Python 3.8+
  • pip install onnx onnxruntime
  • Android Studio or Xcode for mobile app development
  • Basic knowledge of mobile app development

Set up ONNX Runtime Mobile

Install the ONNX Runtime Python package for model conversion and local testing. For mobile, integrate the ONNX Runtime Mobile SDK into your Android or iOS project.

For Android, use the AAR package or build from source. For iOS, use CocoaPods or build the framework manually.

bash
pip install onnx onnxruntime

Step-by-step model conversion and inference

Convert your PyTorch or TensorFlow model to ONNX format, then test inference locally with onnxruntime. After verifying, integrate the ONNX model into your mobile app using the ONNX Runtime Mobile API.

python
import torch
import onnx
import onnxruntime as ort

# Example: Export PyTorch model to ONNX
model = torch.nn.Linear(10, 5)
model.eval()
dummy_input = torch.randn(1, 10)
onnx_path = "model.onnx"
torch.onnx.export(model, dummy_input, onnx_path, opset_version=14)

# Verify ONNX model
onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model)

# Run inference with ONNX Runtime
session = ort.InferenceSession(onnx_path)
inputs = {session.get_inputs()[0].name: dummy_input.numpy()}
outputs = session.run(None, inputs)
print("ONNX Runtime output:", outputs[0])
output (example; actual values depend on the randomly initialized weights)
ONNX Runtime output: [[-0.123456 0.234567 -0.345678 0.456789 -0.567890]]

Integrate ONNX Runtime Mobile in Android/iOS

For Android, add the onnxruntime-mobile AAR to your build.gradle dependencies and load the model with OrtEnvironment and OrtSession. For iOS, install the ONNX Runtime pod via CocoaPods and load the model with ORTEnv and ORTSession, the Objective-C/Swift equivalents.

Use the runtime API to prepare inputs and run inference efficiently on-device.

Platform | Integration method                  | Key classes
Android  | Add the AAR to Gradle dependencies  | OrtEnvironment, OrtSession
iOS      | CocoaPods: pod 'onnxruntime-mobile' | ORTEnv, ORTSession
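As a concrete sketch of the integration step, the fragments below show where the dependency goes on each platform. The coordinates, pod name, and version here are illustrative, not authoritative: check the current package names in the ONNX Runtime release notes before pinning (newer releases publish the Android package as onnxruntime-android, and the iOS pods are split into C and Objective-C variants).

```
// build.gradle (Android app module) -- version is a placeholder, check Maven Central
dependencies {
    implementation 'com.microsoft.onnxruntime:onnxruntime-mobile:1.14.0'
}
```

```
# Podfile (iOS) -- pod name for the Objective-C/Swift API
pod 'onnxruntime-mobile-objc'
```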

Common variations and troubleshooting

  • Use quantized ONNX models to reduce size and improve latency on mobile.
  • Check model opset compatibility with ONNX Runtime Mobile.
  • If inference fails, verify input shapes and data types match the model.
  • Use the onnxruntime.tools utilities (for example, convert_onnx_models_to_ort) to prepare and optimize models for mobile deployment.

Key Takeaways

  • Convert your model to ONNX format with opset 14+ for mobile compatibility.
  • Use ONNX Runtime Mobile SDK for efficient on-device inference on Android and iOS.
  • Optimize models with quantization and runtime tools to reduce latency and size.
  • Verify input/output shapes and types to avoid runtime errors.
  • Integrate ONNX Runtime Mobile via AAR for Android and CocoaPods for iOS.
Verified 2026-04 · onnxruntime-mobile