ONNX model deployment on edge devices
Quick answer
Use ONNX Runtime to deploy ONNX models on edge devices for optimized inference. Convert your trained model to ONNX format, then run it with onnxruntime on the device, leveraging hardware acceleration when available.

Prerequisites
- Python 3.8+
- pip install onnx onnxruntime onnxruntime-tools
- Trained model exported to ONNX format
Setup
Install the necessary Python packages for ONNX model deployment on edge devices. onnxruntime provides a lightweight runtime optimized for various hardware accelerators.
pip install onnx onnxruntime onnxruntime-tools

Step by step
This example demonstrates loading an ONNX model and running inference on an edge device using onnxruntime. It assumes you have an ONNX model file model.onnx.
import onnxruntime as ort
import numpy as np
# Load the ONNX model
session = ort.InferenceSession("model.onnx")
# Prepare dummy input data matching model input shape and type
input_name = session.get_inputs()[0].name
input_shape = session.get_inputs()[0].shape
input_type = session.get_inputs()[0].type  # e.g. 'tensor(float)'
# Example: create random input tensor (adjust shape as needed)
input_data = np.random.rand(*[dim if isinstance(dim, int) else 1 for dim in input_shape]).astype(np.float32)
# Run inference
outputs = session.run(None, {input_name: input_data})
print("Model output:", outputs[0])

Output
Model output: [[...]] # numpy array output from the model
Common variations
- Use onnxruntime with hardware acceleration providers like CUDAExecutionProvider or OpenVINOExecutionProvider for better performance on supported edge devices.
- Convert PyTorch or TensorFlow models to ONNX using torch.onnx.export() or tf2onnx.convert.
- Use onnxruntime-tools to optimize ONNX models for edge deployment.
import onnxruntime as ort
import numpy as np

# Prefer CUDA when available, with CPU as the fallback
providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
session = ort.InferenceSession("model.onnx", providers=providers)

# Prepare dummy input data matching the model's input shape and type
input_shape = session.get_inputs()[0].shape
input_data = np.random.rand(*[dim if isinstance(dim, int) else 1 for dim in input_shape]).astype(np.float32)
# Run inference as before
outputs = session.run(None, {session.get_inputs()[0].name: input_data})
print("Output with CUDA:", outputs[0])

Output
Output with CUDA: [[...]]
Troubleshooting
- If you get RuntimeError: Unable to load model, verify the ONNX model file path and format.
- For shape mismatch errors, confirm input tensor shapes match the model's expected input.
- If performance is slow, check that hardware acceleration providers are enabled and compatible with your device.
- Use onnxruntime-tools to optimize the model graph for edge devices.
Key Takeaways
- Use onnxruntime to run ONNX models efficiently on edge devices with hardware acceleration support.
- Convert your trained models to ONNX format for compatibility and optimized inference.
- Optimize ONNX models using onnxruntime-tools before deployment to improve speed and reduce size.