How to use ONNX Runtime GPU
Quick answer
Use onnxruntime with the CUDAExecutionProvider to run ONNX models on GPU. Install onnxruntime-gpu, load your model with InferenceSession specifying providers=["CUDAExecutionProvider"], and run inference for accelerated performance.

Prerequisites

- Python 3.8+
- NVIDIA GPU with CUDA 11.1 or higher
- pip install onnxruntime-gpu
Setup
Install the GPU-enabled ONNX Runtime package and verify your CUDA environment is properly configured.
pip install onnxruntime-gpu

Step by step
This example loads an ONNX model and runs inference on GPU using onnxruntime.
import onnxruntime as ort
import numpy as np
# Load ONNX model with CUDA execution provider
session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
# Prepare dummy input matching model input shape and type
input_name = session.get_inputs()[0].name
input_shape = session.get_inputs()[0].shape
input_type = session.get_inputs()[0].type
# Example: create random input tensor (float32)
input_data = np.random.randn(*[dim if isinstance(dim, int) else 1 for dim in input_shape]).astype(np.float32)
# Run inference
outputs = session.run(None, {input_name: input_data})
print("Output shape:", [output.shape for output in outputs])

output
Output shape: [(1, 1000)]
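The list comprehension above substitutes 1 for any symbolic (dynamic) dimension in the model's input shape. That logic can be factored into a small helper so the substituted batch size is explicit; concretize_shape is a hypothetical name, not part of the onnxruntime API:

```python
import numpy as np

def concretize_shape(shape, dynamic_dim=1):
    """Replace symbolic dimensions (strings like 'batch', or None)
    with a concrete value so a dummy tensor can be allocated."""
    return [d if isinstance(d, int) else dynamic_dim for d in shape]

# Example: a model input declared as ['batch', 3, 224, 224]
shape = concretize_shape(["batch", 3, 224, 224], dynamic_dim=2)
dummy = np.random.randn(*shape).astype(np.float32)
print(shape)        # [2, 3, 224, 224]
print(dummy.shape)  # (2, 3, 224, 224)
```

The same helper works for any input reported by session.get_inputs(), since onnxruntime returns dynamic axes as strings or None rather than integers.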
Common variations
- Use providers=["CPUExecutionProvider"] to run on CPU instead.
- For async inference, use session.run_async() (available in recent versions); it takes a completion callback rather than returning an awaitable.
- Specify multiple providers so execution falls back if the GPU is unavailable.
import onnxruntime as ort
import numpy as np
# Fallback to CPU if GPU unavailable
session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
# Async inference example: run_async takes output names, inputs,
# a completion callback, and a user_data object (it does not return an awaitable)
def on_complete(outputs, user_data, err):
    if err:
        raise RuntimeError(err)
    print("Async output shape:", [output.shape for output in outputs])

input_name = session.get_inputs()[0].name
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
session.run_async(None, {input_name: input_data}, on_complete, None)

output
Async output shape: [(1, 1000)]
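The fallback provider list above can also be filtered against what the installed build actually reports, so the session never requests a provider that is missing. A minimal sketch; select_providers is a hypothetical helper, and in practice the available list would come from ort.get_available_providers():

```python
def select_providers(preferred, available):
    """Keep preferred providers that are actually available,
    falling back to CPU-only if none of them are."""
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# On a GPU build, both providers are reported as available
print(select_providers(preferred, ["CUDAExecutionProvider", "CPUExecutionProvider"]))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']

# On a CPU-only build, only the CPU provider is reported
print(select_providers(preferred, ["CPUExecutionProvider"]))
# ['CPUExecutionProvider']
```

Filtering up front keeps the intent explicit: ONNX Runtime already falls back down the providers list, but a pre-filtered list avoids warnings about unavailable providers at session creation.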
Troubleshooting
- If you get a CUDA error, verify that your NVIDIA driver and CUDA toolkit versions match the ONNX Runtime GPU requirements.
- Ensure your GPU supports CUDA 11.1 or higher.
- Use ort.get_available_providers() to check if CUDAExecutionProvider is available.
- If CUDAExecutionProvider is missing, reinstall onnxruntime-gpu and check your CUDA installation.
import onnxruntime as ort
print("Available providers:", ort.get_available_providers())

output
Available providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
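When GPU execution is mandatory, it can help to fail fast at startup instead of silently falling back to CPU. A minimal sketch, assuming the provider list comes from ort.get_available_providers(); require_cuda is a hypothetical name:

```python
def require_cuda(available_providers):
    """Raise immediately if the CUDA provider is not available."""
    if "CUDAExecutionProvider" not in available_providers:
        raise RuntimeError(
            "CUDAExecutionProvider not available; "
            "reinstall onnxruntime-gpu and check the CUDA installation."
        )

# Passes silently on a GPU build...
require_cuda(["CUDAExecutionProvider", "CPUExecutionProvider"])

# ...and raises on a CPU-only build
try:
    require_cuda(["CPUExecutionProvider"])
except RuntimeError as e:
    print("startup check failed:", e)
```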
Key Takeaways
- Install onnxruntime-gpu to enable GPU acceleration for ONNX models.
- Specify providers=["CUDAExecutionProvider"] when creating an InferenceSession to run on GPU.
- Verify CUDA and NVIDIA driver compatibility if GPU execution fails.
- Use ort.get_available_providers() to confirm GPU provider availability.
- Async inference and provider fallback improve flexibility in deployment.