How to use GPU with Modal
Quick answer
Use the `gpu` parameter in the `@app.function` decorator to specify the GPU type (e.g., `gpu="A10G"`). This runs your function in a GPU-enabled container for faster AI model inference or training.

Prerequisites

- Python 3.8+
- pip install modal
- Modal account and CLI configured
- GPU quota enabled on the Modal platform
Setup
Install the modal Python package and configure the Modal CLI with your account credentials (run `modal setup` to authenticate). Ensure GPU quota is enabled on your Modal account so you can use GPU resources.
```shell
pip install modal
```

Output:

```
Collecting modal
  Downloading modal-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: modal
Successfully installed modal-1.x.x
```
Step by step
Define a Modal app and decorate your function with `@app.function`, passing the `gpu` parameter to specify the GPU type. Inside the function, run your AI inference code. Then run the app and invoke the function with `.remote()` to execute it on GPU-accelerated hardware.
```python
import modal

app = modal.App("gpu-example")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
def run_inference(prompt: str) -> str:
    import torch

    # Example: simple tensor operation on GPU, with a CPU fallback
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.tensor([1.0, 2.0, 3.0], device=device)
    y = x * 2
    return f"Input: {x.cpu().tolist()}, Output: {y.cpu().tolist()}, Device: {device}"

if __name__ == "__main__":
    # app.run() starts an ephemeral app so .remote() executes in the cloud
    with app.run():
        result = run_inference.remote("Hello GPU")
        print(result)
```

Output:

```
Input: [1.0, 2.0, 3.0], Output: [2.0, 4.0, 6.0], Device: cuda
```
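The GPU string also accepts a count suffix: `gpu="A100:2"` requests two A100s for the container. The helper below is a hypothetical illustration (not part of Modal) of how that `TYPE[:count]` format decomposes:

```python
def parse_gpu_spec(spec: str) -> tuple[str, int]:
    """Split a Modal-style GPU string into (type, count); count defaults to 1."""
    gpu_type, _, count = spec.partition(":")
    return gpu_type, int(count) if count else 1

print(parse_gpu_spec("A10G"))    # ('A10G', 1)
print(parse_gpu_spec("A100:2"))  # ('A100', 2)
```

Modal itself parses the string for you; this is only to make the convention explicit.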
Common variations
- Use different GPU types by changing `gpu="A10G"` to other supported GPUs like `gpu="T4"`.
- Install additional Python packages by chaining `pip_install` calls on the `image` parameter.
- Use async functions with `@app.function` for asynchronous GPU workloads.
```python
import modal

app = modal.App("gpu-async-example")

@app.function(gpu="T4", image=modal.Image.debian_slim().pip_install("torch"))
async def async_inference(prompt: str) -> str:
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.tensor([4.0, 5.0, 6.0], device=device)
    y = x + 10
    return f"Async Input: {x.cpu().tolist()}, Output: {y.cpu().tolist()}, Device: {device}"

if __name__ == "__main__":
    import asyncio

    with app.run():
        # Async Modal functions are awaited via .remote.aio(); wrap the call
        # in asyncio.run() when invoking from synchronous code.
        result = asyncio.run(async_inference.remote.aio("Async GPU"))
        print(result)
```

Output:

```
Async Input: [4.0, 5.0, 6.0], Output: [14.0, 15.0, 16.0], Device: cuda
```
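A Modal function object supports both calling styles: `fn.remote(...)` blocks until the result is ready, while `fn.remote.aio(...)` returns an awaitable for use inside async code. A rough stand-in for that dual interface (the `Remote` class below is hypothetical, not Modal's implementation):

```python
import asyncio

class Remote:
    """Hypothetical sketch of a dual sync/async call interface."""
    def __init__(self, fn):
        self._fn = fn

    def __call__(self, *args):
        # Sync style: remote(x) blocks and returns the result.
        return self._fn(*args)

    async def aio(self, *args):
        # Async style: await remote.aio(x) yields control while running.
        return await asyncio.to_thread(self._fn, *args)

def double(x: int) -> int:
    return x * 2

remote = Remote(double)
print(remote(3))                   # 6  (blocking call)
print(asyncio.run(remote.aio(4)))  # 8  (awaitable call)
```

The point is only the calling convention: pick `.remote()` from plain scripts and `.remote.aio()` inside coroutines.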
Troubleshooting
- If you see `RuntimeError: CUDA not available`, verify your Modal GPU quota and that the `gpu` parameter matches a supported GPU type.
- You don't need to bundle GPU drivers in your image; Modal provides them. Make sure your image installs CUDA-enabled libraries (for example, a CUDA build of torch), or start from a CUDA base image.
- Check your Modal CLI is up to date and your account has GPU access enabled.
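When a slim Debian image isn't enough (for example, a library needs the CUDA toolkit at build time), you can build on an NVIDIA CUDA base image. A sketch using Modal's `Image.from_registry` with `add_python`; the registry tag is illustrative, so pick one matching your CUDA requirements:

```python
import modal

# Illustrative: CUDA development base image with Python layered on top.
image = modal.Image.from_registry(
    "nvidia/cuda:12.1.0-devel-ubuntu22.04", add_python="3.11"
).pip_install("torch")
```

Pass this `image` to `@app.function(gpu=..., image=image)` as in the examples above.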
Key Takeaways
- Use the `gpu` parameter in `@app.function` to enable GPU in Modal functions.
- Specify the GPU type string like `"A10G"` or `"T4"` based on your quota and needs.
- Include necessary Python packages in the `image` parameter for GPU workloads.
- Test GPU availability inside your function with `torch.cuda.is_available()`.
- Troubleshoot CUDA errors by verifying Modal GPU quota and image compatibility.