How-to · Beginner · 3 min read

How to use GPU with Modal

Quick answer
Use the gpu parameter of the @app.function decorator in Modal to specify the GPU type (e.g., gpu="A10G"). This runs your function in a GPU-enabled container, speeding up AI model inference and training.

Prerequisites

  • Python 3.8+
  • pip install modal
  • Modal account and CLI configured
  • GPU quota enabled on Modal platform

Setup

Install the modal Python package and configure your Modal CLI with your account credentials. Ensure you have GPU quota enabled on your Modal account to use GPU resources.

bash
pip install modal
output
Collecting modal
  Downloading modal-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: modal
Successfully installed modal-1.x.x
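If you have not yet linked the CLI to your account, modal setup authenticates interactively (it opens a browser window) and stores a token locally:

```shell
# Authenticate the Modal CLI and save a token (interactive; opens a browser)
modal setup
```

By default the token is written to ~/.modal.toml, so you only need to run this once per machine.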

Step by step

Define a Modal app and use the @app.function decorator with the gpu parameter to specify the GPU type. Inside the function, run your AI inference code. Deploy and invoke the function to utilize GPU acceleration.

python
import modal

app = modal.App("gpu-example")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
def run_inference(prompt: str) -> str:
    import torch
    # Example: simple tensor operation on GPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.tensor([1.0, 2.0, 3.0], device=device)
    y = x * 2
    return f"Input: {x.cpu().tolist()}, Output: {y.cpu().tolist()}, Device: {device}"

if __name__ == "__main__":
    # Run the app ephemerally and invoke the function remotely
    with app.run():
        result = run_inference.remote("Hello GPU")
        print(result)
output
Input: [1.0, 2.0, 3.0], Output: [2.0, 4.0, 6.0], Device: cuda

Common variations

  • Use different GPU types by changing gpu="A10G" to other supported GPUs like gpu="T4".
  • Install additional Python packages by chaining pip_install calls in the image parameter.
  • Use async functions with @app.function for asynchronous GPU workloads.
python
import modal

app = modal.App("gpu-async-example")

@app.function(gpu="T4", image=modal.Image.debian_slim().pip_install("torch"))
async def async_inference(prompt: str) -> str:
    import torch
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.tensor([4.0, 5.0, 6.0], device=device)
    y = x + 10
    return f"Async Input: {x.cpu().tolist()}, Output: {y.cpu().tolist()}, Device: {device}"

if __name__ == "__main__":
    with app.run():
        # .remote() also works on async functions when called from sync code;
        # from async code, use `await async_inference.remote.aio(...)` instead.
        result = async_inference.remote("Async GPU")
        print(result)
output
Async Input: [4.0, 5.0, 6.0], Output: [14.0, 15.0, 16.0], Device: cuda
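The gpu string also accepts a count suffix, e.g. gpu="A10G:2" to attach two GPUs to one container (check the Modal docs for which types support multiple GPUs). As a rough illustration of that format, here is a small hypothetical parser — not part of Modal's API:

```python
from typing import Tuple

def parse_gpu_spec(spec: str) -> Tuple[str, int]:
    """Split a Modal-style GPU string such as "A10G:2" into (type, count).

    Hypothetical helper for illustration only; a bare type like "T4"
    defaults to a count of 1.
    """
    gpu_type, _, count = spec.partition(":")
    return gpu_type, int(count) if count else 1
```

For example, parse_gpu_spec("A10G:2") yields ("A10G", 2), while parse_gpu_spec("T4") yields ("T4", 1).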

Troubleshooting

  • If you see RuntimeError: CUDA not available, verify your Modal GPU quota and that the gpu parameter matches a supported GPU type.
  • Modal injects the NVIDIA drivers into GPU containers for you; your image only needs CUDA-enabled Python packages (e.g., torch), not the drivers themselves.
  • Check your Modal CLI is up to date and your account has GPU access enabled.
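Before digging into CUDA errors, it can help to rule out local setup problems first. A quick hypothetical check (assuming the default ~/.modal.toml token location):

```python
import shutil
from pathlib import Path

def modal_cli_ready() -> dict:
    """Local sanity checks before debugging GPU errors (illustrative only).

    Confirms the modal CLI is on PATH and a token file exists; GPU quota
    itself can only be verified in the Modal dashboard.
    """
    return {
        "cli_installed": shutil.which("modal") is not None,
        "token_configured": Path.home().joinpath(".modal.toml").exists(),
    }
```

If either value comes back False, fix that before investigating CUDA availability inside your function.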

Key takeaways

  • Use the gpu parameter in @app.function to enable GPU in Modal functions.
  • Specify the GPU type string like "A10G" or "T4" based on your quota and needs.
  • Include necessary Python packages in the image parameter for GPU workloads.
  • Test GPU availability inside your function with torch.cuda.is_available().
  • Troubleshoot CUDA errors by verifying Modal GPU quota and image compatibility.
Verified 2026-04