How-to · Beginner · 3 min read

How to use GPU with Modal

Quick answer
Use the gpu parameter of the @app.function decorator in Modal to specify the GPU type (e.g., gpu="A10G"). This runs your function in a GPU-enabled container, speeding up AI model inference and training.

Prerequisites

  • Python 3.8+
  • pip install modal
  • Modal account and CLI configured
  • GPU quota enabled on Modal platform

Setup

Install the modal Python package and configure your Modal CLI with your account credentials. Ensure you have GPU quota enabled on your Modal account to use GPU resources.

bash
pip install modal
output
Collecting modal
  Downloading modal-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: modal
Successfully installed modal-1.x.x
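If you have not yet linked the CLI to your account, modal setup authenticates interactively (it opens a browser window) and stores a token locally:

```shell
# Authenticate the Modal CLI and save a token (interactive; opens a browser)
modal setup
```

By default the token is written to ~/.modal.toml, so you only need to run this once per machine.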

Step by step

Define a Modal app and use the @app.function decorator with the gpu parameter to specify the GPU type. Inside the function, run your AI inference code. Deploy and invoke the function to utilize GPU acceleration.

python
import modal

app = modal.App("gpu-example")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
def run_inference(prompt: str) -> str:
    import torch
    # Example: simple tensor operation on GPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.tensor([1.0, 2.0, 3.0], device=device)
    y = x * 2
    return f"Input: {x.cpu().tolist()}, Output: {y.cpu().tolist()}, Device: {device}"

if __name__ == "__main__":
    # Run the app ephemerally and invoke the function remotely
    with app.run():
        result = run_inference.remote("Hello GPU")
        print(result)
output
Input: [1.0, 2.0, 3.0], Output: [2.0, 4.0, 6.0], Device: cuda

Common variations

  • Use different GPU types by changing gpu="A10G" to other supported GPUs like gpu="T4".
  • Install additional Python packages by chaining pip_install calls in the image parameter.
  • Use async functions with @app.function for asynchronous GPU workloads.
python
import modal

app = modal.App("gpu-async-example")

@app.function(gpu="T4", image=modal.Image.debian_slim().pip_install("torch"))
async def async_inference(prompt: str) -> str:
    import torch
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.tensor([4.0, 5.0, 6.0], device=device)
    y = x + 10
    return f"Async Input: {x.cpu().tolist()}, Output: {y.cpu().tolist()}, Device: {device}"

if __name__ == "__main__":
    with app.run():
        # .remote() also works on async functions when called from sync code;
        # from async code, use `await async_inference.remote.aio(...)` instead.
        result = async_inference.remote("Async GPU")
        print(result)
output
Async Input: [4.0, 5.0, 6.0], Output: [14.0, 15.0, 16.0], Device: cuda
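The gpu string also accepts a count suffix, e.g. gpu="A10G:2" to attach two GPUs to one container (check the Modal docs for which types support multiple GPUs). As a rough illustration of that format, here is a small hypothetical parser — not part of Modal's API:

```python
from typing import Tuple

def parse_gpu_spec(spec: str) -> Tuple[str, int]:
    """Split a Modal-style GPU string such as "A10G:2" into (type, count).

    Hypothetical helper for illustration only; a bare type like "T4"
    defaults to a count of 1.
    """
    gpu_type, _, count = spec.partition(":")
    return gpu_type, int(count) if count else 1
```

For example, parse_gpu_spec("A10G:2") yields ("A10G", 2), while parse_gpu_spec("T4") yields ("T4", 1).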

Troubleshooting

  • If you see RuntimeError: CUDA not available, verify your Modal GPU quota and that the gpu parameter matches a supported GPU type.
  • Modal injects the NVIDIA drivers into GPU containers for you; your image only needs CUDA-enabled Python packages (e.g., torch), not the drivers themselves.
  • Check your Modal CLI is up to date and your account has GPU access enabled.
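Before digging into CUDA errors, it can help to rule out local setup problems first. A quick hypothetical check (assuming the default ~/.modal.toml token location):

```python
import shutil
from pathlib import Path

def modal_cli_ready() -> dict:
    """Local sanity checks before debugging GPU errors (illustrative only).

    Confirms the modal CLI is on PATH and a token file exists; GPU quota
    itself can only be verified in the Modal dashboard.
    """
    return {
        "cli_installed": shutil.which("modal") is not None,
        "token_configured": Path.home().joinpath(".modal.toml").exists(),
    }
```

If either value comes back False, fix that before investigating CUDA availability inside your function.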

Key takeaways

  • Use the gpu parameter in @app.function to enable GPU in Modal functions.
  • Specify the GPU type string like "A10G" or "T4" based on your quota and needs.
  • Include necessary Python packages in the image parameter for GPU workloads.
  • Test GPU availability inside your function with torch.cuda.is_available().
  • Troubleshoot CUDA errors by verifying Modal GPU quota and image compatibility.
Verified 2026-04