How-to · Beginner · 3 min read

How to use Modal for training

Quick answer
Use the modal Python package to define GPU-enabled functions for training AI models in a serverless environment. Decorate your training function with @app.function(gpu="A10G") and run it remotely with modal run, or invoke it inside a with app.run(): block.

Prerequisites

  • Python 3.9 or newer (recent Modal releases no longer support 3.8)
  • Modal account and CLI installed
  • pip install modal
  • No local GPU required; Modal provisions GPU instances in its cloud
  • Basic Python knowledge

Setup

Install the modal package and create a Modal account. Authenticate with modal setup (or modal token new) to link the CLI to your account.

Ensure you have a GPU quota on Modal for training.

bash
pip install modal
output
Collecting modal
  Downloading modal-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: modal
Successfully installed modal-1.x.x

Step by step

Define a Modal app and a GPU-enabled function to run your training code. Use @app.function(gpu="A10G") to request a GPU instance. Run the script to execute the function remotely inside an ephemeral app.

python
import modal

app = modal.App("training-app")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
def train_model():
    import torch
    # Example: simple tensor operation simulating training
    x = torch.randn(3, 3)
    y = torch.randn(3, 3)
    z = x @ y
    print("Training result:\n", z)
    return z.tolist()

if __name__ == "__main__":
    with app.run():
        result = train_model.remote()
        print("Training output:", result)
output (random values; yours will differ)
Training result:
 [[-0.123, 0.456, 0.789], [0.234, -0.567, 0.890], [0.345, 0.678, -0.901]]
Training output: [[-0.123, 0.456, 0.789], [0.234, -0.567, 0.890], [0.345, 0.678, -0.901]]
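
The matrix multiply above is only a stand-in for real work. A realistic training function body looks more like the loop below; the model, data, and hyperparameters are illustrative assumptions, and the same code would run unchanged inside the decorated Modal function since the image installs torch:

```python
import torch

def training_step_demo():
    # Tiny linear-regression loop standing in for real training logic.
    # Inside a Modal function this would run on the requested GPU.
    model = torch.nn.Linear(3, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(64, 3)
    y = x.sum(dim=1, keepdim=True)  # synthetic target: sum of features
    for _ in range(100):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()
```

For real workloads you would also load data from a modal.Volume or cloud storage and save checkpoints the same way, since container filesystems are ephemeral.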

Common variations

  • Use different GPU types by changing gpu="A10G" to another supported value such as "T4", "L4", "A100", or "H100".
  • Install additional Python packages by passing more arguments to pip_install (or chaining calls) in the image definition.
  • Run asynchronous training by defining the function with async def and awaiting train_model.remote.aio() from async callers.
  • Expose web endpoints for interactive training triggers by stacking @modal.fastapi_endpoint(method="POST") (called @modal.web_endpoint in older releases) under @app.function().
python
import modal

app = modal.App("training-app")

@app.function(gpu="A100", image=modal.Image.debian_slim().pip_install("torch", "transformers"))
def train_advanced_model():
    import torch
    from transformers import GPT2Model

    model = GPT2Model.from_pretrained("gpt2")
    # GPT2Model expects integer token IDs, not raw float tensors
    input_ids = torch.randint(0, model.config.vocab_size, (1, 10))
    outputs = model(input_ids)
    return outputs.last_hidden_state.tolist()

if __name__ == "__main__":
    with app.run():
        result = train_advanced_model.remote()
        print("Advanced training output received.")
output
Advanced training output received.

Troubleshooting

  • If you see Quota exceeded, request more GPU quota in your Modal dashboard.
  • For ImportError, ensure all dependencies are installed in the image via pip_install.
  • If deployment hangs, verify your Modal CLI is logged in and your network allows outbound connections.

Key Takeaways

  • Use @app.function(gpu="A10G") to run GPU training on Modal.
  • Define dependencies in the image with pip_install for reproducible environments.
  • Run training functions remotely with modal run or a with app.run(): block; use modal deploy for persistent apps.
  • Modal supports async functions and web endpoints for flexible training workflows.
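
The if __name__ == "__main__" pattern shown earlier works, but Modal's idiomatic entry point is @app.local_entrypoint(), which lets the modal run CLI handle app startup and teardown. A minimal sketch mirroring the first example:

```python
import modal

app = modal.App("training-app")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
def train_model():
    import torch
    # Stand-in training computation, as in the main example.
    return (torch.randn(3, 3) @ torch.randn(3, 3)).tolist()

@app.local_entrypoint()
def main():
    # Invoked locally by `modal run train.py`; train_model runs remotely.
    print("Training output:", train_model.remote())
```

Running modal run train.py builds the image, provisions the GPU, executes train_model remotely, and streams logs back to your terminal.
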
Verified 2026-04