How to use Modal for training
Quick answer
Use the modal Python package to define GPU-enabled functions for training AI models in a serverless environment. Decorate your training function with @app.function(gpu="A10G") and run the app (with modal run, or programmatically with app.run()) to execute training remotely.

Prerequisites
- Python 3.8+
- Modal account and CLI installed (pip install modal)
- GPU quota on Modal
- Basic Python knowledge
Setup
Install the modal package, then set up your Modal account and authenticate the CLI (for example, with modal setup) to enable deployment. Ensure you have GPU quota on Modal for training.

```shell
pip install modal
```

Output:

```
Collecting modal
  Downloading modal-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: modal
Successfully installed modal-1.x.x
```
Step by step
Define a Modal app and a GPU-enabled function to run your training code. Use @app.function(gpu="A10G") to request a GPU instance, then run the app and invoke the function remotely.

```python
import modal

app = modal.App("training-app")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
def train_model():
    import torch

    # Example: simple tensor operation simulating a training step
    x = torch.randn(3, 3)
    y = torch.randn(3, 3)
    z = x @ y
    print("Training result:\n", z.tolist())
    return z.tolist()

if __name__ == "__main__":
    with app.run():
        result = train_model.remote()
        print("Training output:", result)
```

Output (values will vary, since the tensors are random):

```
Training result:
 [[-0.123, 0.456, 0.789], [0.234, -0.567, 0.890], [0.345, 0.678, -0.901]]
Training output: [[-0.123, 0.456, 0.789], [0.234, -0.567, 0.890], [0.345, 0.678, -0.901]]
```
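Note that train_model.remote() returns plain nested Python lists (from z.tolist()), not a tensor. If you want to check locally what that 3×3 product looks like, here is a pure-Python sketch that needs no GPU, torch, or Modal account; matmul is an illustrative helper, not part of Modal:

```python
# Local sketch of the same 3x3 matrix product train_model() computes.
# matmul is a hypothetical helper for illustration only.
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity matrix
y = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
z = matmul(x, y)  # nested lists, same shape as train_model's return value
```

Multiplying by the identity leaves y unchanged, which makes the sketch easy to sanity-check by eye.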
Common variations
- Use different GPU types by changing gpu="A10G" to another supported GPU (for example, "A100").
- Install additional Python packages by chaining pip_install calls in the image definition.
- Run asynchronous training functions by defining them with async def and using await when invoking.
- Expose web endpoints (for example, by stacking @modal.web_endpoint(method="POST") under @app.function()) for interactive training triggers.
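Modal's async invocation (awaiting remote calls, e.g. via fn.remote.aio()) follows the standard asyncio pattern. Here is a local, Modal-free sketch of that pattern; fake_remote_train is a hypothetical stand-in for a remote call, not a Modal API:

```python
import asyncio

# Hypothetical stand-in for an awaitable remote call such as
# fn.remote.aio(); NOT part of Modal, just a local sketch.
async def fake_remote_train(step: int) -> float:
    await asyncio.sleep(0)     # stands in for the network round-trip
    return 0.5 / (step + 1)    # pretend per-step loss

async def main() -> list:
    # Fan out several "remote" steps concurrently, as you would with
    # asyncio.gather over multiple remote calls.
    return await asyncio.gather(*(fake_remote_train(s) for s in range(3)))

losses = asyncio.run(main())
print(losses)
```

The same gather-based fan-out works against real Modal functions, which is the main reason to define training entry points as async when you need concurrent runs.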
```python
import modal

app = modal.App("training-app")

@app.function(
    gpu="A100",
    image=modal.Image.debian_slim().pip_install("torch").pip_install("transformers"),
)
def train_advanced_model():
    import torch
    from transformers import GPT2Model

    model = GPT2Model.from_pretrained("gpt2")
    # GPT2Model expects integer token ids, not raw embeddings
    input_ids = torch.randint(0, model.config.vocab_size, (1, 8))
    outputs = model(input_ids)
    return outputs.last_hidden_state.tolist()

if __name__ == "__main__":
    with app.run():
        result = train_advanced_model.remote()
        print("Advanced training output received.")
```

Output:

```
Advanced training output received.
```
Troubleshooting
- If you see Quota exceeded, request more GPU quota in your Modal dashboard.
- For ImportError, ensure all dependencies are installed in the image via pip_install.
- If deployment hangs, verify your Modal CLI is logged in and your network allows outbound connections.
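One way to make the ImportError case easier to diagnose is to guard imports inside the remote function. Below, require is a hypothetical helper (not a Modal API) that turns a missing package into an actionable message pointing at the image definition:

```python
import importlib

# Hypothetical helper (not part of Modal): import a module and, if it
# is missing, point at the pip_install fix in the image definition.
def require(module_name: str):
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise RuntimeError(
            f"{module_name} is not installed in the image; add "
            f'.pip_install("{module_name}") to the modal.Image definition'
        ) from e

# Inside train_model() you would then write, for example:
# torch = require("torch")
```

Since remote tracebacks can be noisy, surfacing the exact pip_install fix in the error message saves a round trip to the logs.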
Key Takeaways
- Use @app.function(gpu="A10G") to run GPU training on Modal.
- Define dependencies in the image with pip_install for reproducible environments.
- Run training functions remotely with app.run() and .remote(), or deploy the app with modal deploy.
- Modal supports async functions and web endpoints for flexible training workflows.