How-to · Beginner · 3 min read

How to use Modal volumes for model weights

Quick answer

Persist large model weights in a Modal Volume and mount it into your functions through the volumes argument of @app.function. Upload the weights to the volume once, then read them from the mount path in your GPU-enabled functions instead of re-downloading them on every run. The files stay in the volume between runs, so new containers can load them straight from the mounted path.

Prerequisites

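Python 3.9+ (Python 3.8 has reached end of life; check the Python requirement of the modal release you install)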
Modal account and CLI installed
pip install modal
Basic knowledge of Modal functions and volumes

Setup

Install the modal Python package and authenticate your Modal account. You also need a named volume to hold the model weights. A volume persists independently of any single function, so it can be reused across function invocations.

bash

pip install modal
modal setup

modal setup opens a browser window to authenticate and stores an API token in your local Modal configuration.

Step by step

This example shows how to create a Modal volume, upload model weights to it, and then mount it inside a GPU-enabled Modal function to load the model without re-downloading.

python

import os

import modal

# Look up the persistent volume by name, creating it on first use
volume = modal.Volume.from_name("model-weights-volume", create_if_missing=True)

app = modal.App("model-weights-example")

# Path where the volume is mounted inside each container
WEIGHTS_DIR = "/weights"


# Function that writes the weights into the volume once
@app.function(volumes={WEIGHTS_DIR: volume})
def upload_weights():
    # Simulate uploading weights by writing a placeholder file
    weights_path = os.path.join(WEIGHTS_DIR, "weights.bin")
    with open(weights_path, "wb") as f:
        f.write(b"fake model weights data")
    # Commit so the write is persisted and visible to other containers
    volume.commit()
    print(f"Weights uploaded to {weights_path}")


# GPU function that mounts the same volume to load the weights
@app.function(gpu="A10G", volumes={WEIGHTS_DIR: volume})
def run_model():
    weights_path = os.path.join(WEIGHTS_DIR, "weights.bin")
    with open(weights_path, "rb") as f:
        data = f.read()
    print(f"Loaded weights of size {len(data)} bytes")
    # Here you would load your ML model with these weights
    return "Model run complete"


# Entry point executed on your machine by: modal run <this_file>.py
@app.local_entrypoint()
def main():
    # Upload weights once
    upload_weights.remote()
    # Run model function
    result = run_model.remote()
    print(result)

output

Weights uploaded to /weights/weights.bin
Loaded weights of size 23 bytes
Model run complete
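
The "upload once" idea can be made explicit with a small guard that writes the weights only when they are missing from the mount path. The following is a minimal sketch using only the standard library. ensure_weights and its fetch callback are hypothetical names introduced for illustration, not part of the Modal API.

```python
from pathlib import Path
from typing import Callable


def ensure_weights(weights_dir: str, filename: str, fetch: Callable[[], bytes]) -> bool:
    """Write the weights file only if it is missing.

    Returns True when the file was fetched and written, False when an
    existing copy was reused.
    """
    target = Path(weights_dir) / filename
    if target.is_file():
        return False  # already present at the mount path, skip the download
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write under a temporary name first so a crash never leaves a partial file
    tmp = target.with_name(target.name + ".partial")
    tmp.write_bytes(fetch())
    tmp.replace(target)
    return True
```

Inside a Modal function that mounts the volume at /weights, you could call ensure_weights("/weights", "weights.bin", your_download_function) and then call volume.commit() only when it returns True, so repeated runs skip the download.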

Common variations

Use @app.function(gpu="A10G") to run on GPU instances.
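Mount multiple volumes by adding more entries to the volumes dictionary, one mount path per volume.
Upload weights from local files with the CLI command modal volume put, or from Python with the volume's batch_upload() context manager.
From async code, call a function with await run_model.remote.aio().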
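
For uploading from local files, one option is the volume's batch_upload() context manager with put_file(). The sketch below pairs each local weight file with a destination path and then uploads the pairs. plan_uploads, upload_weights_dir, and the WEIGHT_SUFFIXES set are hypothetical names chosen for illustration; only Volume.from_name, batch_upload, and put_file come from the Modal client.

```python
from pathlib import Path

# Illustrative set of weight-file extensions; adjust for your model format
WEIGHT_SUFFIXES = {".bin", ".safetensors", ".pt"}


def plan_uploads(local_dir: str, remote_root: str = "/") -> list[tuple[str, str]]:
    """Pair each local weight file with its destination path in the volume."""
    root = Path(local_dir)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in WEIGHT_SUFFIXES:
            relative = path.relative_to(root).as_posix()
            pairs.append((str(path), remote_root.rstrip("/") + "/" + relative))
    return pairs


def upload_weights_dir(volume_name: str, local_dir: str) -> int:
    """Upload the planned files from the local machine into a Modal volume."""
    import modal  # imported here so plan_uploads works without the Modal client

    volume = modal.Volume.from_name(volume_name, create_if_missing=True)
    pairs = plan_uploads(local_dir)
    # All files added to the batch are uploaded when the block exits
    with volume.batch_upload() as batch:
        for local_path, remote_path in pairs:
            batch.put_file(local_path, remote_path)
    return len(pairs)
```

Calling upload_weights_dir("model-weights-volume", "./weights") from your machine would then copy the matching files into the volume, assuming you are authenticated. Filtering by extension keeps stray files such as notes or caches out of the volume.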

Troubleshooting

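If you see FileNotFoundError, check that the volume is mounted at the path you are reading from and that the weights file was written and committed to the volume.
Check which Modal profile is active with modal profile current, and run modal setup again if authentication fails.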
Ensure your GPU quota is sufficient for gpu="A10G" usage.
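
When tracking down a FileNotFoundError, it helps to fail with a message that says whether the mount path itself is missing or only the file. The following is a minimal sketch using the standard library; require_weights is a hypothetical helper, not a Modal function.

```python
from pathlib import Path


def require_weights(weights_path: str) -> Path:
    """Return the path if the file exists, else raise with debugging context."""
    path = Path(weights_path)
    if path.is_file():
        return path
    parent = path.parent
    if not parent.is_dir():
        # The directory is absent, which usually points at the mount itself
        raise FileNotFoundError(
            f"{parent} does not exist; is the volume mounted at this path?"
        )
    # The directory exists, so list what is actually there
    found = sorted(child.name for child in parent.iterdir())
    raise FileNotFoundError(
        f"{path.name} not found in {parent}; directory contains: {found}"
    )
```

Calling require_weights("/weights/weights.bin") at the top of the GPU function separates a wrong mount path from a file that was never uploaded.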


Key Takeaways

Use Modal volumes to persist and share large model weights efficiently across function runs.
Mount volumes inside GPU-enabled Modal functions with the volumes parameter of @app.function.
Upload weights once to the volume to avoid repeated downloads and speed up inference.
Modal volumes simplify managing model files in serverless GPU environments.
Always verify volume mounting paths and Modal CLI authentication to avoid runtime errors.

Verified 2026-04
