
How to use Modal volumes for model weights

Quick answer
Use a Modal volume to persist large model weights: look the volume up by name, then mount it in your function through the volumes argument of @app.function. Upload the weights to the volume once, and your GPU-enabled functions can read them from the mount path without re-downloading. The files persist across runs and are shared by every function that mounts the volume.

PREREQUISITES

  • Python 3.9+
  • Modal account and CLI installed
  • pip install modal
  • Basic knowledge of Modal functions and volumes

Setup

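Install the modal Python package, which also provides the modal CLI, and authenticate with your Modal account. Then create a named volume to hold your model weights. A volume persists independently of any function, so it can be reused across invocations and apps.

bash
pip install modal
modal setup
modal volume create model-weights-volume

modal setup opens a browser window to authenticate and saves an API token on your machine.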

Step by step

This example shows how to create a Modal volume, upload model weights to it, and then mount it inside a GPU-enabled Modal function to load the model without re-downloading.

python
import os

import modal

app = modal.App("model-weights-example")

# Look up the persistent volume by name (created on first use if missing)
volume = modal.Volume.from_name("model-weights-volume", create_if_missing=True)

# Path where the volume is mounted inside each container
WEIGHTS_DIR = "/weights"


# Function that writes the weights into the volume once
@app.function(volumes={WEIGHTS_DIR: volume})
def upload_weights():
    # Simulate uploading weights to the volume
    weights_path = os.path.join(WEIGHTS_DIR, "weights.bin")
    with open(weights_path, "wb") as f:
        f.write(b"fake model weights data")
    # Persist the write so other containers can see the file
    volume.commit()
    print(f"Weights uploaded to {weights_path}")


# GPU function that mounts the same volume to load the weights
@app.function(gpu="A10G", volumes={WEIGHTS_DIR: volume})
def run_model():
    weights_path = os.path.join(WEIGHTS_DIR, "weights.bin")
    with open(weights_path, "rb") as f:
        data = f.read()
    print(f"Loaded weights of size {len(data)} bytes")
    # Here you would load your ML model with these weights
    return "Model run complete"


if __name__ == "__main__":
    # Start an ephemeral app run so the functions can be called remotely
    with app.run():
        # Upload weights once
        upload_weights.remote()
        # Run model function
        result = run_model.remote()
        print(result)
output
Weights uploaded to /weights/weights.bin
Loaded weights of size 23 bytes
Model run complete
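
In practice the upload step usually downloads real weights instead of writing placeholder bytes, and it should skip the download when the file is already in the volume. Here is a minimal sketch of that download-once check. The ensure_weights helper and the fetch callable are hypothetical names introduced for illustration, not part of Modal. Inside a Modal function, weights_dir would be the volume's mount path, and you would call volume.commit() after a fresh download.

```python
import os
from typing import Callable, Tuple


def ensure_weights(
    weights_dir: str, filename: str, fetch: Callable[[str], None]
) -> Tuple[str, bool]:
    """Return (path, downloaded); call fetch(path) only if the file is missing.

    Hypothetical helper, not a Modal API. `fetch` is any callable that
    writes the weights to the path it is given (for example an HTTP
    download or a model hub client).
    """
    path = os.path.join(weights_dir, filename)
    if os.path.exists(path):
        # Already cached from an earlier run: nothing to download
        return path, False
    # Write under a temporary name first so an interrupted download
    # never leaves a partial file under the final name
    tmp_path = path + ".partial"
    fetch(tmp_path)
    os.replace(tmp_path, path)
    return path, True
```

The returned flag tells the caller whether anything new was written, so a Modal function only needs to commit the volume when it is True.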

Common variations

  • Set gpu="A10G" (or another GPU type) in @app.function to run on a GPU instance.
  • Mount multiple volumes by adding more entries to the volumes dictionary, one mount path per volume.
  • Upload weights from local files with volume.batch_upload() in Python, or with modal volume put on the command line.
  • From async code, call a function with await function.remote.aio().
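
For uploading weights from local files, one option is the volume's batch_upload() context manager, which runs on your own machine rather than inside a Modal function. A minimal sketch follows. The upload_local_weights wrapper is a hypothetical name, and it assumes the volume argument is a modal.Volume obtained with modal.Volume.from_name.

```python
from pathlib import Path


def upload_local_weights(
    volume, local_path: str, remote_path: str = "/weights.bin"
) -> None:
    """Copy one local file into a Modal volume (hypothetical wrapper).

    `volume` is expected to be a modal.Volume. `remote_path` is relative
    to the root of the volume, not to any mount path.
    """
    if not Path(local_path).is_file():
        raise FileNotFoundError(f"No local weights file at {local_path}")
    # batch_upload() collects the puts and sends them when the block exits
    with volume.batch_upload() as batch:
        batch.put_file(local_path, remote_path)


# Usage (requires the modal package and a configured token):
#   import modal
#   volume = modal.Volume.from_name("model-weights-volume", create_if_missing=True)
#   upload_local_weights(volume, "weights.bin")
```

With the default remote_path, the file appears at /weights/weights.bin in any function that mounts the volume at /weights.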

Troubleshooting

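  • If you see FileNotFoundError, check that the volume is mounted at the path you are reading from, that the weights file was actually written to it, and that the writing function committed its changes with volume.commit().
  • Check which Modal profile the CLI is using with modal profile current.
  • Ensure your GPU quota is sufficient for gpu="A10G" usage.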
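
When a FileNotFoundError does occur, it helps to know whether the mount path itself is missing or the volume is mounted but the file is absent. Here is a small diagnostic sketch. The explain_missing_weights helper is a hypothetical name, not part of Modal. If the file was written by another container after this one started, calling volume.reload() first fetches the latest committed state.

```python
import os


def explain_missing_weights(weights_path: str) -> str:
    """Return a short diagnosis for a weights file that cannot be opened."""
    if os.path.isfile(weights_path):
        size = os.path.getsize(weights_path)
        return f"OK: {weights_path} exists ({size} bytes)"
    mount_dir = os.path.dirname(weights_path)
    if not os.path.isdir(mount_dir):
        # The directory is absent, so nothing is mounted at this path
        return f"{mount_dir} does not exist: the volume is probably not mounted here"
    # The mount exists, so report what it actually contains
    found = sorted(os.listdir(mount_dir))
    name = os.path.basename(weights_path)
    return f"{mount_dir} is mounted but has no {name}; it contains: {found}"
```

Printing the result at the top of the GPU function, for example print(explain_missing_weights("/weights/weights.bin")), shows the diagnosis in the function's logs.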

Key Takeaways

  • Use Modal volumes to persist and share large model weights efficiently across function runs.
  • Mount volumes inside GPU-enabled Modal functions with the volumes parameter, which maps each mount path to a volume.
  • Upload weights once to the volume to avoid repeated downloads and speed up inference.
  • Modal volumes simplify managing model files in serverless GPU environments.
  • Always verify volume mounting paths and Modal CLI authentication to avoid runtime errors.
Verified 2026-04