How to cache models in Modal
Quick answer
Use Modal's persistent Volumes: store your model files on a Volume mounted inside your Modal function and point your library's cache directory at the mount path. The files are downloaded once and reused on every subsequent invocation, which cuts cold-start time and reduces costs.
Prerequisites
- Python 3.8+
- Modal account and CLI installed
- Modal Python package installed (pip install modal)
- Basic knowledge of Python and AI model loading
Setup
Install the modal Python package and log in to your Modal account. Set up your environment with the necessary API keys and prepare your model files for caching.
pip install modal
modal setup

Output:
Successfully logged in to Modal
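If your model requires an access token (for example, a gated Hugging Face model), Modal injects it into the container as an environment variable via a modal.Secret attached to the function. A minimal sketch of reading it inside the function body, assuming the variable is named HF_TOKEN (the name depends on how you created the secret):

```python
import os

def hf_auth_token():
    # Token injected by a modal.Secret attached to the function;
    # returns None when running without the secret (public models)
    return os.environ.get("HF_TOKEN")
```

The returned value can then be passed to your model-loading call where the library expects an auth token.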
Step by step
This example shows how to cache a Hugging Face model inside a Modal function using a persistent volume. The volume stores the model files so they are downloaded only once and reused on subsequent runs.
import modal

# Create a Modal app
app = modal.App("model-cache-example")

# Create (or look up) a persistent Volume for caching
cache_volume = modal.Volume.from_name("model-cache-volume", create_if_missing=True)

@app.function(
    gpu="A10G",
    image=modal.Image.debian_slim().pip_install("transformers", "torch"),
    volumes={"/cache": cache_volume},
)
def load_model():
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"
    cache_dir = "/cache/huggingface"
    # Load tokenizer and model with cache_dir on the persistent Volume;
    # files are downloaded on the first run and reused afterwards
    tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
    model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir=cache_dir)
    # Commit newly written files so later containers see them
    cache_volume.commit()
    # Simple inference
    inputs = tokenizer("Hello, Modal!", return_tensors="pt")
    outputs = model.generate(**inputs, max_length=20)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    with app.run():
        print(load_model.remote())

Output:
Hello, Modal!
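The behavior that cache_dir enables can be sketched in plain Python: the loader checks the cache directory first and only downloads on a miss. This is an illustrative sketch, not the actual transformers implementation — load_with_cache and its directory layout are hypothetical:

```python
import os
import tempfile

def load_with_cache(model_name, cache_dir, download):
    """Return (path, status): reuse cached files on a hit, fetch on a miss."""
    path = os.path.join(cache_dir, model_name.replace("/", "--"))
    if os.path.isdir(path):
        return path, "hit"   # files already on the volume: no download
    os.makedirs(path)
    download(path)           # first run: fetch files into the cache
    return path, "miss"

# Demo with a stand-in download function
cache = tempfile.mkdtemp()
fetches = []
print(load_with_cache("gpt2", cache, fetches.append)[1])  # miss: downloads
print(load_with_cache("gpt2", cache, fetches.append)[1])  # hit: reuses cache
print(len(fetches))                                       # fetched only once
```

On Modal, the same logic applies across containers because the cache directory lives on a Volume rather than on container-local disk.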
Common variations
- Use modal.Secret to securely pass API keys if your model requires authentication.
- For CPU-only usage, omit the gpu parameter.
- Cache other model types by mounting Volumes similarly.
- Cache function outputs across invocations with a persistent modal.Dict keyed by input.

import modal

app = modal.App("output-cache-example")

# Persistent key-value store used to memoize results across invocations
results = modal.Dict.from_name("result-cache", create_if_missing=True)

@app.function()
def cached_function(x):
    cached = results.get(x)
    if cached is not None:
        return cached  # cache hit: reuse the stored result
    result = x * 2
    results[x] = result  # store for future invocations
    return result

if __name__ == "__main__":
    with app.run():
        print(cached_function.remote(10))

Output:
20
Troubleshooting
- If your model is not being cached, ensure the Volume is mounted at the path you use as cache_dir and that writes are committed (volume.commit()).
- Check Modal logs for volume mount errors.
- Verify you have enough volume storage quota in your Modal account.
- For large models, increase volume size or use external storage solutions.
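To act on the last two points, you can check how much space the cache consumes by walking the mounted directory from inside the function. A small helper sketch (dir_size_bytes is plain Python, not a Modal API):

```python
import os

def dir_size_bytes(path):
    # Sum the sizes of all files below path, e.g. "/cache/huggingface"
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

Calling, say, print(dir_size_bytes("/cache") / 1e9, "GB") at the end of the function shows cache growth in the function logs.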
Key takeaways
- Use Modal persistent volumes to cache model files across function invocations.
- Mount volumes inside Modal functions to store and reuse large models efficiently.
- The cache_dir parameter in model-loading libraries controls where files are cached.
- Function outputs can be memoized across invocations with a persistent modal.Dict.
- Monitor volume usage and logs to troubleshoot caching issues.