How to · Intermediate · 3 min read

Modal cold start optimization

Quick answer
To cut cold start times in Modal, pre-load dependencies and models at global scope so they are initialized once per container rather than on every invocation, use lightweight container images, and declare resource hints such as gpu or cpu on your modal.App functions. Keep function initialization minimal and cache heavy objects outside the function handler to reduce startup latency.

PREREQUISITES

  • Python 3.8+
  • Modal account and CLI installed
  • Modal Python package (pip install modal)
  • Basic knowledge of serverless functions

Setup

Install the Modal Python package and authenticate with your Modal account using the CLI.

  • Install the Modal SDK: pip install modal
  • Authenticate: modal setup (opens a browser to create and store an API token)
  • Pre-install your dependencies in your Modal image so containers start with everything they need.
bash
pip install modal
modal setup
output
Requirement already satisfied: modal in ...

Step by step

Use modal.App (the current name for what older Modal releases called modal.Stub) to define your app, and pre-load heavy dependencies or models at global scope so they are initialized once per container rather than on every invocation. Specify resource requirements such as gpu or cpu to match your workload, and keep the function handler itself lightweight.

python
import modal
import time

# modal.Stub was renamed to modal.App in recent Modal releases
app = modal.App("cold-start-demo")

# Heavy initialization at global scope runs once per container start,
# not on every invocation.
heavy_resource = None

def load_heavy_resource():
    global heavy_resource
    time.sleep(5)  # simulate an expensive model or data load
    heavy_resource = "Loaded"

load_heavy_resource()

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("requests"))
def fast_function(prompt: str) -> str:
    # heavy_resource is already initialized; the handler stays fast
    return f"Response with {heavy_resource}: {prompt}"

if __name__ == "__main__":
    with app.run():
        print(fast_function.remote("Hello Modal"))
output
Response with Loaded: Hello Modal
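One caveat with a script like the one above: module-level code also executes on your local machine when you run or deploy the app, so the five-second load is paid locally as well. Modal provides modal.is_local() to skip remote-only work during local import; the Modal-free sketch below mimics that check with an environment variable (IN_CONTAINER is a made-up flag for illustration):

```python
import os
import time

def load_heavy_resource() -> str:
    time.sleep(0.05)  # stand-in for an expensive load
    return "Loaded"

def init_resource(in_container: bool):
    # Pay the load cost only where it is actually needed.
    return load_heavy_resource() if in_container else None

# In a real Modal app the check would be `not modal.is_local()`;
# an environment variable stands in for it here.
heavy_resource = init_resource(os.environ.get("IN_CONTAINER") == "1")
print(heavy_resource)
```

With this guard, local runs import instantly while containers still pre-load the resource at startup.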

Common variations

For asynchronous functions, declare the handler with async def under @app.function; Modal lets you call it synchronously with .remote() or from async code with .remote.aio(). To further reduce cold starts, use minimal base images, cache data in a persistent modal.Volume or an external cache, and split large models into separate services to isolate their startup cost.

python
import modal
import asyncio

app = modal.App("async-demo")

@app.function(cpu=1)
async def async_fast_function(prompt: str) -> str:
    await asyncio.sleep(0.1)  # simulate async work
    return f"Async response: {prompt}"

if __name__ == "__main__":
    with app.run():
        # Async Modal functions can still be invoked synchronously
        print(async_fast_function.remote("Hello async Modal"))
output
Async response: Hello async Modal
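The persistent-volume variation amounts to an on-disk cache: build the artifact once, write it to the volume's mount path, and let later containers read it back instead of rebuilding. A sketch of that logic with a temp directory standing in for a modal.Volume mount (the paths and artifact contents are illustrative):

```python
import json
import os
import tempfile
import time

def load_or_build(cache_path):
    """Return (artifact, how), where how is 'cache' or 'built'."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f), "cache"
    time.sleep(0.05)  # simulate an expensive build
    artifact = {"version": 1, "weights": [0.1, 0.2]}
    with open(cache_path, "w") as f:
        json.dump(artifact, f)
    return artifact, "built"

# A temp dir stands in for a volume mounted at e.g. /cache in the container.
mount = tempfile.mkdtemp()
path = os.path.join(mount, "model.json")

_, first = load_or_build(path)   # cold: builds and writes the artifact
_, second = load_or_build(path)  # warm: reads it back from "the volume"
print(first, second)
```

Only the first container pays the build cost; every later cold start reduces to a file read.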

Troubleshooting

  • If cold starts remain slow, check your image size and dependency list; large images take longer to pull and start.
  • Make sure heavy initialization lives at global scope (or in a @modal.enter() lifecycle hook), not inside the function handler, so it runs once per container rather than once per call.
  • Inspect container startup with the CLI (modal app logs) to see where time is being spent.
  • Declare resource hints such as gpu or cpu so containers are sized for your workload.

Key Takeaways

  • Pre-load heavy dependencies and models globally to avoid repeated initialization.
  • Use lightweight container images and specify resource hints to reduce startup latency.
  • Keep function handlers minimal and cache expensive objects outside the handler.
  • Use async functions and external caches to further optimize cold start performance.
Verified 2026-04