How to · Intermediate · 3 min read

Modal cold start optimization

Quick answer
To cut cold start times in Modal, pre-load dependencies and models at global scope so they are initialized once per container rather than on every invocation, use lightweight container images, and declare resource hints such as gpu or cpu on your modal.App functions. Keep function initialization minimal and cache heavy objects outside the function handler to reduce startup latency.

PREREQUISITES

  • Python 3.8+
  • Modal account and CLI installed
  • Modal Python package (pip install modal)
  • Basic knowledge of serverless functions

Setup

Install the Modal Python package and authenticate with your Modal account using the CLI.

  • Install the Modal SDK: pip install modal
  • Authenticate: modal setup (opens a browser to create and store an API token)
  • Pre-install your dependencies in your Modal image so containers start with everything they need.
bash
pip install modal
modal setup
output
Requirement already satisfied: modal in ...

Step by step

Use modal.App (the current name for what older Modal releases called modal.Stub) to define your app, and pre-load heavy dependencies or models at global scope so they are initialized once per container rather than on every invocation. Specify resource requirements such as gpu or cpu to match your workload, and keep the function handler itself lightweight.

python
import modal
import time

# modal.Stub was renamed to modal.App in recent Modal releases
app = modal.App("cold-start-demo")

# Heavy initialization at global scope runs once per container start,
# not on every invocation.
heavy_resource = None

def load_heavy_resource():
    global heavy_resource
    time.sleep(5)  # simulate an expensive model or data load
    heavy_resource = "Loaded"

load_heavy_resource()

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("requests"))
def fast_function(prompt: str) -> str:
    # heavy_resource is already initialized; the handler stays fast
    return f"Response with {heavy_resource}: {prompt}"

if __name__ == "__main__":
    with app.run():
        print(fast_function.remote("Hello Modal"))
output
Response with Loaded: Hello Modal
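One caveat with a script like the one above: module-level code also executes on your local machine when you run or deploy the app, so the five-second load is paid locally as well. Modal provides modal.is_local() to skip remote-only work during local import; the Modal-free sketch below mimics that check with an environment variable (IN_CONTAINER is a made-up flag for illustration):

```python
import os
import time

def load_heavy_resource() -> str:
    time.sleep(0.05)  # stand-in for an expensive load
    return "Loaded"

def init_resource(in_container: bool):
    # Pay the load cost only where it is actually needed.
    return load_heavy_resource() if in_container else None

# In a real Modal app the check would be `not modal.is_local()`;
# an environment variable stands in for it here.
heavy_resource = init_resource(os.environ.get("IN_CONTAINER") == "1")
print(heavy_resource)
```

With this guard, local runs import instantly while containers still pre-load the resource at startup.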

Common variations

For asynchronous functions, declare the handler with async def under @app.function; Modal lets you call it synchronously with .remote() or from async code with .remote.aio(). To further reduce cold starts, use minimal base images, cache data in a persistent modal.Volume or an external cache, and split large models into separate services to isolate their startup cost.

python
import modal
import asyncio

app = modal.App("async-demo")

@app.function(cpu=1)
async def async_fast_function(prompt: str) -> str:
    await asyncio.sleep(0.1)  # simulate async work
    return f"Async response: {prompt}"

if __name__ == "__main__":
    with app.run():
        # Async Modal functions can still be invoked synchronously
        print(async_fast_function.remote("Hello async Modal"))
output
Async response: Hello async Modal
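The persistent-volume variation amounts to an on-disk cache: build the artifact once, write it to the volume's mount path, and let later containers read it back instead of rebuilding. A sketch of that logic with a temp directory standing in for a modal.Volume mount (the paths and artifact contents are illustrative):

```python
import json
import os
import tempfile
import time

def load_or_build(cache_path):
    """Return (artifact, how), where how is 'cache' or 'built'."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f), "cache"
    time.sleep(0.05)  # simulate an expensive build
    artifact = {"version": 1, "weights": [0.1, 0.2]}
    with open(cache_path, "w") as f:
        json.dump(artifact, f)
    return artifact, "built"

# A temp dir stands in for a volume mounted at e.g. /cache in the container.
mount = tempfile.mkdtemp()
path = os.path.join(mount, "model.json")

_, first = load_or_build(path)   # cold: builds and writes the artifact
_, second = load_or_build(path)  # warm: reads it back from "the volume"
print(first, second)
```

Only the first container pays the build cost; every later cold start reduces to a file read.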

Troubleshooting

  • If cold starts remain slow, check your image size and dependency list; large images take longer to pull and start.
  • Make sure heavy initialization lives at global scope (or in a @modal.enter() lifecycle hook), not inside the function handler, so it runs once per container rather than once per call.
  • Inspect container startup with the CLI (modal app logs) to see where time is being spent.
  • Declare resource hints such as gpu or cpu so containers are sized for your workload.

Key Takeaways

  • Pre-load heavy dependencies and models globally to avoid repeated initialization.
  • Use lightweight container images and specify resource hints to reduce startup latency.
  • Keep function handlers minimal and cache expensive objects outside the handler.
  • Use async functions and external caches to further optimize cold start performance.
Verified 2026-04