How-to · Beginner · 3 min read

How to use Modal for batch inference

Quick answer
Use the modal Python package to define a GPU-enabled function decorated with @app.function(gpu="A10G"). Run the app and call the function with a list of inputs — or fan out over them with .map() — to process batches efficiently.

PREREQUISITES

  • Python 3.8+
  • pip install modal
  • Modal account and CLI authenticated (modal setup)
  • Basic knowledge of Python functions (sync or async)

Setup

Install the modal package and authenticate the CLI with your Modal account; modal setup opens a browser window to complete login.

bash
pip install modal
modal setup
output
Web authentication finished successfully!

Step by step

Define a Modal app and a GPU-enabled function for batch inference. The example below uses a synchronous function that processes a list of prompts and returns their uppercase versions as dummy inference results (torch is installed in the image only to illustrate adding dependencies).

python
import modal

app = modal.App("batch-inference-app")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
def batch_inference(prompts):
    # Dummy batch inference: convert each prompt to uppercase
    results = [prompt.upper() for prompt in prompts]
    return results

if __name__ == "__main__":
    # app.run() starts an ephemeral app for the duration of the block
    with app.run():
        inputs = ["hello world", "modal batch", "inference example"]
        outputs = batch_inference.remote(inputs)
        print("Batch inference results:", outputs)
output
Batch inference results: ['HELLO WORLD', 'MODAL BATCH', 'INFERENCE EXAMPLE']
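Real workloads often have far more inputs than you want to send in a single call. A minimal, Modal-independent sketch of a chunking helper (the name chunked and the batch size are illustrative) that splits inputs into fixed-size batches, each of which could then be passed to one .remote() call:

```python
from typing import Iterator

def chunked(items: list, size: int) -> Iterator[list]:
    # Yield successive fixed-size slices of items; the last chunk may be smaller.
    for start in range(0, len(items), size):
        yield items[start:start + size]

prompts = [f"prompt-{i}" for i in range(7)]
batches = list(chunked(prompts, 3))
print([len(b) for b in batches])  # [3, 3, 1]
```

Choosing the chunk size is a trade-off: larger chunks amortize per-call overhead, while smaller chunks spread work across more containers.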

Common variations

  • Use async def for asynchronous batch inference functions.
  • Change gpu="A10G" to other GPU types available on your plan (e.g. "T4", "A100").
  • Install additional dependencies via the image parameter, e.g. modal.Image.debian_slim().pip_install(...).
  • Call batch_inference.remote() with different batch sizes or data types, or fan out one call per input with batch_inference.map(inputs).
python
import modal

app = modal.App("async-batch-inference")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
async def async_batch_inference(prompts):
    # Simulate async batch processing
    import asyncio
    await asyncio.sleep(1)
    return [p[::-1] for p in prompts]

if __name__ == "__main__":
    with app.run():
        inputs = ["abc", "def", "ghi"]
        # .remote() blocks and returns the result even for async functions;
        # from async code, await async_batch_inference.remote.aio(inputs) instead
        outputs = async_batch_inference.remote(inputs)
        print("Async batch results:", outputs)
output
Async batch results: ['cba', 'fed', 'ihg']
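The fan-out pattern behind the async variant can be shown without Modal at all: start one coroutine per input and gather the results in order. A plain-asyncio sketch (infer_one and run_batch are illustrative names, with sleep standing in for model latency):

```python
import asyncio

async def infer_one(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for per-item inference latency
    return prompt[::-1]

async def run_batch(prompts: list) -> list:
    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(infer_one(p) for p in prompts))

results = asyncio.run(run_batch(["abc", "def", "ghi"]))
print(results)  # ['cba', 'fed', 'ihg']
```

Because all sleeps overlap, the batch completes in roughly the time of one item rather than the sum of all items.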

Troubleshooting

  • If starting or deploying the app fails with authentication errors, re-authenticate with modal setup.
  • For GPU allocation failures, verify your Modal account has GPU quota available.
  • If dependencies fail to install, specify a compatible base image or pin package versions.
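For the last point, a sketch of a pinned image definition (the package versions shown are illustrative, not recommendations); fixing the Python version and exact package versions makes image builds reproducible:

```python
import modal

# Pin the Python version and exact package versions so image builds are reproducible
image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch==2.3.1",
    "transformers==4.41.0",
)
```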

Key Takeaways

  • Use @app.function(gpu="A10G") to enable GPU for batch inference in Modal.
  • Run your app (with app.run() in a script, or the modal run CLI) before calling functions remotely.
  • Batch inputs as lists to process multiple items efficiently in one call.
  • Async functions allow non-blocking batch processing with Modal.
  • Ensure Modal CLI login and GPU quota to avoid deployment errors.
Verified 2026-04