How-to · Beginner · 3 min read

How to use Modal for batch inference

Quick answer
Use the modal Python package to define a GPU-enabled function decorated with @app.function(gpu="A10G"). Run the app and call the function with a list of inputs — or fan out over them with .map() — to process batches efficiently.

PREREQUISITES

  • Python 3.8+
  • pip install modal
  • Modal account and CLI authenticated (modal setup)
  • Basic knowledge of Python functions (sync or async)

Setup

Install the modal package and authenticate the CLI with your Modal account; modal setup opens a browser window to complete login.

bash
pip install modal
modal setup
output
Web authentication finished successfully!

Step by step

Define a Modal app and a GPU-enabled function for batch inference. The example below uses a synchronous function that processes a list of prompts and returns their uppercase versions as dummy inference results (torch is installed in the image only to illustrate adding dependencies).

python
import modal

app = modal.App("batch-inference-app")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
def batch_inference(prompts):
    # Dummy batch inference: convert each prompt to uppercase
    results = [prompt.upper() for prompt in prompts]
    return results

if __name__ == "__main__":
    # app.run() starts an ephemeral app for the duration of the block
    with app.run():
        inputs = ["hello world", "modal batch", "inference example"]
        outputs = batch_inference.remote(inputs)
        print("Batch inference results:", outputs)
output
Batch inference results: ['HELLO WORLD', 'MODAL BATCH', 'INFERENCE EXAMPLE']
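Real workloads often have far more inputs than you want to send in a single call. A minimal, Modal-independent sketch of a chunking helper (the name chunked and the batch size are illustrative) that splits inputs into fixed-size batches, each of which could then be passed to one .remote() call:

```python
from typing import Iterator

def chunked(items: list, size: int) -> Iterator[list]:
    # Yield successive fixed-size slices of items; the last chunk may be smaller.
    for start in range(0, len(items), size):
        yield items[start:start + size]

prompts = [f"prompt-{i}" for i in range(7)]
batches = list(chunked(prompts, 3))
print([len(b) for b in batches])  # [3, 3, 1]
```

Choosing the chunk size is a trade-off: larger chunks amortize per-call overhead, while smaller chunks spread work across more containers.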

Common variations

  • Use async def for asynchronous batch inference functions.
  • Change gpu="A10G" to other GPU types available on your plan (e.g. "T4", "A100").
  • Install additional dependencies via the image parameter, e.g. modal.Image.debian_slim().pip_install(...).
  • Call batch_inference.remote() with different batch sizes or data types, or fan out one call per input with batch_inference.map(inputs).
python
import modal

app = modal.App("async-batch-inference")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
async def async_batch_inference(prompts):
    # Simulate async batch processing
    import asyncio
    await asyncio.sleep(1)
    return [p[::-1] for p in prompts]

if __name__ == "__main__":
    with app.run():
        inputs = ["abc", "def", "ghi"]
        # .remote() blocks and returns the result even for async functions;
        # from async code, await async_batch_inference.remote.aio(inputs) instead
        outputs = async_batch_inference.remote(inputs)
        print("Async batch results:", outputs)
output
Async batch results: ['cba', 'fed', 'ihg']
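The fan-out pattern behind the async variant can be shown without Modal at all: start one coroutine per input and gather the results in order. A plain-asyncio sketch (infer_one and run_batch are illustrative names, with sleep standing in for model latency):

```python
import asyncio

async def infer_one(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for per-item inference latency
    return prompt[::-1]

async def run_batch(prompts: list) -> list:
    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(infer_one(p) for p in prompts))

results = asyncio.run(run_batch(["abc", "def", "ghi"]))
print(results)  # ['cba', 'fed', 'ihg']
```

Because all sleeps overlap, the batch completes in roughly the time of one item rather than the sum of all items.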

Troubleshooting

  • If starting or deploying the app fails with authentication errors, re-authenticate with modal setup.
  • For GPU allocation failures, verify your Modal account has GPU quota available.
  • If dependencies fail to install, specify a compatible base image or pin package versions.
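For the last point, a sketch of a pinned image definition (the package versions shown are illustrative, not recommendations); fixing the Python version and exact package versions makes image builds reproducible:

```python
import modal

# Pin the Python version and exact package versions so image builds are reproducible
image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch==2.3.1",
    "transformers==4.41.0",
)
```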

Key Takeaways

  • Use @app.function(gpu="A10G") to enable GPU for batch inference in Modal.
  • Run your app (with app.run() in a script, or the modal run CLI) before calling functions remotely.
  • Batch inputs as lists to process multiple items efficiently in one call.
  • Async functions allow non-blocking batch processing with Modal.
  • Ensure Modal CLI login and GPU quota to avoid deployment errors.
Verified 2026-04