How to use Modal for batch inference
Quick answer
Use the modal Python package to define a GPU-enabled function decorated with @app.function(gpu="A10G") for batch inference. Run the app and call the function with a list of inputs to process batches efficiently.
Prerequisites
- Python 3.8+
- pip install modal
- A Modal account with the CLI authenticated (modal setup)
- Basic knowledge of Python sync or async functions
Setup
Install the modal package, then authenticate the CLI against your Modal account so you can run functions and use GPUs.

```shell
pip install modal
modal setup
```

output

```
Successfully logged in to Modal
```
Step by step
Define a Modal app and a GPU-enabled function for batch inference. The example below shows a synchronous function that processes a list of prompts and returns their uppercase versions as dummy inference results.
```python
import modal

app = modal.App("batch-inference-app")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
def batch_inference(prompts):
    # Dummy batch inference: convert each prompt to uppercase
    return [prompt.upper() for prompt in prompts]

if __name__ == "__main__":
    # app.run() starts an ephemeral app so the function can be called remotely
    with app.run():
        inputs = ["hello world", "modal batch", "inference example"]
        outputs = batch_inference.remote(inputs)
        print("Batch inference results:", outputs)
```

output

```
Batch inference results: ['HELLO WORLD', 'MODAL BATCH', 'INFERENCE EXAMPLE']
```
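When the input list grows large, it can help to split it into fixed-size chunks and issue one remote call per chunk, keeping each call's payload and runtime bounded. A minimal pure-Python sketch of the chunking step (no Modal calls; the chunk helper and prompt names are illustrative, not part of Modal's API):

```python
def chunk(items, size):
    # Yield successive fixed-size batches from a list of inputs
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Example: split 7 prompts into batches of 3; each batch could then be
# submitted with a separate remote call such as batch_inference.remote(batch)
prompts = [f"prompt-{n}" for n in range(7)]
batches = list(chunk(prompts, 3))
print([len(b) for b in batches])  # → [3, 3, 1]
```

The last batch may be smaller than the chunk size, so remote functions should not assume a fixed batch length.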
Common variations
- Use async def for asynchronous batch inference functions.
- Change gpu="A10G" to other GPU types if available.
- Install additional dependencies via the image=modal.Image parameter.
- Call batch_inference.remote() with different batch sizes or data types.
- For per-item fan-out, batch_inference.map(prompts) runs the function once per input and yields results in order.
```python
import asyncio

import modal

app = modal.App("async-batch-inference")

@app.function(gpu="A10G", image=modal.Image.debian_slim().pip_install("torch"))
async def async_batch_inference(prompts):
    # Simulate async batch processing
    await asyncio.sleep(1)
    return [p[::-1] for p in prompts]

async def main():
    inputs = ["abc", "def", "ghi"]
    # .remote.aio() is the awaitable counterpart of .remote()
    return await async_batch_inference.remote.aio(inputs)

if __name__ == "__main__":
    with app.run():
        outputs = asyncio.run(main())
        print("Async batch results:", outputs)
```

output

```
Async batch results: ['cba', 'fed', 'ihg']
```
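The async pattern extends naturally to overlapping several batch calls at once with asyncio.gather. A local sketch of that fan-out, using a stand-in coroutine (fake_infer is hypothetical and merely mimics an awaitable remote call like async_batch_inference.remote.aio(batch)):

```python
import asyncio

async def fake_infer(batch):
    # Stand-in for an awaitable remote call; here we just reverse each string
    await asyncio.sleep(0.01)
    return [p[::-1] for p in batch]

async def run_batches(batches):
    # Submit all batches concurrently; gather preserves input order
    results = await asyncio.gather(*(fake_infer(b) for b in batches))
    # Flatten per-batch results back into one list
    return [item for batch in results for item in batch]

outputs = asyncio.run(run_batches([["abc", "def"], ["ghi"]]))
print(outputs)  # → ['cba', 'fed', 'ihg']
```

Because gather returns results in submission order, the flattened output lines up with the original input order even though batches finish at different times.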
Troubleshooting
- If you see authentication errors when running the app, ensure you have logged in with modal setup.
- For GPU allocation failures, verify your Modal account has GPU quota available.
- If dependencies fail to install, specify a compatible base image or pin package versions.
Key Takeaways
- Use @app.function(gpu="A10G") to enable a GPU for batch inference in Modal.
- Start your app with app.run() (or the modal run CLI command) before calling functions remotely.
- Batch inputs as lists to process multiple items efficiently in one call.
- Async functions allow non-blocking batch processing with Modal.
- Ensure the Modal CLI is authenticated and GPU quota is available to avoid deployment errors.