How to serve a model API on RunPod

Quick answer

Use the runpod Python package to call a model API served on RunPod: set the RUNPOD_API_KEY environment variable, create an Endpoint instance with your endpoint ID, and call run_sync with your input payload. run_sync blocks until the job finishes and returns its output; for non-blocking calls, use the SDK's job-based or asyncio interface.

Prerequisites

Python 3.8+
RunPod API key (set RUNPOD_API_KEY environment variable)
pip install runpod

Setup

Install the runpod Python package and set your API key as an environment variable for authentication.

bash

pip install runpod

output

Collecting runpod
  Downloading runpod-1.0.0-py3-none-any.whl (10 kB)
Installing collected packages: runpod
Successfully installed runpod-1.0.0

Step by step

This example shows how to synchronously call a deployed RunPod model endpoint using the runpod SDK.

python

import os
import runpod

# Set your RunPod API key in environment variable RUNPOD_API_KEY
runpod.api_key = os.environ["RUNPOD_API_KEY"]

# Replace with your actual RunPod endpoint ID
endpoint_id = "YOUR_ENDPOINT_ID"

# Create an Endpoint instance
endpoint = runpod.Endpoint(endpoint_id)

# Define input payload for the model
input_data = {"prompt": "Hello, RunPod!"}

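# Call the endpoint synchronously; run_sync blocks until the job finishes
# and returns the job's output directly (not a wrapper with an "output" key)
result = endpoint.run_sync({"input": input_data})

print("Model output:", result)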

output

Model output: Hello, RunPod! This is your model responding.
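
A long-running job can outlast the time you are willing to wait, so it may help to pass a timeout to run_sync and handle the case where it expires. The following is a minimal sketch, not the SDK's prescribed pattern: it assumes run_sync accepts a timeout argument in seconds and signals an expired wait with Python's built-in TimeoutError, as RunPod's own examples do. The call_endpoint helper and the FakeEndpoint stand-in are hypothetical names introduced here so the sketch runs without network access.

```python
def call_endpoint(endpoint, prompt, timeout=60):
    """Send one prompt and return the result, or None if the wait times out.

    `endpoint` is any object with a run_sync(payload, timeout=...) method,
    such as the runpod.Endpoint instance created in the example above.
    """
    payload = {"input": {"prompt": prompt}}
    try:
        return endpoint.run_sync(payload, timeout=timeout)
    except TimeoutError:
        # Assumption: the wait expired before the job finished
        return None


class FakeEndpoint:
    """Hypothetical stand-in so the sketch runs without a RunPod account."""

    def run_sync(self, payload, timeout=60):
        return f"echo: {payload['input']['prompt']}"


print(call_endpoint(FakeEndpoint(), "Hello, RunPod!"))
```

With a real endpoint, pass the Endpoint instance in place of FakeEndpoint() and treat a None result as "still running or timed out" rather than as a model error.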

Common variations

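Asynchronous calls: Use the SDK's asyncio client (AsyncioEndpoint with an aiohttp session) inside an async function; the regular Endpoint class has no run_async method.
Streaming: If the deployed handler yields partial results, read them by iterating over job.stream() on the job returned by endpoint.run().
Different models: Deploy your preferred model on RunPod and use its endpoint ID.

python

import asyncio
import os

import aiohttp  # required by the asyncio client: pip install aiohttp
import runpod
from runpod import AsyncioEndpoint  # check this name against your installed runpod version

runpod.api_key = os.environ["RUNPOD_API_KEY"]

async def async_call():
    # The asyncio client sends its requests through an aiohttp session
    async with aiohttp.ClientSession() as session:
        endpoint = AsyncioEndpoint("YOUR_ENDPOINT_ID", session)
        input_data = {"prompt": "Async call example"}
        # run() queues the job and returns a job handle without waiting for the result
        job = await endpoint.run(input_data)
        # output() waits until the job reaches a final state, then returns its output
        result = await job.output()
        print("Async model output:", result)

asyncio.run(async_call())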

output

Async model output: Async call example response from your model.
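
Outside asyncio, a non-blocking call can also be made with endpoint.run(), which returns a job handle whose status() method can be polled. The sketch below illustrates one way to poll such a handle. It assumes the status strings RunPod reports (IN_QUEUE, IN_PROGRESS, COMPLETED, FAILED, CANCELLED, TIMED_OUT); wait_for_job and FakeJob are hypothetical names used only for illustration, and FakeJob replaces a real job so the code runs offline.

```python
import time

# Assumption: these are the states after which a job no longer changes
FINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(job, poll_interval=1.0, max_polls=60):
    """Poll job.status() until it reaches a final state or polls run out."""
    status = job.status()
    for _ in range(max_polls):
        if status in FINAL_STATES:
            break
        time.sleep(poll_interval)
        status = job.status()
    return status


class FakeJob:
    """Hypothetical stand-in that completes on the third status check."""

    def __init__(self):
        self._states = iter(["IN_QUEUE", "IN_PROGRESS", "COMPLETED"])

    def status(self):
        return next(self._states)


print(wait_for_job(FakeJob(), poll_interval=0))
```

With the real SDK, the job would come from endpoint.run({"input": input_data}), and job.output() would fetch the result once the returned status is COMPLETED.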

Troubleshooting

If you get authentication errors, verify your RUNPOD_API_KEY environment variable is set correctly.
If requests fail because the endpoint ID is invalid, confirm that it matches the ID shown for your deployed endpoint in the RunPod dashboard.
For network timeouts, check your internet connection and RunPod service status.
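
The first two checks above can be automated before any request is sent. The following is a rough sketch under stated assumptions: preflight is a hypothetical helper, not part of the runpod SDK, and it only inspects local configuration (it cannot tell whether a key or endpoint ID is actually valid on RunPod's side).

```python
def preflight(env, endpoint_id):
    """Return a list of configuration problems (empty if none were found)."""
    problems = []
    key = env.get("RUNPOD_API_KEY", "")
    if not key:
        problems.append("RUNPOD_API_KEY is not set")
    elif key != key.strip():
        # Stray whitespace often sneaks in when a key is copied and pasted
        problems.append("RUNPOD_API_KEY has leading or trailing whitespace")
    if not endpoint_id or endpoint_id == "YOUR_ENDPOINT_ID":
        problems.append("endpoint ID is missing or still the placeholder")
    return problems


# Example with an empty environment and the tutorial's placeholder ID
print(preflight({}, "YOUR_ENDPOINT_ID"))
```

In a real script, call preflight(os.environ, endpoint_id) and stop early if the returned list is not empty.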

Key Takeaways

Set your RunPod API key in the environment variable RUNPOD_API_KEY before using the SDK.
Use runpod.Endpoint(endpoint_id).run_sync() for simple synchronous model inference calls.
Async calls are supported through the SDK's asyncio client (AsyncioEndpoint) inside async functions, which allows several requests to run concurrently.
Always verify your endpoint ID matches the deployed model on RunPod to avoid errors.
Troubleshoot common issues by checking API key, endpoint ID, and network connectivity.

Verified 2026-04