How to serve a model API on RunPod

Quick answer
Use the runpod Python package to call a model served on a RunPod endpoint: set the RUNPOD_API_KEY environment variable, create an Endpoint instance with your endpoint ID, and call run_sync with your input payload. run_sync blocks until the job finishes; the SDK also offers an asyncio client for non-blocking calls.

Prerequisites

  • Python 3.8+
  • RunPod API key (set RUNPOD_API_KEY environment variable)
  • pip install runpod

Setup

Install the runpod Python package and set your API key as an environment variable for authentication.

bash
pip install runpod
output
Collecting runpod
  Downloading runpod-1.0.0-py3-none-any.whl (10 kB)
Installing collected packages: runpod
Successfully installed runpod-1.0.0
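
The setup depends on RUNPOD_API_KEY being present before any SDK call is made. A minimal sketch of a fail-fast check is shown below; the helper name load_api_key is hypothetical and is not part of the runpod package.

```python
import os


def load_api_key(env=None, name="RUNPOD_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of the runpod SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return key
```

With a helper like this, `runpod.api_key = load_api_key()` fails with a readable message instead of a bare KeyError when the variable is missing.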

Step by step

This example shows how to synchronously call a deployed RunPod model endpoint using the runpod SDK.

python
import os
import runpod

# Set your RunPod API key in environment variable RUNPOD_API_KEY
runpod.api_key = os.environ["RUNPOD_API_KEY"]

# Replace with your actual RunPod endpoint ID
endpoint_id = "YOUR_ENDPOINT_ID"

# Create an Endpoint instance
endpoint = runpod.Endpoint(endpoint_id)

# Define input payload for the model
input_data = {"prompt": "Hello, RunPod!"}

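# Call the endpoint synchronously; run_sync blocks until the job finishes
result = endpoint.run_sync({"input": input_data})

# In current SDK versions run_sync returns the handler's output directly
print("Model output:", result)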
output
Model output: Hello, RunPod! This is your model responding.

Common variations

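  • Asynchronous calls: runpod.Endpoint has no run_async method. The SDK's asyncio client is AsyncioEndpoint, which takes an aiohttp session; use await endpoint.run(...) and await job.output() inside an async function.
  • Streaming: not available through run_sync. It requires a worker whose handler yields partial results (a generator), which the client then reads from the job returned by endpoint.run().
  • Different models: Deploy your preferred model on RunPod and use its endpoint ID.
python
import asyncio
import os

import aiohttp
import runpod
from runpod import AsyncioEndpoint

async def async_call():
    runpod.api_key = os.environ["RUNPOD_API_KEY"]
    # AsyncioEndpoint sends its HTTP requests through a shared aiohttp session
    async with aiohttp.ClientSession() as session:
        endpoint = AsyncioEndpoint("YOUR_ENDPOINT_ID", session)
        # run() wraps the payload under "input" and returns a job handle
        job = await endpoint.run({"prompt": "Async call example"})
        result = await job.output()
        print("Async model output:", result)

asyncio.run(async_call())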
output
Async model output: Async call example response from your model.

Troubleshooting

  • If you get authentication errors, verify your RUNPOD_API_KEY environment variable is set correctly.
  • If the endpoint ID is rejected as invalid, confirm that it matches the ID shown for your deployed endpoint in the RunPod dashboard.
  • For network timeouts, check your internet connection and RunPod service status.
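
For transient network timeouts, retrying with a growing delay is a common complement to the checks above. The following is a generic sketch, not an SDK feature: call_with_retry and its parameters are hypothetical, and the exception types to retry on are an assumption. HTTP libraries often raise their own exception classes, so pass those in through retry_on as needed.

```python
import time


def call_with_retry(call, attempts=3, base_delay=1.0,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Run call() and retry on the given errors with exponential backoff.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between attempts
    and re-raises the last error once the attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

A call such as `call_with_retry(lambda: endpoint.run_sync({"input": input_data}))` would retry the request up to three times. Authentication or invalid-ID errors should not be retried, since repeating the request cannot fix them.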

Key Takeaways

  • Set your RunPod API key in the environment variable RUNPOD_API_KEY before using the SDK.
  • Use runpod.Endpoint(endpoint_id).run_sync() for simple synchronous model inference calls.
  • Async calls are supported through the SDK's AsyncioEndpoint client inside async functions, which allows several requests to run concurrently.
  • Always verify your endpoint ID matches the deployed model on RunPod to avoid errors.
  • Troubleshoot common issues by checking API key, endpoint ID, and network connectivity.
Verified 2026-04