
How to use Kubernetes for ML model deployment

Quick answer
Use Kubernetes to deploy ML models by containerizing your model as a Docker image, then creating Kubernetes Deployments and Services to manage scaling and expose the model API. This approach enables scalable, fault-tolerant ML serving with easy updates and monitoring.

Prerequisites

  • Docker installed
  • kubectl CLI configured
  • Access to a Kubernetes cluster (local like Minikube or cloud provider)
  • Basic knowledge of Docker and Kubernetes concepts

Set up the Kubernetes environment

Install Docker and Kubernetes CLI tools, then set up a Kubernetes cluster. For local testing, use Minikube or Docker Desktop Kubernetes. For production, use managed clusters like GKE, EKS, or AKS.

bash
sudo apt-get update && sudo apt-get install -y docker.io
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
minikube start
output
Starting local Kubernetes cluster...
Kubectl configured to use minikube context

Step-by-step deployment

Containerize your ML model as a Docker image exposing a REST API (e.g., using Flask or FastAPI). Push the image to a container registry. Then create a Kubernetes Deployment manifest to run pods with your model container, and a Service manifest to expose it.
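As a concrete starting point, here is a minimal sketch of the model API the container would run. The article suggests Flask or FastAPI; this sketch uses only Python's standard library so it runs without extra dependencies, and the `predict` function is a hypothetical stand-in for real model inference:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Hypothetical stand-in for a real model's inference call.
    return {"score": sum(features) / max(len(features), 1)}

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("features", []))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep container logs quiet in this sketch

# To serve inside the container (port 80 matches containerPort in the
# Deployment manifest below):
# HTTPServer(("0.0.0.0", 80), ModelHandler).serve_forever()
```

In practice you would replace `predict` with your model's inference call and swap in FastAPI or Flask for request validation and async handling.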

dockerfile
# Dockerfile for the ML model API
FROM python:3.9-slim
COPY . /app
WORKDIR /app
RUN pip install fastapi uvicorn
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "80"]

yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-model-container
        image: your-dockerhub-username/ml-model:latest
        ports:
        - containerPort: 80

yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  type: LoadBalancer
  selector:
    app: ml-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

bash
# Deploy the manifests
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

# Verify pods and service
kubectl get pods
kubectl get svc ml-model-service
output
deployment.apps/ml-model-deployment created
service/ml-model-service created
NAME                                READY   STATUS    RESTARTS   AGE
ml-model-deployment-xxxxx-xxxxx     1/1     Running   0          1m
ml-model-deployment-xxxxx-xxxxx     1/1     Running   0          1m
ml-model-deployment-xxxxx-xxxxx     1/1     Running   0          1m
NAME               TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
ml-model-service    LoadBalancer   10.0.0.123      35.123.45.67    80:31234/TCP   1m

Common variations

  • Use HorizontalPodAutoscaler to auto-scale model pods based on CPU or custom metrics.
  • Rely on the Deployment's default rolling-update strategy for zero-downtime updates, and use kubectl rollout status or kubectl rollout undo to monitor or revert them.
  • Use kubectl port-forward for local testing without exposing services externally.
  • Integrate with ML serving frameworks like KServe (formerly KFServing) or Seldon Core for advanced features.
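For the autoscaling variation, a HorizontalPodAutoscaler can be declared alongside the Deployment. A sketch using the autoscaling/v2 API (the CPU target and replica bounds here are illustrative, not recommendations):

```yaml
# hpa.yaml -- scales the Deployment between 2 and 10 replicas,
# targeting 70% average CPU utilization (illustrative values)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Apply it with kubectl apply -f hpa.yaml; note that CPU-based scaling requires the metrics-server to be running in the cluster.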

Troubleshooting tips

  • If pods crash, check logs with kubectl logs <pod-name> to debug errors.
  • If service is not reachable, verify service type and external IP with kubectl get svc.
  • Ensure container image is accessible and correctly tagged in your registry.
  • Use kubectl describe pod <pod-name> to inspect pod events and resource issues.
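Beyond these checks, liveness and readiness probes let Kubernetes restart unhealthy model pods and withhold traffic until they are ready. A sketch of the fields to add under the container in deployment.yaml (the probe endpoints are hypothetical; your API must actually serve them):

```yaml
# Fragment of the container spec in deployment.yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /ready
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
```

Readiness probes are especially useful for ML serving, since model loading can take long enough that a pod is Running well before it can answer requests.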

Key Takeaways

  • Containerize ML models and deploy them as Kubernetes Deployments for scalable serving.
  • Use Kubernetes Services to expose your model API securely and reliably.
  • Leverage autoscaling and rollout strategies for production-grade ML deployments.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022