
How to use Kubernetes for ML model deployment

Quick answer
Use Kubernetes to deploy ML models by containerizing your model as a Docker image, then creating Kubernetes Deployments and Services to manage scaling and expose the model API. This approach enables scalable, fault-tolerant ML serving with easy updates and monitoring.

Prerequisites

  • Docker installed
  • kubectl CLI configured
  • Access to a Kubernetes cluster (local like Minikube or cloud provider)
  • Basic knowledge of Docker and Kubernetes concepts

Set up the Kubernetes environment

Install Docker and Kubernetes CLI tools, then set up a Kubernetes cluster. For local testing, use Minikube or Docker Desktop Kubernetes. For production, use managed clusters like GKE, EKS, or AKS.

bash
sudo apt-get update && sudo apt-get install -y docker.io
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
minikube start
output
Starting local Kubernetes cluster...
Kubectl configured to use minikube context

Step-by-step deployment

Containerize your ML model as a Docker image exposing a REST API (e.g., using Flask or FastAPI). Push the image to a container registry. Then create a Kubernetes Deployment manifest to run pods with your model container, and a Service manifest to expose it.
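As a concrete starting point, here is a minimal sketch of the model API the container would run. The article suggests Flask or FastAPI; this sketch uses only Python's standard library so it runs without extra dependencies, and the `predict` function is a hypothetical stand-in for real model inference:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Hypothetical stand-in for a real model's inference call.
    return {"score": sum(features) / max(len(features), 1)}

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("features", []))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep container logs quiet in this sketch

# To serve inside the container (port 80 matches containerPort in the
# Deployment manifest below):
# HTTPServer(("0.0.0.0", 80), ModelHandler).serve_forever()
```

In practice you would replace `predict` with your model's inference call and swap in FastAPI or Flask for request validation and async handling.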

dockerfile
# Dockerfile for the ML model API
FROM python:3.9-slim
COPY . /app
WORKDIR /app
RUN pip install fastapi uvicorn
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "80"]

yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-model-container
        image: your-dockerhub-username/ml-model:latest
        ports:
        - containerPort: 80

yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  type: LoadBalancer
  selector:
    app: ml-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

bash
# Deploy the manifests
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

# Verify pods and service
kubectl get pods
kubectl get svc ml-model-service
output
deployment.apps/ml-model-deployment created
service/ml-model-service created
NAME                                READY   STATUS    RESTARTS   AGE
ml-model-deployment-xxxxx-xxxxx     1/1     Running   0          1m
ml-model-deployment-xxxxx-xxxxx     1/1     Running   0          1m
ml-model-deployment-xxxxx-xxxxx     1/1     Running   0          1m
NAME               TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
ml-model-service    LoadBalancer   10.0.0.123      35.123.45.67    80:31234/TCP   1m

Common variations

  • Use HorizontalPodAutoscaler to auto-scale model pods based on CPU or custom metrics.
  • Rely on the Deployment's default rolling-update strategy for zero-downtime updates, and use kubectl rollout status or kubectl rollout undo to monitor or revert them.
  • Use kubectl port-forward for local testing without exposing services externally.
  • Integrate with ML serving frameworks like KServe (formerly KFServing) or Seldon Core for advanced features.
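For the autoscaling variation, a HorizontalPodAutoscaler can be declared alongside the Deployment. A sketch using the autoscaling/v2 API (the CPU target and replica bounds here are illustrative, not recommendations):

```yaml
# hpa.yaml -- scales the Deployment between 2 and 10 replicas,
# targeting 70% average CPU utilization (illustrative values)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Apply it with kubectl apply -f hpa.yaml; note that CPU-based scaling requires the metrics-server to be running in the cluster.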

Troubleshooting tips

  • If pods crash, check logs with kubectl logs <pod-name> to debug errors.
  • If service is not reachable, verify service type and external IP with kubectl get svc.
  • Ensure container image is accessible and correctly tagged in your registry.
  • Use kubectl describe pod <pod-name> to inspect pod events and resource issues.
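Beyond these checks, liveness and readiness probes let Kubernetes restart unhealthy model pods and withhold traffic until they are ready. A sketch of the fields to add under the container in deployment.yaml (the probe endpoints are hypothetical; your API must actually serve them):

```yaml
# Fragment of the container spec in deployment.yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /ready
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
```

Readiness probes are especially useful for ML serving, since model loading can take long enough that a pod is Running well before it can answer requests.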

Key Takeaways

  • Containerize ML models and deploy them as Kubernetes Deployments for scalable serving.
  • Use Kubernetes Services to expose your model API securely and reliably.
  • Leverage autoscaling and rollout strategies for production-grade ML deployments.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022