Tool Beginner easy · 8 min integration

Model deployment pipeline

What you will learn

Build a complete Docker-based ML model serving pipeline with MLflow registry, versioning, and Kubernetes-ready containerization.

Why this matters

Without a deployment pipeline, your trained model is just a file on disk. A pipeline makes models reproducible, versioned, and runnable anywhere, turning experiments into production assets. A broken pipeline means models never reach users.

Skip if: you're only running one-off batch predictions on a laptop, or your organization uses a managed ML platform (Hugging Face Spaces, Vertex AI) that handles containerization. You still need versioning, just not your own Docker setup.

Explanation

A model deployment pipeline orchestrates three core steps: (1) register a trained model in MLflow's model registry, (2) version it with metadata and dependencies, (3) containerize it with Docker so it runs identically in any environment. The pipeline glues together experiment tracking (MLflow), versioning (DVC), and containerization (Docker) into a single workflow that outputs a production-ready image. This matters because models trained locally must be frozen (dependencies pinned, input/output schemas locked, inference logic isolated) before they ship. Without this, 'it worked on my machine' becomes a career-limiting statement. The pipeline is the automation layer that says 'this model is ready' and moves it from research to ops.
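Steps (1) and (2) can be sketched in a few lines of Python. This is a minimal sketch, not the only way to do it: it assumes MLflow is installed, that MLFLOW_TRACKING_URI points at a tracking server with a model registry, and that the training run logged its model under the artifact path "model". The tag keys and the `git_commit` parameter are illustrative names, not an MLflow convention.

```python
import platform


def version_metadata(git_commit: str, requirements_file: str = "requirements.txt") -> dict:
    """Metadata to attach to a registered model version (keys are illustrative)."""
    return {
        "python_version": platform.python_version(),
        "git_commit": git_commit,
        "requirements_file": requirements_file,
    }


def register_with_metadata(run_id: str, model_name: str, git_commit: str) -> str:
    """Register the model logged under run_id and tag the new version.

    Assumes the run logged its model at artifact path "model".
    """
    # Imported here so version_metadata() can be used without MLflow installed.
    import mlflow
    from mlflow.tracking import MlflowClient

    # Step 1: create a new version of model_name in the registry.
    mv = mlflow.register_model(f"runs:/{run_id}/model", model_name)

    # Step 2: record the metadata that freezes this version's environment.
    client = MlflowClient()
    for key, value in version_metadata(git_commit).items():
        client.set_model_version_tag(model_name, mv.version, key, value)

    # Registry URIs use a single slash after "models:".
    return f"models:/{model_name}/{mv.version}"
```

Calling `register_with_metadata(run_id, "iris-classifier", commit_sha)` returns a URI that the serving step can consume.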

Configuration

dockerfile

FROM python:3.10-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model_code/ .
COPY MLmodel .

ENV MLFLOW_TRACKING_URI=http://mlflow-server:5000
ENV MODEL_NAME=iris-classifier
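ENV MODEL_STAGE=production

EXPOSE 8000

# Shell form, so ${MODEL_NAME} and ${MODEL_STAGE} are expanded when the container starts
# (the JSON/exec form would pass them to mlflow literally).
# Registry URIs take a single slash: models:/<name>/<stage-or-version>.
# --env-manager local (assumes a recent MLflow) reuses the packages installed above.
CMD exec mlflow models serve -m "models:/${MODEL_NAME}/${MODEL_STAGE}" --host 0.0.0.0 --port 8000 --env-manager local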

Why this order?

Base image first (immutable foundation), then system dependencies (required before Python), then Python deps (faster layer caching if only code changes), then model artifacts and metadata (loaded last so changes rebuild quickly), then environment variables (read by the CMD), finally the server entrypoint.

Wrong vs Right

Wrong way

dockerfile

FROM python:3.10
RUN pip install mlflow scikit-learn pandas numpy
COPY . /app
WORKDIR /app
CMD ["python", "serve.py"]

# Problems: (1) no explicit model version or registry reference, so the image serves whatever serve.py happens to load; (2) the full python:3.10 base image and unpinned pip installs make the image large and the build non-reproducible; (3) no HEALTHCHECK for orchestrators to probe; (4) COPY . /app pulls the entire build context (data, notebooks, secrets) into the image and invalidates that layer on any file change; (5) no environment variable control over which model to load

Right way

dockerfile

FROM python:3.10-slim

WORKDIR /app

# curl is required by the HEALTHCHECK; the slim base image does not ship it
RUN apt-get update && apt-get install -y --no-install-recommends build-essential curl && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY MLmodel .
COPY model_code/ .

ENV MODEL_NAME=iris-classifier
ENV MODEL_STAGE=production

EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1

# Shell form so the ENV values are expanded at start; registry URIs use models:/ (single slash)
CMD exec mlflow models serve -m "models:/${MODEL_NAME}/${MODEL_STAGE}" --host 0.0.0.0 --port 8000 --env-manager local

Tool vitals

Primary command

bash

mlflow models build-docker --model-uri "models:/<model_name>/<version>" --name <image_name>

Config file Dockerfile, docker-compose.yml, MLmodel

Verify

bash

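# Run detached and map host port 8000 to the container's port 8000
docker run --rm -d -p 8000:8000 <image_name>
# Give the server a few seconds to start, then probe it
curl -f http://localhost:8000/health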
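A single curl can fail simply because the server is still starting. In scripts, a small polling loop is more reliable. The sketch below uses only the Python standard library; the function name and default timings are illustrative, and it assumes the server answers HTTP 200 on its health URL once it is ready.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll url until it returns HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused or an HTTP error status: not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

For the container started above, `wait_until_healthy("http://localhost:8000/health")` returns True once the model server is accepting requests and False if it never comes up within the timeout.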

Integration notes

This Dockerfile consumes outputs from MLflow experiment tracking (the trained model and its metadata) and produces a Docker image that Kubernetes orchestrates via a Deployment. DVC stores the raw training data and model binary; MLflow registers model versions; Docker packages them; Kubernetes runs them at scale. The pipeline bridges these four tools.
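To make the Kubernetes end of this chain concrete, here is a sketch that builds a minimal Deployment manifest as a Python dictionary. The image name, replica count, and probe timings are placeholders. Port 8000 and the `/health` path follow the Dockerfile in this lesson. `kubectl apply` accepts JSON as well as YAML, so the output can be piped to it directly.

```python
import json


def deployment_manifest(image: str, model_name: str, model_stage: str, replicas: int = 2) -> dict:
    """Build a minimal apps/v1 Deployment for the model-serving image."""
    labels = {"app": model_name}
    container = {
        "name": "model-server",
        "image": image,
        "ports": [{"containerPort": 8000}],
        # The same variables the Dockerfile's CMD reads at start-up.
        "env": [
            {"name": "MODEL_NAME", "value": model_name},
            {"name": "MODEL_STAGE", "value": model_stage},
        ],
        # Kubernetes only routes traffic to the pod once this probe succeeds.
        "readinessProbe": {
            "httpGet": {"path": "/health", "port": 8000},
            "initialDelaySeconds": 5,
            "periodSeconds": 30,
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": model_name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
```

Printing `json.dumps(deployment_manifest("<image_name>", "iris-classifier", "production"), indent=2)` and piping it to `kubectl apply -f -` would create the Deployment, assuming the cluster can pull the image.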

Migration path

If you move to BentoML, replace the MLflow serve command with a Bento service definition and bentoml containerize. The Dockerfile structure stays the same (system deps → Python deps → model artifacts → entrypoint), but the model loading and inference logic move into a BentoML service written in Python, with build options typically declared in a bentofile.yaml. If you use a managed platform (Vertex AI, SageMaker), export the model as a tarball and upload it through the platform's console; writing your own Dockerfile is usually unnecessary.

Cost model

MLflow is free (open-source). Docker image storage costs depend on your registry: Docker Hub (free tier: 1 private repo, rate-limited pulls); AWS ECR ($0.10/GB/month storage, $0.09/GB data transfer out); GCP Artifact Registry (similar pricing). At scale (hundreds of model versions), ECR storage costs roughly $5–20/month, which corresponds to about 50–200 GB of stored layers. The real cost is the DevOps time spent automating the pipeline, which is usually recovered quickly once deployments stop being manual.
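The storage arithmetic is easy to reproduce. The sketch below assumes the ECR storage price quoted above. The per-version image size is a hypothetical figure: registries generally store shared base layers once, so only the layers unique to each model version add up.

```python
def registry_storage_gb(n_versions: int, unique_gb_per_version: float, shared_base_gb: float = 0.0) -> float:
    """Total stored size: shared base layers once, plus the unique layers of each version."""
    return shared_base_gb + n_versions * unique_gb_per_version


def monthly_storage_cost(total_gb: float, price_per_gb_month: float = 0.10) -> float:
    """Monthly storage cost in dollars at a flat per-GB price."""
    return total_gb * price_per_gb_month
```

With these assumptions, 200 versions at 0.5 GB of unique layers each come to 100 GB, or about $10 per month, which sits inside the range quoted above. Data transfer is billed separately.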

Common gotcha

MLflow's models serve command can work locally and then fail inside Docker when the MLFLOW_TRACKING_URI environment variable doesn't point to a server reachable from the container. A models:/ URI can only be resolved by asking the registry, so the model never loads and the logs show 'connection refused' to the registry. A common cause is a URI such as http://localhost:5000, which inside the container refers to the container itself. Solution: set MLFLOW_TRACKING_URI explicitly and write the model URI in the CMD in the form models:/<model_name>/<version> (a single slash after the colon) instead of expecting the registry to be discovered. If you must use a remote registry, test the network path from inside the container first with docker exec and curl.
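The same network check can be scripted so a container fails fast with a clear message. This sketch uses only the standard library and tests plain TCP reachability of the host and port in the tracking URI. It does not verify that an MLflow server is actually listening there, and the function name is illustrative.

```python
import socket
from urllib.parse import urlparse


def tracking_server_reachable(tracking_uri: str, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds."""
    parsed = urlparse(tracking_uri)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False  # not an HTTP(S) tracking URI
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

An entrypoint script could call `tracking_server_reachable(os.environ["MLFLOW_TRACKING_URI"])` before starting the server and exit with an error if it returns False.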

Team adoption

On day one: (1) establish a single MLflow tracking server (shared instance, not a laptop), (2) create a template Dockerfile in the repo with the pattern above, (3) add a make deploy target that builds and tags the image with the MLflow model version, (4) require all model registrations to include a YAML config (model input schema, Python version, inference timeout). In CI/CD, start the freshly built image with docker run and probe its health endpoint to verify the image is runnable before pushing to the registry. Set a team rule: never manually `docker run` a production model; always go through Kubernetes or a deployment script that reads from the registry.
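Item (3) comes down to deriving the image tag from the registered model version. The helper below is a sketch of what a make deploy target might call. The registry host, the `v<version>` tag scheme, and the simplified name check are assumptions for illustration.

```python
import re

# Simplified check for a Docker repository name component:
# lowercase letters and digits, separated by single '.', '_' or '-'.
_NAME_RE = re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)*")


def image_tag(registry: str, model_name: str, model_version: int) -> str:
    """Build an image reference that encodes the MLflow model version."""
    if not _NAME_RE.fullmatch(model_name):
        raise ValueError(f"not a valid image name: {model_name!r}")
    if model_version < 1:
        raise ValueError("MLflow model versions start at 1")
    return f"{registry}/{model_name}:v{model_version}"


def deploy_commands(tag: str, context: str = ".") -> list[list[str]]:
    """Commands a deploy target would run, in order (not executed here)."""
    return [
        ["docker", "build", "-t", tag, context],
        ["docker", "push", tag],
    ]
```

A CI job could run each command with `subprocess.run(cmd, check=True)`, so a failed build stops the deploy before anything is pushed.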

Experienced dev note

Many teams rebuild the Docker image on every model update, bloating the registry. Instead: (1) version the Dockerfile and Python dependencies separately from the model, (2) use a model-loading script that fetches from the MLflow registry at container start time, not build time. This decouples model updates (which are frequent) from Docker rebuilds (which are expensive). A single Docker image can serve 50 model versions by changing the MODEL_STAGE environment variable. Bonus: this pattern makes A/B testing easy, because you can run two containers from the same image with different stages.
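A start-time loader along these lines might look like the following sketch. It assumes MLflow is installed in the image and that MODEL_NAME and MODEL_STAGE are set as in the Dockerfile above. The function names and the default stage are illustrative.

```python
import os
from typing import Mapping, Optional


def resolve_model_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the registry URI from environment variables at container start."""
    env = os.environ if env is None else env
    name = env.get("MODEL_NAME")
    if not name:
        raise RuntimeError("MODEL_NAME is not set")
    stage = env.get("MODEL_STAGE", "production")
    return f"models:/{name}/{stage}"


def load_model_at_startup():
    """Fetch the model from the registry when the container starts, not at build time."""
    # Imported here so resolve_model_uri() can be tested without MLflow installed.
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(resolve_model_uri())
```

For an A/B test, two containers from the same image would differ only in their environment, for example MODEL_STAGE=production in one and MODEL_STAGE=staging in the other.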

Check your understanding

Why would copying the trained model binary into the Docker image (instead of loading it from MLflow at runtime) be a mistake in a team setting?

Hint

Think about what happens when a new model version is trained. Do you rebuild the entire Docker image? Or update a version string and restart the container?
