Tool Intermediate medium · 7 min integration

Phase 1: experiment tracking

What you will learn

Initialize an MLflow tracking server and configure local experiment logging to replace ad-hoc metrics files and spreadsheets.

Why this matters

Without centralized experiment tracking, ML teams lose the ability to compare model performance across runs, reproduce results, or trace which hyperparameters produced the best model. Ad-hoc logging (print statements, CSV files, Slack messages) stops being manageable after a handful of experiments, because nobody can reliably say which settings produced which result.

Skip if: you're running a single offline experiment on a laptop and never need to reproduce it, in which case local print() debugging is sufficient. If you're sharing code with others, collaborating on tuning, or running more than about 5 experiments, MLflow is worth the setup.

Explanation

MLflow Tracking is a lightweight experiment logger that captures metrics, parameters, and artifacts (models, plots) from each run and stores them in a central backend. The workflow is: (1) start a local or remote tracking server, (2) configure your training script to log to MLflow, (3) query runs via the CLI or UI to compare results. Configuration has two parts: a backend store (where run metadata lives) and an artifact store (where large files live); you then point your client at the tracking URI. The server runs as a standalone process listening on a port (default 5000), and your training scripts talk to it over HTTP. By default these logging calls are synchronous: if the server is unreachable, the client retries and then raises an error, so confirm the server is up before starting a long training job. This separates concerns: experiments run in their own processes while the server stores and indexes the results of every run in one place.

Configuration

bash

#!/bin/bash
# Step 1: Initialize MLflow backend and artifact store
mlflow server \
  --backend-store-uri sqlite:///./mlruns.db \
  --default-artifact-root ./artifacts \
  --host 0.0.0.0 \
  --port 5000 \
  &

echo "MLflow server started on http://localhost:5000"
sleep 2

# Step 2: Verify server is responding (the MLflow server exposes /health)
curl -sf http://localhost:5000/health > /dev/null && echo "✓ Tracking server healthy" || echo "✗ Failed to connect"

# Step 3: Set environment variable so Python client auto-connects
export MLFLOW_TRACKING_URI="http://localhost:5000"

# Step 4: Create experiment container (CLI equivalent)
mlflow experiments create --experiment-name "baseline-v1"

# Step 5: Verify experiments exist
mlflow experiments search --view all

Why this order?

The server must start first because Python clients will attempt to connect immediately. Setting MLFLOW_TRACKING_URI before running training scripts ensures they know where to send logs. Creating the experiment explicitly allows you to organize runs by project/iteration without polluting a default namespace.
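The fixed sleep 2 in the script above can lose the race on a slow machine. One alternative is to poll the server's /health endpoint until it answers. The following is a sketch using only the Python standard library; the function name wait_for_server and its timeout defaults are illustrative choices, not part of MLflow.

```python
import time
import urllib.request


def wait_for_server(base_url, timeout_s=30.0, interval_s=0.5):
    """Poll base_url + '/health' until it returns 200 or the timeout expires."""
    health_url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(health_url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError subclass OSError: the server is not
            # accepting connections yet (or answered with an error), so retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A launcher script could call wait_for_server("http://localhost:5000") and abort training if it returns False, instead of assuming two seconds is always enough.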

Wrong vs Right

Wrong way

bash

#!/bin/bash
# WRONG: Starting server without specifying stores
mlflow ui

# WRONG: Tracking URI hardcoded in Python scripts instead of env var
# (forces you to edit code when moving servers)

# WRONG: Using file-based SQLite on NFS without synchronization
# (causes corruption on concurrent writes)

# WRONG: Forgetting to set MLFLOW_TRACKING_URI
# (Python client creates new local mlruns/ directory, fragmenting experiments)

Right way

bash

#!/bin/bash
# RIGHT: Explicit configuration with environment variable
mlflow server \
  --backend-store-uri sqlite:///./mlruns.db \
  --default-artifact-root s3://my-bucket/artifacts \
  --host 0.0.0.0 \
  --port 5000 &   # background the server so the script can continue

export MLFLOW_TRACKING_URI="http://localhost:5000"

# In Python script, no hardcoded URI needed:
# import mlflow
# mlflow.log_metric("accuracy", 0.92)
# MLflow automatically discovers MLFLOW_TRACKING_URI from env

Tool vitals

Primary command

bash

mlflow server --backend-store-uri sqlite:///mlruns.db --default-artifact-root ./artifacts --host 0.0.0.0 --port 5000

Config file None required: settings come from mlflow server flags and environment variables such as MLFLOW_TRACKING_URI

Verify

bash

curl -sf http://localhost:5000/health && echo 'Tracking server is running'

Integration notes

MLflow tracking is the upstream input to MLflow Model Registry (Phase 2). The artifacts logged here (pickled models, ONNX files) become the source of truth for production deployments. DVC (Phase 3) handles large data versioning separately; MLflow handles experiment metadata and model artifacts. When you tag a run as production-ready in MLflow, a CI/CD pipeline (Phase 4) can pull the artifact from the artifact store (S3, MinIO, or local), package it into a Docker image, and push it to a registry. How that pipeline is triggered depends on your deployment: webhook support varies by MLflow version and hosting platform, so a scheduled or manually triggered CI job is a common fallback.

Migration path

If you switch to Weights & Biases or Neptune later, check each platform's current documentation for MLflow import tooling. Failing that, mlflow artifacts download pulls a run's artifacts locally (it works per run, so script it across run IDs), and you can then re-log them with the target platform's Python SDK. In practice, the MLflow 2.x tracking API has been stable enough that migrations are rare for teams already using it.

Cost model

MLflow server itself is free and open-source. Storage costs depend on your artifact store: local disk (free but limited to a single machine), S3 (typically about $0.023 per GB-month for Standard storage), or managed MLflow on Databricks (billed in DBUs; the ~$0.40 per DBU figure for small deployments varies by plan and cloud). For 100 runs × 10 MB average artifacts = 1 GB, costs are negligible (~$0.02/month). Costs stay negligible until you reach thousands of runs or large model artifacts (>1 GB each).
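The storage arithmetic above can be written as a small helper for plugging in your own numbers. This is a back-of-the-envelope sketch: it uses the $0.023 per GB-month figure quoted above as a default, treats 1 GB as 1000 MB, and ignores request and data-transfer charges; the function name is illustrative.

```python
def monthly_storage_cost_usd(num_runs, avg_artifact_mb, usd_per_gb_month=0.023):
    """Estimate monthly artifact storage cost for a given number of runs."""
    total_gb = num_runs * avg_artifact_mb / 1000  # decimal GB, as in the text
    return total_gb * usd_per_gb_month


# 100 runs at 10 MB each is 1 GB, i.e. about $0.023 per month.
small_team = monthly_storage_cost_usd(100, 10)

# 5,000 runs at 1 GB each is 5,000 GB, i.e. about $115 per month.
large_models = monthly_storage_cost_usd(5000, 1000)
```

The second case shows where the bill starts to matter: artifact size, far more than run count, drives storage cost.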

Common gotcha

If MLFLOW_TRACKING_URI is not set, MLflow silently falls back to a local mlruns/ directory. This causes experiments to scatter across the filesystem: a training script runs, logs metrics to a local store in whatever directory it was launched from, and nothing shows up in the UI. A URI that is set but wrong fails differently: an unreachable server makes logging calls retry and then raise an error. Always verify echo $MLFLOW_TRACKING_URI before training, and check that curl succeeds against the tracking server. If you're using the SQLite backend (fine for a single machine), route all writes through the server over HTTP rather than pointing training scripts at the same sqlite:/// file: SQLite serializes writers with a file lock, so several processes writing directly can stall or fail with "database is locked" errors.
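One way to guard against the gotcha above is a preflight check at the top of each training script. The sketch below assumes your team standard is an HTTP(S) tracking server, so it deliberately rejects other URI forms that MLflow itself would accept (local paths, sqlite:///); the function name check_tracking_uri is hypothetical.

```python
import os
from urllib.parse import urlparse


def check_tracking_uri(env=None):
    """Return the tracking server URI, or raise if runs would land elsewhere."""
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    if not uri:
        raise RuntimeError(
            "MLFLOW_TRACKING_URI is not set; runs would go to a local mlruns/ directory"
        )
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RuntimeError(
            f"MLFLOW_TRACKING_URI is not an HTTP(S) server URL: {uri!r}"
        )
    return uri
```

Calling check_tracking_uri() before any mlflow import-time work makes a misconfigured shell fail loudly in the first second rather than after an hour of training.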

Team adoption

Day 1: A senior engineer starts the tracking server as a service (systemd unit or Docker container). Day 1: Add export MLFLOW_TRACKING_URI=... to the team's shared `.bashrc` or CI config. Week 1: Agree on an experiment naming standard (e.g., mlflow.set_experiment(f"v{model_version}")). Week 2: Set up artifact storage on S3/MinIO so experiments persist across machine restarts. Track adoption weekly with mlflow experiments search --view all and count the experiments listed (piping to wc -l also counts the table header lines): healthy adoption shows 10+ experiments in the first month.

Experienced dev note

Set MLFLOW_TRACKING_URI once in your shell profile or CI environment file, not per script. If you serve the UI and store artifacts on different infrastructure, consider running a separate mlflow server --artifacts-only process dedicated to artifact handling. Most importantly, use mlflow.set_experiment() in Python to namespace runs by project instead of creating separate tracking servers: one server can hold many experiments, and you can query across them from the same UI or API.

Check your understanding

Why would logging metrics to a local mlruns/ directory instead of a server cause problems on a team, and how does setting MLFLOW_TRACKING_URI prevent this?

Answer hint

Each script instance that doesn't know about the server creates its own local directory, fragmenting the single source of truth. The env var centralizes the endpoint so all scripts write to the same backend automatically.
