SBOM for ML containers
Why this matters
ML containers bundle Python packages, system libraries, CUDA runtimes, and model weights, creating opaque dependency chains. Without an SBOM, you cannot audit what is in your image for security compliance, license violations, or vulnerability patching, and a supply-chain attack on a transitive dependency (e.g., a compromised data preprocessing library) can go undetected in production. SBOMs are also becoming a compliance expectation: US Executive Order 14028 and the NIST guidance that followed it call for SBOMs from software suppliers to federal agencies, and auditors and enterprise customers increasingly ask for them as supporting evidence.
Explanation
A Software Bill of Materials (SBOM) is a machine-readable inventory of all software components in a container image: OS packages, Python wheels, compiled libraries, and their versions. For ML containers, this is critical because: (1) a handful of direct Python requirements can pull in 50 or more transitive packages; (2) vulnerabilities affecting CUDA and cuDNN are published regularly; (3) model weights or data preprocessing code may carry GPL/AGPL license terms that impose obligations on your entire deployment. Tools like syft (open-source) or trivy (which can also emit SBOMs) scan your image and produce output in standard formats (SPDX, CycloneDX). You integrate SBOM generation into your Docker build pipeline, either as a post-build step after pushing to the registry or as part of your CI/CD scanning gate. The SBOM is versioned alongside your image digest, so you can audit "what was in production on 2026-03-15" months later.
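To make the "machine-readable inventory" concrete, here is a minimal Python sketch that summarises an SPDX 2.x JSON document by ecosystem. It assumes the layout such documents commonly have (a top-level packages array whose entries carry name, versionInfo, and a purl under externalRefs). The helper name summarize_spdx and the sample document are illustrative and hand-written, not taken from a real scan.

```python
from collections import Counter


def summarize_spdx(sbom: dict) -> Counter:
    """Count SPDX packages per ecosystem, taken from each package's purl type."""
    counts = Counter()
    for pkg in sbom.get("packages", []):
        ecosystem = "unknown"
        for ref in pkg.get("externalRefs", []):
            locator = ref.get("referenceLocator", "")
            if ref.get("referenceType") == "purl" and locator.startswith("pkg:"):
                # "pkg:pypi/torch@2.0.0" -> "pypi"
                ecosystem = locator[len("pkg:"):].split("/", 1)[0]
                break
        counts[ecosystem] += 1
    return counts


# Hand-written sample standing in for real `syft <image> -o spdx-json` output.
SAMPLE_SPDX = {
    "spdxVersion": "SPDX-2.3",
    "packages": [
        {"name": "torch", "versionInfo": "2.0.0",
         "externalRefs": [{"referenceType": "purl",
                           "referenceLocator": "pkg:pypi/torch@2.0.0"}]},
        {"name": "numpy", "versionInfo": "1.24.0",
         "externalRefs": [{"referenceType": "purl",
                           "referenceLocator": "pkg:pypi/numpy@1.24.0"}]},
        {"name": "libssl3", "versionInfo": "3.0.2",
         "externalRefs": [{"referenceType": "purl",
                           "referenceLocator": "pkg:deb/ubuntu/libssl3@3.0.2"}]},
        {"name": "model-weights", "versionInfo": "1"},  # no purl recorded
    ],
}

if __name__ == "__main__":
    for ecosystem, count in sorted(summarize_spdx(SAMPLE_SPDX).items()):
        print(f"{ecosystem}: {count}")
```

A large "unknown" bucket is a useful signal in its own right: it means the scanner found files it could not attribute to any package manager.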
Configuration
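# Build stage: install Python dependencies (add compilers here if a wheel must be built from source)
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip python3-dev && \
    rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Runtime stage: only the interpreter, the installed packages, and the app
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04 AS runtime
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
# Ubuntu 22.04 ships Python 3.10; copying the whole directory keeps the
# *.dist-info metadata that SBOM scanners use to identify packages
COPY --from=build /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.10/dist-packages
COPY app/ /app
WORKDIR /app
ENTRYPOINT ["python3", "model.py"]
Why this order?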
A multi-stage build isolates dependencies: the build stage installs everything needed to install or compile packages, while the runtime stage contains only what is needed to run the model. This can reduce final image size by 40-60% when the build stage carries compilers and headers, and it shrinks both the attack surface and the number of components an SBOM scan has to cover. The pip install runs before any application code is copied, so the dependency layer stays cached: if only your code changes, the rebuild is fast.
Wrong vs Right
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3 python3-pip \
&& pip install torch torchvision numpy pandas scikit-learn \
&& apt-get clean
COPY . /app
WORKDIR /app
CMD ["python3", "train.py"]
# Wrong: no SBOM is generated, and nothing records which versions were installed.
# A vulnerable release (say torch==2.0.0) goes unnoticed until someone audits the image by hand.
# Right: in CI/CD, after docker build:
IMAGE=ghcr.io/myorg/my-ml:latest
docker build -t "$IMAGE" .
docker push "$IMAGE"
# Generate the SBOM immediately after the push, in both standard formats
syft "$IMAGE" -o spdx-json > sbom.spdx.json
syft "$IMAGE" -o cyclonedx-json > sbom.cyclonedx.json
# Vulnerability check: exits non-zero on High or Critical findings
grype sbom:./sbom.spdx.json --fail-on high
# Record the SBOM digest for traceability
echo "SBOM SHA256: $(sha256sum sbom.spdx.json | cut -d' ' -f1)"
# Store the SBOM in an OCI artifact repository or commit it to VCS
git add sbom.spdx.json && git commit -m "SBOM for my-ml:${GIT_SHA}"
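# Alternative: have BuildKit attach SBOM and provenance attestations at build time
# (attestations are stored with the pushed image, so build with --push)
docker buildx build --provenance=true --sbom=true --push -t ghcr.io/myorg/my-ml:latest .
Tool vitals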
syft <image> -o spdx-json > sbom.spdx.json
Dockerfile (with multi-stage SBOM generation)
sbom validate sbom.spdx.json && cat sbom.spdx.json | jq '.packages | length'
Integration notes
The SBOM feeds into your supply-chain security workflow: (1) Grype scans the SBOM for known vulnerabilities and fails CI/CD if High or Critical findings are present; (2) license scanners (FOSSA, LicenseFinder) parse the SBOM to detect GPL/AGPL packages that violate corporate policy; (3) artifact attestation (Sigstore) uses the SBOM as provenance evidence; (4) Kubernetes admission controllers (Kubewarden, OPA) can block image deployment if the SBOM attestation is missing or stale. Connect the SBOM to your supply-chain protection policy (SLSA) so that only images with verified, scanned SBOMs enter production.
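The license check in point (2) can be sketched without any vendor tool. The snippet below is a minimal illustration, assuming an SPDX 2.x JSON SBOM whose packages carry licenseConcluded and/or licenseDeclared. The deny-list DENIED_PREFIXES and the function name flag_copyleft are hypothetical stand-ins for your own policy, and the sample packages are made up.

```python
import re

# Hypothetical corporate policy: flag GPL and AGPL, but not LGPL.
DENIED_PREFIXES = ("GPL-", "AGPL-")


def flag_copyleft(sbom: dict) -> list[tuple[str, str]]:
    """Return (package name, license expression) pairs that hit the deny-list."""
    flagged = []
    for pkg in sbom.get("packages", []):
        for field in ("licenseConcluded", "licenseDeclared"):
            expression = pkg.get(field) or ""
            # Split an SPDX expression such as "(MIT OR GPL-2.0-only)" into identifiers.
            identifiers = re.findall(r"[A-Za-z0-9.+-]+", expression)
            if any(ident.startswith(DENIED_PREFIXES) for ident in identifiers):
                flagged.append((pkg.get("name", "?"), expression))
                break  # one hit per package is enough
    return flagged


# Made-up sample data for illustration only.
SAMPLE = {
    "packages": [
        {"name": "torch", "licenseConcluded": "BSD-3-Clause"},
        {"name": "somelib", "licenseDeclared": "AGPL-3.0-only"},
        {"name": "helper", "licenseConcluded": "LGPL-2.1-or-later"},
        {"name": "dual", "licenseConcluded": "NOASSERTION",
         "licenseDeclared": "(MIT OR GPL-2.0-only)"},
    ]
}

if __name__ == "__main__":
    for name, expression in flag_copyleft(SAMPLE):
        print(f"policy violation: {name} ({expression})")
```

The check is deliberately conservative: a dual-licensed package ("MIT OR GPL-2.0-only") is flagged even though one of its options is permissive, which leaves the final decision to a human reviewer.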
Migration path
If you outgrow open-source SBOM scanning, migrate to: (1) Snyk Container for continuous monitoring with weekly rescans, (2) Anchore Enterprise for policy-as-code and runtime vulnerability correlation, (3) Aqua for runtime enforcement of SBOM policies. None of these require code changes: they consume your SPDX/CycloneDX SBOM directly. Ensure your SBOM format is standards-compliant (SPDX 2.3+, CycloneDX 1.4+) so you can swap tools without regenerating.
Cost model
Syft and Grype are both free and open source. If you run Syft in CI/CD at scale (scanning 100+ images daily), you may hit rate limits on public image registries when pulling without authentication; configure registry credentials on your CI runner to avoid throttling. Commercial alternatives charge per scan or per seat: roughly $0.10/scan after the free tier for Snyk, and around $500/mo for small teams on seat-based Anchore Enterprise. Sigstore attestations are free.
Common gotcha
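Syft catalogs the filesystem of the final image, not your Dockerfile. It finds Python packages through their installed metadata (*.dist-info and *.egg-info directories), so a package whose metadata was deleted to save space, or that was copied between stages without it, will be missing from the SBOM. (pip's --no-cache-dir flag only disables the download cache and does not affect detection.) More critically, unless SBOM generation is part of the build itself, it runs as a separate step. If you push an image and generate the SBOM 5 minutes later, a CD pipeline may already have pulled the image before the SBOM exists. In production, you must generate the SBOM *before* marking the image as ready for deployment. One option is a BuildKit attestation (docker buildx build --sbom=true --push, shorthand for --attest type=sbom), which attaches the SBOM to the image index at build time so that the image and SBOM cannot diverge; sign the image or attestation separately (for example with Sigstore's cosign) if you also need tamper evidence.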
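One way to catch packages the scanner did not see is to compare the names in requirements.txt against the names in the SBOM. The sketch below assumes an SPDX 2.x JSON SBOM with a top-level packages array. The function names are hypothetical, the sample inputs are made up, and only direct requirements are checked, by name rather than version.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalisation: runs of '-', '_' and '.' become one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def required_names(requirements_text: str) -> set[str]:
    """Extract normalised project names from the text of a requirements file."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, --index-url, ...)
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def missing_from_sbom(requirements_text: str, sbom: dict) -> set[str]:
    """Names that were requested but do not appear in the SBOM's package list."""
    present = {normalize(pkg.get("name", "")) for pkg in sbom.get("packages", [])}
    return required_names(requirements_text) - present


# Made-up inputs for illustration only.
REQUIREMENTS = """\
torch==2.0.0
scikit_learn>=1.3  # underscore on purpose
--extra-index-url https://example.invalid/simple
NumPy
"""
SBOM = {"packages": [{"name": "torch"}, {"name": "scikit-learn"}]}

if __name__ == "__main__":
    print(sorted(missing_from_sbom(REQUIREMENTS, SBOM)))
```

A non-empty result does not prove the package is absent from the image, only that the SBOM does not list it, which is exactly the situation worth investigating before the image is marked ready.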
Team adoption
Mandate SBOM generation as a gate in your CI/CD pipeline: no image is promoted to your production registry without a signed SBOM attestation. Create a team dashboard (e.g., Grafana + attestation API) that shows SBOM coverage, so teams shipping without an SBOM are immediately visible. Run weekly grype scans on your entire registry and publish a vulnerability report. Set a policy: teams must remediate High vulns within 7 days, Critical within 24 hours. For large teams, create a shared "sbom-tool" Makefile target so every team uses identical SBOM configuration; this avoids drift where some teams use Syft and others use Trivy, with different output formats and versions.
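The remediation policy (High within 7 days, Critical within 24 hours) is easy to enforce mechanically once findings carry a first-seen timestamp. The sketch below assumes a hypothetical record shape (id, severity, first_seen) that your own tracking layer would have to maintain, since a single scan report does not tell you when a finding first appeared. The finding IDs and dates are placeholders.

```python
from datetime import datetime, timedelta, timezone

# Remediation windows from the team policy; other severities have no deadline here.
SLA = {"Critical": timedelta(hours=24), "High": timedelta(days=7)}


def overdue(findings: list[dict], now: datetime) -> list[dict]:
    """Return the findings whose remediation window has already elapsed."""
    late = []
    for finding in findings:
        window = SLA.get(finding["severity"])
        if window is not None and now - finding["first_seen"] > window:
            late.append(finding)
    return late


# Placeholder data: IDs and timestamps are invented for the example.
NOW = datetime(2026, 3, 15, 12, 0, tzinfo=timezone.utc)
FINDINGS = [
    {"id": "EXAMPLE-1", "severity": "Critical", "first_seen": NOW - timedelta(hours=30)},
    {"id": "EXAMPLE-2", "severity": "High", "first_seen": NOW - timedelta(days=3)},
    {"id": "EXAMPLE-3", "severity": "High", "first_seen": NOW - timedelta(days=8)},
    {"id": "EXAMPLE-4", "severity": "Medium", "first_seen": NOW - timedelta(days=100)},
]

if __name__ == "__main__":
    for finding in overdue(FINDINGS, NOW):
        print(f"overdue: {finding['id']} ({finding['severity']})")
```

Passing now in explicitly, instead of calling datetime.now() inside the function, keeps the check deterministic and easy to test in CI.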
Experienced dev note
Most teams generate an SBOM but never *use* it: it sits in a bucket, unread. The real power is in combining a --fail-on medium gate in your CI/CD pipeline with a baseline SBOM diff. Store the baseline SBOM for the main branch, then on each PR generate a new SBOM and diff it against the baseline. Syft does not ship a diff subcommand, so compare the two package lists with jq or a small script. This catches newly introduced packages and version changes, and the vulnerabilities that come with them, *before* merge, not days later. Additionally, pin your SBOM format version in your policy: teams using mixed SPDX 2.2/2.3 with CycloneDX 1.3/1.4 create parsing chaos downstream. Standardize on CycloneDX 1.4+ if you integrate with Kubernetes supply-chain security (it has better package type taxonomy).
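The baseline diff can be done with a few lines of Python that do not depend on any particular CLI. This sketch assumes both files are SPDX 2.x JSON documents with a top-level packages array; the function names are hypothetical and the two sample documents are made up. A package that appears twice under the same name with different versions collapses to its last entry, which is a known simplification.

```python
def components(sbom: dict) -> dict[str, str]:
    """Map package name -> version for an SPDX 2.x JSON document."""
    return {pkg["name"]: pkg.get("versionInfo", "") for pkg in sbom.get("packages", [])}


def diff_sboms(baseline: dict, candidate: dict) -> dict[str, list]:
    """Report packages added, removed, or changed in version between two SBOMs."""
    old, new = components(baseline), components(candidate)
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted((name, old[name], new[name])
                          for name in shared if old[name] != new[name]),
    }


# Made-up documents standing in for sbom-baseline.json and sbom-pr.json.
BASELINE = {"packages": [
    {"name": "torch", "versionInfo": "2.0.0"},
    {"name": "numpy", "versionInfo": "1.24.0"},
    {"name": "pandas", "versionInfo": "2.0.0"},
]}
PR = {"packages": [
    {"name": "torch", "versionInfo": "2.1.0"},
    {"name": "numpy", "versionInfo": "1.24.0"},
    {"name": "requests", "versionInfo": "2.31.0"},
]}

if __name__ == "__main__":
    for kind, entries in diff_sboms(BASELINE, PR).items():
        print(kind, entries)
```

In a pipeline you would load the two documents with json.load and fail the PR check, or request review, whenever "added" or "changed" is non-empty.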
Check your understanding
You have two images: image-v1 with torch==2.0.0 was deployed 30 days ago, image-v2 with torch==2.1.0 is in staging. A critical CVE is announced for torch==2.0.1. Your SBOM for image-v1 shows torch==2.0.0. Why is this insufficient to declare image-v1 safe, and what data point would you need from your SBOM to be certain?
Answer hint
torch==2.0.0 may itself fall inside the CVE's affected range: advisories usually list a range of versions, so check the advisory rather than matching only the announced version. It may also depend on older versions of the CUDA runtime or numpy that carry their own CVEs. An SBOM records direct and transitive dependencies: in CycloneDX, check the externalReferences field and the dependencies array. The real gotcha: torch wheels bundle native C++ code (cuBLAS, cuDNN), so the SBOM must capture the bundled CUDA version as a separate component, not just the Python package. If your SBOM lacks CUDA component granularity, you cannot reason about native library vulns.
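A quick way to test for "CUDA component granularity" is to look for CUDA-related entries in the SBOM itself. The sketch below assumes a CycloneDX JSON document with a top-level components array of name/version entries. The name fragments in CUDA_MARKERS are an illustrative heuristic, not an official list, and both sample documents are made up.

```python
# Illustrative name fragments; extend to match how your scanner names native libraries.
CUDA_MARKERS = ("cuda", "cudnn", "cublas", "nvidia")


def cuda_components(bom: dict) -> list[tuple[str, str]]:
    """Return (name, version) for CycloneDX components that look CUDA-related."""
    found = []
    for component in bom.get("components", []):
        name = component.get("name", "")
        if any(marker in name.lower() for marker in CUDA_MARKERS):
            found.append((name, component.get("version", "")))
    return found


# Made-up BOMs: one with a separate cuDNN component, one without.
GRANULAR_BOM = {"components": [
    {"type": "library", "name": "torch", "version": "2.0.0"},
    {"type": "library", "name": "nvidia-cudnn-cu11", "version": "8.5.0.96"},
    {"type": "library", "name": "numpy", "version": "1.24.0"},
]}
OPAQUE_BOM = {"components": [
    {"type": "library", "name": "torch", "version": "2.0.0"},
]}

if __name__ == "__main__":
    for label, bom in (("granular", GRANULAR_BOM), ("opaque", OPAQUE_BOM)):
        hits = cuda_components(bom)
        print(label, hits if hits else "no CUDA components: native vulns cannot be assessed")
```

An empty result for an image that clearly runs on a GPU suggests the native libraries are hidden inside another package, which is the situation the hint warns about.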