CacheMissError
dvc.exceptions.CacheMissError
Stack trace
dvc.exceptions.CacheMissError: Cache miss for dependency 'data/raw/data.csv' in stage 'prepare'
File "/usr/local/lib/python3.9/site-packages/dvc/stage/__init__.py", line 123, in run
raise CacheMissError(f"Cache miss for dependency '{dep}' in stage '{self.name}'")
CacheMissError: Cache miss for dependency 'data/raw/data.csv' in stage 'prepare'
Why it happens
DVC tracks dependencies and outputs via hashes to reuse cached results. If a dependency file is modified, deleted, or missing, DVC cannot find a matching cache entry and raises a CacheMissError. This ensures pipeline correctness by rerunning affected stages.
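The hash-based lookup described above can be pictured as a content-addressed store. The following is a minimal sketch under stated assumptions, not DVC's actual implementation: the `file_hash` and `lookup` helpers and the in-memory `cache` dict are hypothetical, and MD5 is used only because it is DVC's default file hash.

```python
import hashlib
import os
import tempfile


def file_hash(path):
    """Return the MD5 hex digest of a file's contents."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lookup(cache, path):
    """Return the cached result for the file's current hash, or None on a miss."""
    if not os.path.exists(path):
        return None  # a missing dependency can never match a cache entry
    return cache.get(file_hash(path))


# Demo with a throwaway file standing in for data/raw/data.csv
with tempfile.TemporaryDirectory() as tmp:
    dep = os.path.join(tmp, "data.csv")
    with open(dep, "w") as fh:
        fh.write("a,b\n1,2\n")

    cache = {file_hash(dep): "prepared-output-v1"}  # hypothetical cache entry
    hit_before = lookup(cache, dep)  # unchanged file: cache hit

    with open(dep, "a") as fh:
        fh.write("3,4\n")  # modify the dependency, which changes its hash
    hit_after_edit = lookup(cache, dep)  # cache miss

    os.remove(dep)  # delete the dependency
    hit_after_delete = lookup(cache, dep)  # cache miss
```

In this sketch, both the edit and the deletion turn a hit into a miss, which mirrors the two situations the paragraph above describes.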
Detection
Monitor DVC pipeline runs for CacheMissError exceptions and log which dependency and stage each one names, so that changed or missing files can be identified quickly.
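One way to do this kind of monitoring is to scan captured pipeline logs for the error message. The sketch below assumes the message format shown in the stack trace above; the `MISS_PATTERN` regex and the `find_cache_misses` helper are hypothetical names introduced for illustration, not part of DVC.

```python
import re

# Pattern based on the message format shown in the stack trace above
MISS_PATTERN = re.compile(
    r"CacheMissError: Cache miss for dependency '(?P<dep>[^']+)' "
    r"in stage '(?P<stage>[^']+)'"
)


def find_cache_misses(log_text):
    """Return unique (stage, dependency) pairs in order of first appearance."""
    seen = []
    for match in MISS_PATTERN.finditer(log_text):
        pair = (match.group("stage"), match.group("dep"))
        if pair not in seen:
            seen.append(pair)
    return seen


sample_log = """\
dvc.exceptions.CacheMissError: Cache miss for dependency 'data/raw/data.csv' in stage 'prepare'
CacheMissError: Cache miss for dependency 'data/raw/data.csv' in stage 'prepare'
"""
misses = find_cache_misses(sample_log)
for stage, dep in misses:
    print(f"stage={stage} dependency={dep}")
```

Deduplicating the pairs keeps the report short when the same message appears several times in one traceback.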
Causes & fixes
Dependency file was modified after last pipeline run, changing its hash.
Revert the dependency file to its original state or accept the change and allow DVC to rerun the pipeline to update the cache.
Dependency file was deleted or moved, so DVC cannot find it.
Restore the missing dependency file to its expected path or update the DVC stage to point to the new location.
DVC cache directory was cleared or corrupted, losing cached outputs.
Restore the DVC cache directory from backup or rerun the pipeline stages to regenerate cache entries.
The stage's dependencies were changed, but the DVC files were not updated to match.
Run 'dvc repro' to update pipeline stages and cache, or manually update DVC files to reflect new dependencies.
Code: broken vs fixed
# Broken: the CacheMissError propagates to the caller
import dvc.api

# This will raise CacheMissError if the dependency changed or is missing
with dvc.api.open('data/raw/data.csv', 'r') as fd:
    data = fd.read()  # Cache miss error triggered here

# Fixed: catch the error, reproduce the pipeline, then retry the read
import os
import subprocess

import dvc.api
import dvc.exceptions

os.environ['DVC_CACHE_DIR'] = '/path/to/dvc/cache'  # Ensure cache dir is set

try:
    with dvc.api.open('data/raw/data.csv', 'r') as fd:
        data = fd.read()
except dvc.exceptions.CacheMissError as e:
    print(f"Cache miss detected: {e}")
    # Rerun the pipeline to rebuild the cache; check=True stops here if repro fails
    subprocess.run(['dvc', 'repro'], check=True)
    with dvc.api.open('data/raw/data.csv', 'r') as fd:
        data = fd.read()

print(data)
Workaround
Catch CacheMissError, then manually run 'dvc repro' to rebuild missing cache dependencies before continuing pipeline execution.
Prevention
Use DVC's 'dvc status' and 'dvc diff' commands regularly to detect dependency changes early and commit consistent pipeline states to avoid unexpected cache misses.
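A regular check of this kind can be scripted, for example as a pre-commit or CI step. The sketch below is an illustration under stated assumptions: it assumes `dvc status --json` prints a JSON object that is empty when nothing has changed, and the `changed_stages` and `is_clean` helpers (and the injectable `run` parameter) are hypothetical names, not part of DVC.

```python
import json
import subprocess


def changed_stages(run=subprocess.run):
    """Return the parsed output of `dvc status --json`.

    Assumes the command prints a JSON object keyed by stage name,
    which is empty when the pipeline is up to date.
    """
    result = run(
        ["dvc", "status", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout or "{}")


def is_clean(status):
    """True when the status object reports no changed stages."""
    return not status
```

Calling `is_clean(changed_stages())` inside a DVC repository and failing the check when it returns False would surface dependency changes before the next pipeline run. The `run` parameter exists only so the command can be replaced with a stub in tests.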