Code Beginner easy · 4 min

Cache directory: where models are stored

What you will learn

Hugging Face automatically downloads and caches model files to a local directory on your machine so you don't re-download them every time your code runs.

Why this matters

Models can be hundreds of MB to tens of GB: without caching, every script restart would waste bandwidth and time. Understanding where models live helps you manage disk space, troubleshoot download issues, and control which models your team can access.

Skip if: you're using a cloud service (Hugging Face Spaces, Colab with HF integration) that handles caching for you, so you don't need to manage the cache directory manually. Even then, knowing where it is helps when things break.

Explanation

When you call AutoModel.from_pretrained('bert-base-uncased'), the library loads the model into memory for the current process, but it also saves the downloaded files to disk in a default cache directory. This directory is where all downloaded models, tokenizers, and other artifacts live, so subsequent calls can load them from disk instead of re-downloading them from the internet.

Mechanically, on first run, the library downloads the model files (weights, config, tokenizer) from the Hugging Face Hub and stores them under ~/.cache/huggingface/hub/ (or under $HF_HOME/hub if you set the HF_HOME environment variable). Each model gets its own models--<name> folder: the files themselves are stored in blobs/ under their content hash, and snapshots/<revision>/ holds symlinks with the original file names that point at those blobs. On subsequent runs, the library checks the cache first: if the files are there, it loads them from disk. If not, it downloads them.
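To make the on-disk layout concrete, here is a minimal sketch that walks one cached model folder and reports which blob each snapshot file points to. It uses only the standard library and assumes the blobs/ and snapshots/ layout with symlinks; on systems where symlinks are unavailable the files may be plain copies, in which case the reported name is simply the file's own name. The function name describe_cached_repo is hypothetical, not part of any library.

```python
from pathlib import Path


def describe_cached_repo(repo_dir: Path) -> dict:
    """Map each snapshot revision to {relative file path: blob file name}."""
    layout = {}
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return layout
    for revision_dir in sorted(p for p in snapshots.iterdir() if p.is_dir()):
        files = {}
        for path in sorted(revision_dir.rglob("*")):
            if path.is_dir():
                continue
            rel = path.relative_to(revision_dir).as_posix()
            # resolve() follows the symlink into blobs/; the blob's file
            # name is the content hash of the file.
            files[rel] = path.resolve().name
        layout[revision_dir.name] = files
    return layout
```

For example, calling describe_cached_repo on ~/.cache/huggingface/hub/models--distilbert-base-uncased-finetuned-sst-2-english (after that model has been downloaded) should list config.json, the weights, and the tokenizer files under one revision hash.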

You should care about this because: (1) large models can fill your disk: you may need to set HF_HOME to a larger partition, (2) in production, you might want to pre-cache models so they're ready without first-run downloads, and (3) in collaborative environments, you might want shared caching across team members.
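If you are unsure which directory will be used on a given machine, the lookup order can be approximated in a few lines. This is a sketch of the documented precedence (HF_HUB_CACHE, then HF_HOME/hub, then XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub), not the library's own code; resolve_hub_cache is a hypothetical helper, and it treats empty variables as unset, which is a simplifying assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hub_cache(env: Optional[Mapping[str, str]] = None) -> Path:
    """Approximate where the Hub cache will live for a given environment."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        # Most specific setting: points directly at the hub cache folder.
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        # HF_HOME is the parent folder; models go in its hub/ subfolder.
        return Path(env["HF_HOME"]).expanduser() / "hub"
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else Path.home() / ".cache"
    return base / "huggingface" / "hub"
```

Calling resolve_hub_cache() with no arguments inspects the current process environment, which is handy as a first check when a model does not land where you expected.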

Analogy

Think of it like npm's node_modules or pip's site-packages directory: the first time you install a package, it downloads and sits locally. The next time you need it, it's already there. The difference is that Hugging Face uses content-addressable storage with hashes, so different versions of the same model don't conflict.

Code

python

import os
from pathlib import Path
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Show default cache location
default_cache = Path.home() / '.cache' / 'huggingface' / 'hub'
print(f'Default cache directory: {default_cache}')
print(f'Cache exists: {default_cache.exists()}')

# First download — will fetch from Hub
print('\n--- First load (downloads) ---')
model_name = 'distilbert-base-uncased-finetuned-sst-2-english'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
print(f'Loaded {model_name}')

# Show cache contents
print(f'\n--- Cache contents ---')
if default_cache.exists():
    cached_models = list(default_cache.glob('models--*'))
    print(f'Cached models: {len(cached_models)}')
    for model_dir in cached_models[:3]:
        print(f'  - {model_dir.name}')

# Second load — instant, no download
print(f'\n--- Second load (from cache, instant) ---')
tokenizer2 = AutoTokenizer.from_pretrained(model_name)
model2 = AutoModelForSequenceClassification.from_pretrained(model_name)
print(f'Loaded from cache (no download)')

# Send one download to a custom directory with cache_dir.
# HF_HOME is read once, when the library is first imported, so assigning
# os.environ['HF_HOME'] at this point would not move the cache. To relocate
# the whole cache, set HF_HOME in your shell before starting Python.
print('\n--- Custom cache directory ---')
custom_cache = Path('/tmp/my_models')
custom_cache.mkdir(parents=True, exist_ok=True)
print(f'Custom cache directory: {custom_cache}')

# Load another model into the custom cache
print('Downloading to custom cache...')
tokenizer3 = AutoTokenizer.from_pretrained('bert-base-cased', cache_dir=custom_cache)
print(f'Cached in: {custom_cache}')
print(f'Contents: {list(custom_cache.glob("models--*"))[:2]}')

Output

Default cache directory: /home/user/.cache/huggingface/hub
Cache exists: True

--- First load (downloads) ---
Loaded distilbert-base-uncased-finetuned-sst-2-english

--- Cache contents ---
Cached models: 1
  - models--distilbert-base-uncased-finetuned-sst-2-english

--- Second load (from cache, instant) ---
Loaded from cache (no download)

--- Custom cache directory ---
Custom cache directory: /tmp/my_models
Downloading to custom cache...
Cached in: /tmp/my_models
Contents: [PosixPath('/tmp/my_models/models--bert-base-cased')]

What just happened?

The code printed the default cache location (usually ~/.cache/huggingface/hub), downloaded one model and its tokenizer into it on first run, showed that a second load of the same model reuses the cached files instead of downloading them again, and then sent a different tokenizer to a custom directory with the cache_dir argument. HF_HOME moves the whole cache in the same way, but it has to be set before the library is imported, so the script does not change it midway. The number of cached models you see depends on what you have downloaded before.

Common gotcha

New developers think 'from_pretrained' downloads on every call. It doesn't: it caches. But they then fill their disk without realizing it, or they assume the cache lives in memory (it doesn't: it's on disk and persists between runs), or they set HF_HOME after already caching models elsewhere and wonder why the new location is empty. Also: if you delete a model file manually from the cache directory, the library will silently re-download it on the next from_pretrained call: it doesn't warn you.

Error recovery

OutOfDiskSpace

Your cache is too large. Check with `du -sh ~/.cache/huggingface/hub`. Delete old models manually: `rm -rf ~/.cache/huggingface/hub/models--<unwanted-model>`. Or set HF_HOME to a partition with more space.
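If you prefer to find the largest models from Python rather than with du, a standard-library sketch like the following can total the bytes per cached model. It assumes the models--* folder layout under the hub cache and skips symlinks so that files referenced from snapshots/ are not counted twice; cache_usage is a hypothetical helper name.

```python
from pathlib import Path


def cache_usage(hub_cache: Path) -> dict:
    """Return {model folder name: bytes on disk} for each cached model."""
    usage = {}
    for repo_dir in sorted(hub_cache.glob("models--*")):
        total = 0
        for path in repo_dir.rglob("*"):
            # Count only real files; symlinks in snapshots/ point at blobs/
            # and would otherwise be counted a second time.
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
        usage[repo_dir.name] = total
    return usage
```

Sorting the result by size, for instance with sorted(cache_usage(path).items(), key=lambda item: item[1], reverse=True), shows which folders are worth deleting first.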

PermissionError_accessing_cache

Your HF_HOME points to a directory you don't have write access to. Check permissions with `ls -ld $HF_HOME`. Either change HF_HOME to a writable location or `chmod u+w` the directory.
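The same check can be done from Python before any download starts. This is a small sketch using only the standard library: it looks at the cache directory or, if that does not exist yet, at the nearest existing parent, since that is where the directory would have to be created. can_write_cache is a hypothetical helper name.

```python
import os
from pathlib import Path


def can_write_cache(cache_dir: Path) -> bool:
    """True if cache_dir, or its nearest existing ancestor, is a writable directory."""
    probe = cache_dir.expanduser()
    # Walk upward until we reach something that exists on disk.
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    # Creating entries requires write and search permission on a directory.
    return probe.is_dir() and os.access(probe, os.W_OK | os.X_OK)
```

A False result for the directory you intend to use as HF_HOME suggests fixing permissions or picking another location before running the script.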

Model_won't_download_to_custom_cache

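You set HF_HOME after the library was already imported. HF_HOME is read once, at import time, so later changes to os.environ are ignored. Set the variable before the first Hugging Face import (in your shell, or at the very top of your script), then restart your Python session or kernel.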

Experienced dev note

In production, you almost never want models downloading on first request: that blocks your service startup and fails if the network is down. Pre-populate the cache by running a setup script that calls from_pretrained on all your models during deployment. Also, git-ignore your HF_HOME directory; never commit models to version control. And if you're in a team environment with shared hardware, set HF_HOME to a shared NFS mount so one person's download benefits everyone.
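A deployment-time warm-up script could look roughly like this. It is a sketch, not a prescribed setup: it uses snapshot_download from huggingface_hub, which fetches every file in a model repository (possibly more than from_pretrained would need), and the download parameter is a hypothetical hook added here so the function can be exercised with a stub instead of the network.

```python
from typing import Callable, Iterable, Optional


def warm_cache(
    model_ids: Iterable[str],
    download: Optional[Callable[[str], str]] = None,
) -> dict:
    """Download each model once so later from_pretrained calls hit the cache."""
    if download is None:
        # Imported lazily so the function can be tested without the library.
        from huggingface_hub import snapshot_download
        download = snapshot_download
    # snapshot_download returns the local folder that holds the snapshot.
    return {model_id: download(model_id) for model_id in model_ids}
```

In a build step you would call something like warm_cache(['distilbert-base-uncased-finetuned-sst-2-english']) with HF_HOME already set to the directory your service will read from at runtime.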

Check your understanding

If you run the same 'from_pretrained' call twice on the same machine without changing anything, why does the second call complete faster, and how would you verify that the model came from cache and not the network?

Show answer hint

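A correct answer explains that the library checks the cache directory first, finds the model files already stored locally, and loads them from disk instead of downloading them again. Verification could be done by checking file timestamps in the cache directory, by using the TRANSFORMERS_VERBOSITY environment variable to see download logs (the second call has none), or by passing local_files_only=True, which makes from_pretrained fail instead of downloading if the files are not cached. Monitoring network traffic is less conclusive, because the library may still make a small request to check for updates even when the files are cached.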

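Version note: transformers 5.5.x uses the same cache structure as recent 4.x releases, so this is backward-compatible. However, older releases organized the cache differently (a flat directory instead of content-addressed per-model folders); the current layout appears to have arrived during the 4.x series, around 4.22, rather than at 4.0.0. If migrating from a release that old, you may need to re-download models.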

Next, learn how from_pretrained actually loads models into GPU memory and why you need to specify device_map to avoid out-of-memory errors.
