Code Beginner easy · 5 min

Ollama model pull failures

What you will learn
Diagnose why <code>ollama pull</code> fails and recover from incomplete downloads, authentication errors, and disk space issues.

Why this matters

Model pulls fail silently or with cryptic errors in development environments. Understanding the root causes (disk space, network timeouts, corrupted cache) prevents hours of debugging and teaches you how Ollama's layer-based download system works under the hood.

Skip if: you only use pre-pulled models in a fully provisioned production environment, your CI/CD pipeline manages all model pulls before deployment, or you use cloud-hosted Llama APIs (like Together AI or Replicate) instead of running models locally.

Explanation

What it is: Ollama model pulls can fail due to network interruptions, insufficient disk space, authentication blockers, or a corrupted layer cache. Unlike a simple file download, Ollama uses a layered model format similar to Docker's, so a partial download can leave broken state that must be cleaned up before retrying.

How it works mechanically: When you run ollama pull llama3.2, Ollama downloads the model's layers (the weights plus metadata such as the prompt template and parameters) to ~/.ollama/models on Linux/Mac or %USERPROFILE%\.ollama\models on Windows. If the process halts (network timeout, OOM killer, disk full), Ollama leaves incomplete layer data behind. If that data is corrupted, retrying without cleanup reuses it and fails in the same way. The fix is to remove the incomplete download and restart the pull from scratch.
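To see what a halted pull left behind, you can inspect the models directory directly. The following is a minimal sketch, not Ollama's own tooling. It assumes unfinished layers are stored in a `blobs` subfolder with `-partial` in their file names, which is worth confirming on your own install before acting on it. The helper names `default_models_dir` and `find_partial_downloads` are hypothetical.

```python
import os
from pathlib import Path


def default_models_dir() -> Path:
    """Return the model store: OLLAMA_MODELS if set, else ~/.ollama/models."""
    override = os.environ.get("OLLAMA_MODELS")
    return Path(override) if override else Path.home() / ".ollama" / "models"


def find_partial_downloads(models_dir: Path) -> list[Path]:
    """List files that look like unfinished layer downloads.

    Assumption: unfinished layers sit in a 'blobs' subfolder and have
    '-partial' in their file name. Check your own install to confirm.
    """
    blobs_dir = models_dir / "blobs"
    if not blobs_dir.is_dir():
        return []
    return sorted(p for p in blobs_dir.iterdir() if "-partial" in p.name)


if __name__ == "__main__":
    # Read-only: print each leftover file and its size, delete nothing.
    for leftover in find_partial_downloads(default_models_dir()):
        size_mb = leftover.stat().st_size / 1024**2
        print(f"{leftover.name}: {size_mb:.1f} MB")
```

The sketch only reports files and never deletes them, so it is safe to run while deciding whether cleanup is needed.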

When to use this pattern: Always validate disk space and network connectivity before large model pulls in CI/CD or headless environments. On developer machines, use ollama list to check what's already cached to avoid redundant pulls.
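A pre-flight check along these lines could run before a large pull. This is a sketch under stated assumptions: `registry.ollama.ai` on port 443 is assumed to be the host the pull talks to, the column layout of `ollama list` is assumed to match the sample output shown later in this lesson, and the helper names are illustrative.

```python
import shutil
import socket
from pathlib import Path


def free_gb(path: Path) -> float:
    """Free space in GiB on the filesystem that holds `path`."""
    probe = path
    # disk_usage needs an existing path, so walk up to the nearest existing parent.
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free / 1024**3


def can_reach(host: str = "registry.ollama.ai", port: int = 443,
              timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def cached_models(list_output: str) -> set[str]:
    """Extract model names from the text printed by `ollama list`."""
    lines = list_output.strip().splitlines()
    # The first line is the header row; the model name is the first column.
    return {line.split()[0] for line in lines[1:] if line.strip()}


def is_cached(model: str, list_output: str) -> bool:
    """True if `model` is already listed; a bare name is treated as ':latest'."""
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in cached_models(list_output)
```

In a CI job you would feed `is_cached` the captured stdout of `ollama list` and skip the pull when it returns True, and fail fast when `free_gb` or `can_reach` reports a problem.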

Analogy

Imagine downloading a 20 GB file via torrent. If the connection drops at 15 GB, your torrent client leaves a partial .tmp file. If that partial file is damaged, retrying the same torrent without deleting it wastes bandwidth on the same bad chunks. Ollama works the same way: clear the incomplete download before retrying.

Code

python
import os
import shutil
import subprocess
import time
from pathlib import Path


def safe_ollama_pull(model_name: str, max_retries: int = 3) -> bool:
    """
    Safely pull an Ollama model with retry logic and cleanup on failure.
    Returns True if pull succeeds, False if all retries exhausted.
    """
    # OLLAMA_MODELS overrides the default store location when it is set.
    models_dir = Path(os.environ.get('OLLAMA_MODELS', Path.home() / '.ollama' / 'models'))
    # Assumption: layers live under blobs/ and unfinished ones carry '-partial'
    # in their file name. Verify this on your Ollama version before deleting.
    blobs_dir = models_dir / 'blobs'

    def clean_partials() -> None:
        # Removes every unfinished download, including ones from other pulls,
        # and discards any progress already made.
        for leftover in blobs_dir.glob('*-partial*'):
            print(f'Cleaning incomplete download {leftover.name}')
            leftover.unlink(missing_ok=True)

    for attempt in range(1, max_retries + 1):
        print(f'Attempt {attempt}/{max_retries}: pulling {model_name}')

        # disk_usage needs an existing path; on a first pull the store may not exist yet.
        probe = models_dir if models_dir.exists() else Path.home()
        disk_free_gb = shutil.disk_usage(probe).free / (1024**3)
        if disk_free_gb < 5:
            print(f'ERROR: Only {disk_free_gb:.1f}GB free. Need at least 5GB.')
            return False

        try:
            result = subprocess.run(
                ['ollama', 'pull', model_name],
                capture_output=True,
                text=True,
                timeout=600
            )
        except subprocess.TimeoutExpired:
            print('TIMEOUT: Pull exceeded 10 minutes')
            clean_partials()
        except FileNotFoundError:
            # Raised when the ollama binary itself is missing, not when the service is down.
            print('ERROR: ollama command not found. Install Ollama from ollama.com')
            return False
        else:
            if result.returncode == 0:
                print(f'SUCCESS: {model_name} pulled')
                return True
            print(f'FAILED: {result.stderr}')
            stderr = result.stderr.lower()
            # Only network-style failures trigger cleanup before the next attempt.
            if 'connection' in stderr or 'timeout' in stderr:
                clean_partials()

        if attempt < max_retries:
            print('Retrying in 5 seconds...')
            time.sleep(5)

    print(f'FAILED: Could not pull {model_name} after {max_retries} attempts')
    return False

if __name__ == '__main__':
    success = safe_ollama_pull('llama3.2:3b')
    if success:
        result = subprocess.run(['ollama', 'list'], capture_output=True, text=True)
        print('\nAvailable models:')
        print(result.stdout)
    else:
        print('Model pull failed. Check disk space, network, and Ollama service.')
Output
Attempt 1/3: pulling llama3.2:3b
SUCCESS: llama3.2:3b pulled

Available models:
NAME                    ID              SIZE      MODIFIED
llama3.2:3b             3b1b6acd6e9d    2.0 GB    5 minutes ago

What just happened?

The code checks available disk space, attempts to pull the model via the Ollama CLI, and, if a network-related error occurs, cleans up what the incomplete download left behind and retries. In the run above the first attempt succeeded, so it went straight to listing the available models to confirm the pull completed. The subprocess call invokes the system <code>ollama</code> command and captures its output.

Common gotcha

The single biggest mistake is retrying ollama pull in a loop without clearing the broken state. If the leftover data is corrupted, each retry fails in the exact same way, so remove the incomplete files before retrying. In recent Ollama versions there is no per-model folder: layers are stored under ~/.ollama/models/blobs, with unfinished ones marked by a -partial suffix, and manifests live under ~/.ollama/models/manifests. Confirm the layout on your install before deleting anything. Also, ollama list does not surface leftover partial downloads, so rely on filesystem checks, not CLI output.

Error recovery

Error pulling manifest: error getting token: net/http: request canceled
Network timeout or proxy issue. Check internet connectivity. The <code>timeout=600</code> in the code above prevents hangs. If you are behind a corporate proxy, set the <code>HTTPS_PROXY</code> environment variable for the Ollama server process (pulls go over HTTPS) and restart it.
no space left on device
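Disk is full. Llama-family models range from about a gigabyte to tens of gigabytes, depending on parameter count and quantization. Run <code>df -h ~/.ollama</code> to check. Delete old models with <code>ollama rm old_model_name</code> or free disk space. Running out of memory is a separate failure: if the OOM killer interrupts a pull, treat it like any other halted download.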
connection refused
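Ollama service is not running. On Linux: run <code>ollama serve</code> in one terminal, then pull in another. On Windows or macOS: open the Ollama app. Note that the <code>FileNotFoundError</code> handler in the code above only catches a missing <code>ollama</code> binary; a stopped service shows up as a non-zero exit code with this message in stderr.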
WARNING: unable to verify checksum of the manifest
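Corrupted cache or incomplete layer. Delete the affected model's entry under <code>~/.ollama/models/manifests</code> and retry. Avoid deleting the whole directory, since that would unregister every model you have pulled. This is rare but indicates a previous pull was interrupted mid-layer.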
model not found
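The model name is incorrect or not available in Ollama's library. Note that <code>ollama list</code> shows only models already downloaded locally; browse the library on ollama.com to see what can be pulled. Names are version-specific, so 'llama3' and <code>llama3.2</code> refer to different models. Double-check both the name and the tag.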
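The error list above can be turned into a small lookup that a wrapper script applies to captured stderr. This is an illustrative sketch: the substrings are taken from the messages listed in this section, the labels and the `classify_pull_error` name are hypothetical, and real Ollama output may be worded differently across versions.

```python
# (substring to look for, short label, suggested next step).
# Order matters because the first match wins.
KNOWN_ERRORS = [
    ("no space left on device", "disk_full",
     "Free disk space or remove old models with `ollama rm`."),
    ("connection refused", "service_down",
     "Start the Ollama service, then retry."),
    ("request canceled", "network",
     "Check connectivity and proxy settings."),
    ("unable to verify checksum", "corrupt_cache",
     "Clear the incomplete download and retry."),
    ("model not found", "bad_name",
     "Check the model name and tag."),
]


def classify_pull_error(stderr: str) -> tuple[str, str]:
    """Map raw stderr text to a (label, hint) pair using substring matching."""
    text = stderr.lower()
    for needle, label, hint in KNOWN_ERRORS:
        if needle in text:
            return label, hint
    return "unknown", "Read the full stderr output; no known pattern matched."


if __name__ == "__main__":
    label, hint = classify_pull_error("dial tcp 127.0.0.1:11434: connect: connection refused")
    print(f"{label}: {hint}")
```

Matching on labels rather than a loose check for the word "connection" lets a retry loop decide per cause: retrying makes sense for `network`, while `disk_full` and `bad_name` will not fix themselves.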

Experienced dev note

In production CI/CD, never assume ollama pull is idempotent. A failed pull can leave state that blocks future runs. Pre-pull models in Docker build layers, not at runtime. For headless servers without access to the Ollama registry, note that the Ollama CLI has no export subcommand at the time of writing. A common workaround is to copy the models directory (~/.ollama/models, both blobs and manifests) from a development machine to the server, which bypasses the network entirely. Also, Ollama's background service sometimes dies silently on low-memory systems; check it with ollama ps before assuming pulls will work.
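The service check mentioned above could be scripted as a gate in front of any pull. This is a minimal sketch, assuming `ollama ps` exits with a non-zero status when it cannot reach the service. The `service_is_up` name and its `cmd` parameter (there so the probe can be swapped in tests) are illustrative.

```python
import subprocess
from typing import Optional, Sequence


def service_is_up(cmd: Optional[Sequence[str]] = None, timeout: float = 10.0) -> bool:
    """Return True if the probe command exits with status 0.

    Defaults to `ollama ps`, which has to talk to the Ollama service.
    """
    probe = list(cmd) if cmd is not None else ["ollama", "ps"]
    try:
        result = subprocess.run(probe, capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Binary missing or the probe hung: treat both as "not usable".
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if not service_is_up():
        print("Ollama service is not reachable; start it before pulling.")
```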

Check your understanding

Why does retrying ollama pull in a loop without cleanup fail repeatedly with the same error, and what happens to the model directory after each failed attempt?

Answer hint

A correct answer must explain that a failed pull leaves incomplete layer data on disk, and that retrying without deleting it reuses the same broken state. It should mention that this data persists after the failure, so every attempt starts from the same conflict.

VERSION Written against Ollama 0.5.x, which uses layer-based caching similar to Docker's. Storage details and cleanup steps may differ on other versions, so run ollama --version to confirm which one you're on.
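A script can make the version check explicit rather than relying on memory. The sketch below assumes `ollama --version` prints a dotted version such as `0.5.4` somewhere in its output; the exact wording around the number is not relied on. The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Optional[Version]:
    """Pull the first major.minor.patch triple out of arbitrary text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def installed_version() -> Optional[Version]:
    """Run `ollama --version` and parse its output; None if unavailable."""
    try:
        result = subprocess.run(
            ["ollama", "--version"], capture_output=True, text=True, timeout=10
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    # Some builds print warnings on stderr, so search both streams.
    return parse_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("Could not determine the Ollama version.")
    elif version < (0, 5, 0):
        print(f"Found {version}; this lesson was written against 0.5.x.")
```

Comparing tuples of integers avoids the string-comparison trap where "0.10.0" sorts before "0.5.0".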
NEXT

Once pulls succeed reliably, learn how to load a pulled model in Python using <code>ollama.chat()</code> to actually run inference. It connects to the running Ollama service and streams responses.
