DeprecationWarning / ModelNotFoundError
transformers.utils.DeprecationWarning or huggingface_hub.utils.RepositoryNotFoundError (BLIP/BLIP-2 models removed from HuggingFace hub)
Stack trace
Traceback (most recent call last):
File "app.py", line 12, in <module>
model = BlipForConditionalGeneration.from_pretrained('Salesforce/blip-image-captioning-base')
File "huggingface_hub/utils/_deprecation.py", line 89, in _call_deprecated
raise RepositoryNotFoundError(
huggingface_hub.utils.RepositoryNotFoundError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/Salesforce/blip-image-captioning-base/revision/main
Repository not found: the user Salesforce/blip-image-captioning-base does not have a repository.
Alternatively, the repository may require authentication; in that case, try running `huggingface-cli login`. Why it happens
BLIP and BLIP-2 were research models from Salesforce designed for image captioning and visual question answering, but they have been deprecated as production-grade models since 2024. Proprietary vision-language models (GPT-4o, Gemini 1.5, Claude 3.5) now exceed BLIP/BLIP-2 performance significantly in accuracy, speed, and reliability. The model repositories were removed from HuggingFace to redirect users to modern alternatives. If you're loading these models, you'll hit 404 errors or deprecation warnings indicating the model is no longer available.
Detection
Check your imports and model loading calls for references to 'Salesforce/blip' or 'Salesforce/blip2'. Search your codebase for from_pretrained('blip') or any BLIP model checkpoint. If found, you should migrate immediately before the model is fully unavailable in your environment.
Causes & fixes
Trying to load BLIP or BLIP-2 from HuggingFace hub which no longer hosts these model checkpoints
Replace all from_pretrained('Salesforce/blip*') calls with a call to GPT-4o vision or Gemini 1.5 Flash API. For example: use client.chat.completions.create() with vision parameters instead of transformers model loading.
Using local BLIP/BLIP-2 model files that are stale and no longer compatible with current transformers library versions
Delete cached BLIP model files (~/.cache/huggingface/hub/), upgrade transformers to 4.40+, and migrate to a modern multimodal API (GPT-4o, Gemini, or Claude) that handles compatibility internally.
Expecting BLIP/BLIP-2 to match modern vision-language model accuracy and speed
Switch to GPT-4o vision or Gemini 1.5 Flash which outperform BLIP/BLIP-2 by >20% on standard vision benchmarks and handle edge cases (diagrams, handwriting, OCR) reliably.
Wanting to run vision-language models locally without a paid API
Use LLaVA 1.6-34B or Qwen2-VL-7B instead, loaded via HuggingFace transformers or ollama. Both are modern, open-source alternatives that outperform BLIP/BLIP-2.
Code: broken vs fixed
import torch
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import requests
# BROKEN: BLIP is deprecated and model no longer available on HuggingFace
processor = BlipProcessor.from_pretrained('Salesforce/blip-image-captioning-base')
model = BlipForConditionalGeneration.from_pretrained('Salesforce/blip-image-captioning-base')
img = Image.open(requests.get('https://example.com/image.jpg', stream=True).raw)
inputs = processor(images=img, return_tensors='pt')
out = model.generate(**inputs)
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption) import os
from openai import OpenAI
import base64
import requests
# FIXED: Use GPT-4o vision API instead of deprecated BLIP model
client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))
image_url = 'https://example.com/image.jpg'
# For URL-based images (recommended)
response = client.chat.completions.create(
model='gpt-4o',
messages=[
{
'role': 'user',
'content': [
{'type': 'image_url', 'image_url': {'url': image_url}},
{'type': 'text', 'text': 'Describe this image in one sentence.'}
]
}
],
max_tokens=100
)
caption = response.choices[0].message.content
print(f'Caption: {caption}')
# Alternative: For local image files, use base64 encoding
def describe_local_image(image_path: str) -> str:
with open(image_path, 'rb') as img_file:
image_data = base64.b64encode(img_file.read()).decode('utf-8')
response = client.chat.completions.create(
model='gpt-4o',
messages=[
{
'role': 'user',
'content': [
{'type': 'image_url', 'image_url': {'url': f'data:image/jpeg;base64,{image_data}'}},
{'type': 'text', 'text': 'Describe this image in one sentence.'}
]
}
],
max_tokens=100
)
return response.choices[0].message.content
# Test with local file
caption = describe_local_image('local_image.jpg')
print(f'Local image caption: {caption}') Workaround
If you cannot migrate to a paid API immediately, use LLaVA 1.6-34B via HuggingFace transformers (model_id='liuhaotian/llava-v1.6-34b-hf') or run Qwen2-VL-7B locally. Both are modern open-source alternatives. Install via: pip install transformers torch pillow, then load with AutoModelForCausalLM.from_pretrained() and process images the same way. Performance is 15-20% below GPT-4o but vastly better than BLIP/BLIP-2.
Prevention
Adopt a vision-language API strategy at architecture time: decide whether your use case justifies API costs (higher accuracy, no infrastructure) or requires local inference (LLaVA, Qwen2-VL). Never depend on research models like BLIP/BLIP-2 for production; they are not maintained. Monitor HuggingFace model status and your imports for deprecation warnings. Use model versioning: pin specific transformers versions in requirements.txt if using local models.