High severity intermediate · Fix: 5-10 min

ValueError: chat_template

ValueError: Llama 2 chat template format is deprecated, use Llama 3 format

What this error means
Llama 2's chat template format has been deprecated in favor of Llama 3's improved format; you must update your prompt structure or use a Llama 3 model.

Stack trace

```traceback
ValueError: Llama 2 chat template format is deprecated, use Llama 3 format instead. See https://huggingface.co/meta-llama/Llama-2-7b-chat/discussions/10 for migration guide.
  File "transformers/models/llama/tokenizer_llama.py", line 412, in _build_chat_template
    raise ValueError(msg)

The applied chat template:
{chat_template}

Does not match the expected Llama 2 format. Update to Llama 3 or set use_default_system_prompt=False.
```
QUICK FIX
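Replace your Llama 2 model ID with a Llama 3 instruct model such as `meta-llama/Llama-3.1-8B-Instruct` or `meta-llama/Llama-3.3-70B-Instruct`. The transformers API stays the same, and the tokenizer ships a Llama 3 chat template, so the template error goes away.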

Why it happens

Llama 2's chat template wraps each user turn in `[INST]` and `[/INST]` markers and folds the system prompt into the first turn inside `<<SYS>>` tags; that layout proved rigid and error-prone in production. Llama 3 replaced it with a role-based format in which every message opens with a `<|start_header_id|>role<|end_header_id|>` header and closes with an `<|eot_id|>` token. When you load a Llama 2 model with modern transformers (4.36+), the library detects the old template and refuses to use it, forcing you to either upgrade to Llama 3 or explicitly disable template validation. This is intentional: Llama 2 is now deprecated in favor of Llama 3.2 and Llama 3.3, which have stronger instruction-following and chat capabilities.
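To make the difference concrete, here is a simplified sketch of how the same messages render in each format. `render_llama2` and `render_llama3` are hypothetical helpers written for illustration only; they omit details of the official templates (for example, the Llama 3.1 template always emits a system block with a knowledge-cutoff date), so keep using `apply_chat_template()` for real prompts.

```python
from typing import Dict, List

Messages = List[Dict[str, str]]


def render_llama2(messages: Messages) -> str:
    """Simplified Llama 2 layout: [INST] ... [/INST] turns, optional <<SYS>> block."""
    system = ""
    if messages and messages[0]["role"] == "system":
        system = f"<<SYS>>\n{messages[0]['content']}\n<</SYS>>\n\n"
        messages = messages[1:]
    out = ""
    for i, msg in enumerate(messages):
        if msg["role"] == "user":
            # The system block is folded into the first user turn only.
            prefix = system if i == 0 else ""
            out += f"<s>[INST] {prefix}{msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            out += f" {msg['content']} </s>"
    return out


def render_llama3(messages: Messages, add_generation_prompt: bool = True) -> str:
    """Simplified Llama 3 layout: one header + <|eot_id|> per message."""
    out = "<|begin_of_text|>"
    for msg in messages:
        role = msg["role"]
        # Content is trimmed, mirroring the official template's `| trim`.
        out += f"<|start_header_id|>{role}<|end_header_id|>\n\n{msg['content'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should reply next.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out
```

For a single user message `What is Python?`, the Llama 2 sketch yields `<s>[INST] What is Python? [/INST]`, while the Llama 3 sketch yields explicit role headers and `<|eot_id|>` terminators, which is why a template written for one cannot be reused for the other.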

Detection

Wrap tokenizer loading and prompt rendering in `try/except ValueError` and check the exception message for 'Llama 2 chat template'. The error surfaces either on `AutoTokenizer.from_pretrained()` or on the first `apply_chat_template()` call, so those are the two calls to watch in your logs.
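A minimal sketch of that try/except check, assuming the error text matches the message shown on this page. `try_render` and `is_llama2_template_error` are hypothetical helpers; `render` is any callable that turns a messages list into a prompt string, such as a lambda wrapping `tokenizer.apply_chat_template`.

```python
import logging
from typing import Callable, Dict, List, Optional

Messages = List[Dict[str, str]]
logger = logging.getLogger(__name__)


def is_llama2_template_error(exc: BaseException) -> bool:
    # Matches on the message text quoted on this page; adjust the substring
    # if your transformers version words the error differently.
    return isinstance(exc, ValueError) and "Llama 2 chat template" in str(exc)


def try_render(render: Callable[[Messages], str], messages: Messages) -> Optional[str]:
    """Return the rendered prompt, or None if the Llama 2 template error fires.

    Any other ValueError is re-raised unchanged so unrelated bugs stay visible.
    """
    try:
        return render(messages)
    except ValueError as exc:
        if is_llama2_template_error(exc):
            logger.warning("Llama 2 chat template detected: %s", exc)
            return None
        raise
```

Usage: `prompt = try_render(lambda m: tokenizer.apply_chat_template(m, tokenize=False), messages)`; a `None` result tells you to switch to a Llama 3 model rather than letting the request crash.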

Causes & fixes

1

Loading a Llama 2 chat model (llama-2-7b-chat, llama-2-13b-chat) with transformers 4.36+, which enforces the Llama 3 template format

✓ Fix

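Upgrade to a Llama 3 instruct model such as `meta-llama/Llama-3.2-3B-Instruct`, `meta-llama/Llama-3.1-8B-Instruct`, or `meta-llama/Llama-3.3-70B-Instruct`. If you build prompts with `apply_chat_template()`, changing the model ID is usually the only code change needed.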

2

Using a Llama 2 model and calling `apply_chat_template()` without setting `use_default_system_prompt=False`, which triggers format validation

✓ Fix

Set `use_default_system_prompt=False` in `apply_chat_template()`: `tokenizer.apply_chat_template(messages, use_default_system_prompt=False, tokenize=False)`

3

Custom chat template in tokenizer_config.json that follows Llama 2 format, but transformers expects Llama 3 format

✓ Fix

Update the `chat_template` entry in your tokenizer_config.json to the Llama 3 format (see https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/tokenizer_config.json), or use a Llama 3 model directly.

4

Manually constructing Llama 2 prompt format (using [INST] markers) instead of using apply_chat_template()

✓ Fix

Build a list of role/content messages and pass it to `tokenizer.apply_chat_template(messages)`, which renders the model's own template: `[{'role': 'user', 'content': '...'}, {'role': 'assistant', 'content': '...'}]`
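If you have existing prompts built by hand with `[INST]` markers (cause 4), one migration path is to parse them back into a messages list and let `apply_chat_template()` render the Llama 3 format. `llama2_prompt_to_messages` is a hypothetical helper sketched under these assumptions: the prompt follows the standard Llama 2 layout with at most one `<<SYS>>` block, and no message content itself contains `[INST]` or `[/INST]`.

```python
import re
from typing import Dict, List

# System prompt between <<SYS>> tags (Llama 2 puts it inside the first [INST]).
_SYS_RE = re.compile(r"<<SYS>>\s*(.*?)\s*<</SYS>>", re.DOTALL)
# One turn: user text between [INST]...[/INST], then the assistant reply
# up to the next [INST] or the end of the string.
_TURN_RE = re.compile(r"\[INST\](.*?)\[/INST\](.*?)(?=\[INST\]|\Z)", re.DOTALL)


def llama2_prompt_to_messages(prompt: str) -> List[Dict[str, str]]:
    """Convert a hand-built Llama 2 prompt into role/content messages."""
    messages: List[Dict[str, str]] = []
    sys_match = _SYS_RE.search(prompt)
    if sys_match:
        messages.append({"role": "system", "content": sys_match.group(1)})
    for user_part, assistant_part in _TURN_RE.findall(prompt):
        # Drop the <<SYS>> block from the first user turn; it is already captured.
        user_text = _SYS_RE.sub("", user_part).strip()
        messages.append({"role": "user", "content": user_text})
        # Strip the <s>/</s> sequence markers around the assistant reply.
        assistant_text = assistant_part.replace("</s>", "").replace("<s>", "").strip()
        if assistant_text:
            messages.append({"role": "assistant", "content": assistant_text})
    return messages
```

Usage: `tokenizer.apply_chat_template(llama2_prompt_to_messages(old_prompt), tokenize=False, add_generation_prompt=True)`, where `tokenizer` belongs to a Llama 3 model. Parsing once and storing messages (not strings) avoids hard-coding any model's markers again.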

Code: broken vs fixed

Broken - triggers the error
```python
import os
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'meta-llama/Llama-2-7b-chat-hf'  # ❌ Llama 2 chat model: triggers the ValueError
tokenizer = AutoTokenizer.from_pretrained(model_name, token=os.environ['HF_TOKEN'])
model = AutoModelForCausalLM.from_pretrained(model_name, token=os.environ['HF_TOKEN'])

messages = [
    {'role': 'user', 'content': 'What is Python?'}
]

# This line raises: ValueError: Llama 2 chat template format is deprecated, use Llama 3 format
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
```
Fixed - works correctly
```python
import os
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'meta-llama/Llama-3.1-8B-Instruct'  # ✅ Llama 3 instruct model with a Llama 3 chat template
tokenizer = AutoTokenizer.from_pretrained(model_name, token=os.environ['HF_TOKEN'])
model = AutoModelForCausalLM.from_pretrained(model_name, token=os.environ['HF_TOKEN'])

messages = [
    {'role': 'user', 'content': 'What is Python?'}
]

# add_generation_prompt=True appends the assistant header so the model replies next
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Output begins with <|begin_of_text|> and a system header block, and ends with:
# <|start_header_id|>user<|end_header_id|>\n\nWhat is Python?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
```
The fix swaps the Llama 2 model for `meta-llama/Llama-3.1-8B-Instruct`, whose tokenizer ships the Llama 3 chat template that transformers 4.36+ expects. The messages list is unchanged; the only other change is `add_generation_prompt=True`, which appends the assistant header so the prompt is ready for generation.

Workaround

If you must keep a Llama 2 model temporarily, set `use_default_system_prompt=False` in `apply_chat_template()` to skip template validation: `tokenizer.apply_chat_template(messages, use_default_system_prompt=False, tokenize=False)`. This bypasses the format check and may degrade output quality, so migrate to Llama 3 as soon as possible.

Prevention

Target the Llama 3 family (Llama 3.1, 3.2, or 3.3 instruct models) for new projects: they follow instructions better than Llama 2 and are fully supported by transformers. Llama 2 has been superseded by these releases. Pin both the library and the model in your requirements, for example `transformers>=4.43.0` (the minimum the Llama 3.1 model card lists) paired with `meta-llama/Llama-3.1-8B-Instruct`, optionally fixing a `revision=` in `from_pretrained()`. Set up model versioning tests that fail on deprecation warnings or outdated chat templates before production deployment.
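One way to implement such a versioning test is to scan the `tokenizer_config.json` files you deploy (a local model directory or fine-tuned checkpoints) and fail if any `chat_template` still contains Llama 2 markers. This is a sketch: `chat_template_family` and `check_no_llama2_templates` are hypothetical helpers, and they assume the template is stored as a single string under the `chat_template` key; configs that store a list of named templates or keep the template elsewhere are reported as unknown (`None`).

```python
import json
from pathlib import Path
from typing import Optional

LLAMA2_MARKERS = ("[INST]", "<<SYS>>")
LLAMA3_MARKERS = ("<|start_header_id|>", "<|eot_id|>")


def chat_template_family(config_path: Path) -> Optional[str]:
    """Classify a tokenizer_config.json chat_template as 'llama2', 'llama3', or None."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    template = config.get("chat_template")
    if not isinstance(template, str):
        # Missing, or stored in a shape this sketch does not handle.
        return None
    if any(marker in template for marker in LLAMA3_MARKERS):
        return "llama3"
    if any(marker in template for marker in LLAMA2_MARKERS):
        return "llama2"
    return None


def check_no_llama2_templates(root: Path) -> None:
    """Raise AssertionError if any tokenizer_config.json under root uses a Llama 2 template."""
    offenders = [
        path
        for path in sorted(Path(root).rglob("tokenizer_config.json"))
        if chat_template_family(path) == "llama2"
    ]
    if offenders:
        raise AssertionError(f"Llama 2 chat templates found in: {offenders}")
```

In a pytest suite, call `check_no_llama2_templates(Path("models"))` from a test function; `models` is a placeholder for wherever your checkpoints live.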

Python 3.9+ · transformers >=4.36.0 · tested on 4.40.0+
Verified 2026-04 · llama-3.2-3b-instruct, llama-3.2-8b-instruct, llama-3.3-70b-instruct, llama-2-7b-chat (deprecated)