High severity · Intermediate · Fix: 2-5 min

ContextLengthExceededError

ollama.errors.ContextLengthExceededError

What this error means
This error is raised when the input prompt plus conversation history exceeds the model's context window, which Ollama sets through the num_ctx option.

Stack trace

traceback
ollama.errors.ContextLengthExceededError: The input context length exceeded the model's num_ctx limit of 2048 tokens.
QUICK FIX
Reduce prompt and conversation length to fit within the model's num_ctx token limit or switch to a model with a larger context window.

Why it happens

Each Ollama model runs with a context length (num_ctx) that caps how many tokens can be processed in a single request; the stack trace above shows a limit of 2048 tokens. The value is a per-model setting, not a universal constant. When the combined tokens of the prompt, conversation history, and any system messages exceed this limit, this error is raised instead of the request being processed.

Detection

Monitor token usage before sending requests: tokenize the prompt and conversation history (or estimate their token count), then log or assert that the total does not exceed the model's num_ctx limit.
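A minimal sketch of such a pre-flight check. It does not rely on any tokenizer from the ollama package; the count is approximated at four characters per token. That ratio, the `reserve` margin left for the reply, and every function name below are illustrative assumptions, so treat the result as a rough guard and swap in the model's real tokenizer if you have one.

```python
from typing import Dict, List

CHARS_PER_TOKEN = 4  # assumption: rough average for English text, not a real tokenizer
NUM_CTX = 2048       # limit reported in the stack trace above


def estimate_tokens(text: str) -> int:
    """Approximate token count as ceil(len(text) / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_message_tokens(messages: List[Dict[str, str]]) -> int:
    """Sum the estimated tokens over the content of every message."""
    return sum(estimate_tokens(m["content"]) for m in messages)


def fits_context(messages: List[Dict[str, str]],
                 num_ctx: int = NUM_CTX,
                 reserve: int = 256) -> bool:
    """True if the estimate leaves `reserve` tokens free for the reply."""
    return estimate_message_tokens(messages) + reserve <= num_ctx
```

Calling `fits_context(messages)` before each request and logging the estimate when it returns False gives an early warning without a round trip to the server.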

Causes & fixes

1

Prompt plus conversation history tokens exceed the model's maximum context length (num_ctx).

✓ Fix

Truncate or summarize conversation history and reduce prompt length to fit within the model's token limit.

2

Using a model with a smaller context window than expected for your application.

✓ Fix

Switch to a model that supports a larger context window, or raise num_ctx for the current model if it supports a longer context.

3

Unintentionally appending large system or user messages repeatedly in the conversation state.

✓ Fix

Implement logic to prune or reset conversation history periodically to keep token count under the limit.
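The truncation and pruning fixes above could be sketched as a small helper that keeps system messages and as many of the most recent turns as fit a token budget. The function names and the four-characters-per-token estimate are assumptions made for illustration, not part of the ollama package.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption): about four characters per token."""
    return -(-len(text) // 4)


def trim_history(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep all system messages plus the newest turns that fit max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]

    # System messages are always kept, so they come out of the budget first.
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)

    kept: List[Message] = []
    for message in reversed(others):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break  # stop at the first turn that no longer fits
        kept.append(message)
        budget -= cost

    # Restore chronological order; system messages are placed first.
    return system + list(reversed(kept))
```

Running `trim_history` on the conversation state before every request also bounds the damage from repeatedly appended messages, since the oldest copies fall out of the window first.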

Code: broken vs fixed

Broken - triggers the error
python
import ollama

# Very long prompt, intended to exceed the 2048-token num_ctx limit
prompt = "A" * 3000

# This call triggers the context length exceeded error.
# "ollama-model" is a placeholder; use the name of a model you have pulled.
response = ollama.chat(model="ollama-model", messages=[{"role": "user", "content": prompt}])
print(response)
Fixed - works correctly
python
import ollama

# Shorter prompt, intended to fit within the num_ctx limit
prompt = "A" * 1500

# Same call as the broken version; only the prompt length changed.
# "ollama-model" is a placeholder; use the name of a model you have pulled.
response = ollama.chat(model="ollama-model", messages=[{"role": "user", "content": prompt}])
print(response)

The fix shortens the prompt so that the total token count stays within the model's num_ctx limit, which prevents the context length exceeded error.

Workaround

Catch the ContextLengthExceededError exception, then truncate or summarize the prompt and conversation history before retrying the request.
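One way to sketch this workaround is a retry loop that drops the oldest non-system message each time the request fails. The helper name, the `send` callable, and the `context_errors` parameter are hypothetical; the exception type is passed in rather than hard-coded because the class your installed ollama version raises should be checked first (the package also exposes a general `ollama.ResponseError`).

```python
from typing import Callable, Dict, List, Tuple, Type

Message = Dict[str, str]


def chat_with_retry(
    send: Callable[[List[Message]], object],
    messages: List[Message],
    context_errors: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
) -> object:
    """Call send(messages); on a context error, drop the oldest turn and retry."""
    current = list(messages)  # work on a copy so the caller's list is untouched
    for attempt in range(max_attempts):
        try:
            return send(current)
        except context_errors:
            non_system = [i for i, m in enumerate(current) if m["role"] != "system"]
            # Give up when nothing more can be dropped or attempts are exhausted.
            if len(non_system) <= 1 or attempt == max_attempts - 1:
                raise
            del current[non_system[0]]
    raise ValueError("max_attempts must be at least 1")
```

With the ollama package, `send` could be something like `lambda msgs: ollama.chat(model="ollama-model", messages=msgs)`, where the model name is a placeholder. Dropping whole turns is the simplest policy; summarizing the dropped turns instead preserves more context at the cost of an extra request.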

Prevention

Implement token counting and prompt length checks before sending requests, and use models with larger context windows when handling long conversations or documents.
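As a hedged illustration of planning for a larger window, the sketch below picks the smallest candidate context size that fits an estimated token count and builds the keyword arguments for a chat call. Ollama accepts `num_ctx` through the request's `options` dictionary; the helper names, the candidate sizes, and the `reserve` margin are assumptions. A larger num_ctx uses more memory, and a value beyond what the model was trained for may not improve results.

```python
from typing import Dict, List, Optional, Sequence


def pick_num_ctx(estimated_tokens: int,
                 reserve: int = 512,
                 candidates: Sequence[int] = (2048, 4096, 8192, 16384)) -> Optional[int]:
    """Return the smallest candidate window that fits prompt plus reply margin."""
    needed = estimated_tokens + reserve
    for size in candidates:
        if needed <= size:
            return size
    return None  # nothing fits: truncate or summarize instead


def build_chat_kwargs(model: str,
                      messages: List[Dict[str, str]],
                      num_ctx: int) -> Dict[str, object]:
    """Assemble keyword arguments that set num_ctx via the options dict."""
    return {"model": model, "messages": messages, "options": {"num_ctx": num_ctx}}


# Usage (requires a running Ollama server and a pulled model):
#   kwargs = build_chat_kwargs("ollama-model", messages, pick_num_ctx(3000))
#   response = ollama.chat(**kwargs)
```

When `pick_num_ctx` returns None, the request is too large for every candidate size, so fall back to truncating or summarizing the history.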

Python 3.9+ · ollama >=0.1.0 · tested on 0.1.x
Verified 2026-04