ContextLengthExceededError
together_ai.errors.ContextLengthExceededError
Stack trace
together_ai.errors.ContextLengthExceededError: Input context length exceeded the maximum allowed tokens for the model (e.g., 4096 tokens).
Why it happens
Each Together AI model has a fixed maximum context length, measured in tokens. When the combined token count of the prompt, conversation history, and any system instructions exceeds this limit, the API rejects the request and raises ContextLengthExceededError instead of processing it.
Detection
Count tokens before sending each request: tokenize every input and sum the counts, then log a warning or fail an assertion when the total approaches or exceeds the model's maximum context length.
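The check described above can be sketched as follows. This is a minimal illustration, not the library's own API: `estimate_tokens`, `count_message_tokens`, `check_context`, the 4096-token limit, and the 90% warning threshold are all assumptions, and the four-characters-per-token estimate is only a rough heuristic for English text. In production, replace it with the tokenizer that matches your model.

```python
import logging

logger = logging.getLogger(__name__)

MAX_CONTEXT_TOKENS = 4096  # assumed limit; check your model's documentation
WARN_RATIO = 0.9           # assumed threshold for an early warning


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (heuristic, not a real tokenizer)."""
    return max(1, (len(text) + 3) // 4)


def count_message_tokens(messages: list[dict]) -> int:
    """Sum the estimated tokens over the content of every message."""
    return sum(estimate_tokens(m["content"]) for m in messages)


def check_context(messages: list[dict], limit: int = MAX_CONTEXT_TOKENS) -> int:
    """Return the estimated total; warn near the limit, raise above it."""
    total = count_message_tokens(messages)
    if total > limit:
        raise ValueError(f"Estimated {total} tokens exceeds limit of {limit}")
    if total >= WARN_RATIO * limit:
        logger.warning("Estimated %d tokens is close to the limit of %d", total, limit)
    return total
```

Calling `check_context(messages)` just before each request turns a remote API error into a local, inspectable failure.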
Causes & fixes
Cause: Prompt plus conversation history tokens exceed the model's max context length.
Fix: Truncate or summarize conversation history and reduce prompt length to fit within the model's token limit.
Cause: Including large documents or embeddings inline in the prompt without chunking.
Fix: Split large documents into smaller chunks and process them separately, or use retrieval-augmented generation to limit tokens per request.
Cause: Repeatedly appending full conversation history without pruning.
Fix: Implement a sliding window or summary approach to keep only recent or relevant conversation context.
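The sliding-window fix can be sketched roughly like this. The helper names `estimate_tokens` and `trim_history` are hypothetical, the token estimate is a character-count heuristic rather than a real tokenizer, and the sketch assumes messages are dicts with `role` and `content` keys and that system messages belong at the start of the conversation.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (heuristic, not a real tokenizer)."""
    return max(1, (len(text) + 3) // 4)


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep system messages plus the most recent messages that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]

    # Reserve budget for the system messages first; they are always kept.
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)

    kept = []
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for message in reversed(others):
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost

    # Restore chronological order, with system messages placed first.
    return system + list(reversed(kept))
```

Dropping the oldest turns first keeps the most recent context intact; a summary-based variant would replace the dropped turns with a short recap instead of discarding them.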
Code: broken vs fixed
# Broken: the prompt is sent without checking its length
import os
from together_ai import TogetherAI

client = TogetherAI(api_key=os.environ['TOGETHER_API_KEY'])
prompt = """Very long conversation or document exceeding token limit..."""
response = client.chat.completions.create(
    model="together-large",
    messages=[{"role": "user", "content": prompt}],
)  # triggers ContextLengthExceededError

# Fixed: the prompt is truncated before the request is sent
import os
from together_ai import TogetherAI

client = TogetherAI(api_key=os.environ['TOGETHER_API_KEY'])
prompt = """Very long conversation or document exceeding token limit..."""
max_tokens = 4096  # example max tokens for model
# Simple truncation example (replace with proper tokenizer in prod).
# Words are only a rough proxy for tokens and usually undercount them.
words = prompt.split()
if len(words) > max_tokens:
    prompt = ' '.join(words[:max_tokens])
response = client.chat.completions.create(
    model="together-large",
    messages=[{"role": "user", "content": prompt}],
)  # fixed
print(response)
Workaround
Catch ContextLengthExceededError and programmatically truncate or summarize the input prompt or conversation history before retrying the request.
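A catch-and-retry loop along these lines might look like the sketch below. To keep it runnable on its own, it defines a stand-in `ContextLengthExceededError` class and takes the request as a `send` callable; both are assumptions for illustration, as are the `send_with_retry` name and the halve-on-failure strategy. In real code you would catch the exception exported by the client library and pass a function that wraps your API call.

```python
class ContextLengthExceededError(Exception):
    """Stand-in for the client library's exception so this sketch runs on its own."""


def send_with_retry(send, prompt: str, max_attempts: int = 4, shrink: float = 0.5):
    """Call send(prompt); on a context-length error, shorten the prompt and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send(prompt)
        except ContextLengthExceededError:
            # Give up on the last attempt or when nothing is left to cut.
            if attempt == max_attempts - 1 or len(prompt) <= 1:
                raise
            keep = max(1, int(len(prompt) * shrink))
            prompt = prompt[-keep:]  # keep the most recent text, drop the oldest
```

Character-based cutting is crude; swapping the truncation line for a summarization step keeps more of the meaning at the cost of an extra request.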
Prevention
Implement token counting and prompt management in your application so that inputs stay within the model's max context length, using summarization or chunking strategies where needed.
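As one possible chunking strategy, a large document can be split into pieces that are each sent in a separate request. The `chunk_words` helper below is hypothetical and splits on whitespace, so its `max_words` argument is only a rough proxy for a token budget; choose it well below the model's limit or count with a real tokenizer.

```python
def chunk_words(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

For example, `chunk_words("a b c d e", 2)` returns `["a b", "c d", "e"]`, and each chunk can then be processed or summarized on its own.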