Concept · Beginner · 3 min read

What is a context window in LLMs?

Quick answer
A context window is the maximum number of tokens a large language model (LLM) can consider at once when understanding or generating text. It limits how much prior conversation or document text the model can 'remember' and use to produce relevant, coherent responses.

How it works

The context window acts like the model's short-term memory, defining how many tokens (words or pieces of words) it can 'see' simultaneously. Imagine reading a book but only being able to keep the last few pages in mind while writing a summary. If the window is too small, the model forgets earlier parts of the conversation or document, which can cause it to lose track of context or details.

Technically, the model processes input as a fixed-length sequence of tokens. Tokens beyond this limit cannot be attended to: they must either be truncated before the request is sent or the request is rejected. The limit is determined by the model's architecture and training.
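As a rough sketch of what truncation means, here is a toy example in plain Python. It uses whitespace splitting as a stand-in tokenizer (real tokenizers split text into subword units) and a deliberately tiny window:

```python
def truncate_to_window(tokens, window_size):
    """Keep only the most recent tokens that fit inside the window."""
    if len(tokens) <= window_size:
        return tokens
    return tokens[-window_size:]  # the oldest tokens fall out of the window

# Toy example: whitespace "tokens" and a window of 5
tokens = "the quick brown fox jumps over the lazy dog".split()
print(truncate_to_window(tokens, 5))
# → ['jumps', 'over', 'the', 'lazy', 'dog']
```

The earliest words are dropped first, which is exactly why a model with a small window "forgets" the start of a long conversation.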

Concrete example

Suppose a model has a context window of 8,192 tokens. If your conversation or document is longer than that, only the most recent 8,192 tokens can fit in the window; the application (or API layer) must decide what to drop before the model can generate the next output.

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a story that is very long..."}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    max_tokens=100
)

print(response.choices[0].message.content)

# Note: if the conversation history exceeds the model's context window, the API
# returns an error rather than silently truncating; the application must trim
# older messages itself to stay within the limit.
output
Once upon a time, in a faraway land, there was a kingdom where...
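Because the API does not trim history for you, a common pattern is to drop the oldest non-system messages until the conversation fits a token budget. Below is a minimal sketch; it uses a crude words-to-tokens heuristic for illustration, whereas a real application would count tokens with an actual tokenizer such as tiktoken:

```python
def estimate_tokens(message):
    # Rough heuristic only: ~1.3 tokens per word, plus a small per-message
    # overhead. A production system should use a real tokenizer instead.
    return int(len(message["content"].split()) * 1.3) + 4

def trim_history(messages, budget):
    """Drop the oldest non-system messages until the estimate fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(estimate_tokens(m) for m in system + rest) > budget:
        rest.pop(0)  # discard the oldest conversational turn first
    return system + rest
```

Keeping the system message while discarding old turns preserves the model's instructions even as the conversation outgrows the window.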

When to use it

Use models with larger context windows when you need to process or generate long documents, maintain extended conversations, or handle complex tasks requiring memory of earlier context. For example, summarizing long reports or multi-turn dialogues benefits from a large context window.

Do not rely on models with small context windows for tasks needing long-term memory or extensive context, as they will lose earlier information.

Key terms

Context window: the maximum number of tokens an LLM can process at once.
Token: a piece of text, such as a word or subword unit, used as input to the model.
Large Language Model (LLM): a neural network trained on vast text data to generate or understand language.
Truncation: cutting off tokens that fall beyond the context window limit.

Key Takeaways

  • The context window limits how much text an LLM can consider simultaneously.
  • Larger context windows enable handling longer documents and conversations effectively.
  • Input beyond the context window is truncated (or the request is rejected), so earlier tokens are forgotten.
  • Choose models with context windows that fit your application's text length needs.
Verified 2026-04 · gpt-4o