What is a context window in an LLM?
The context window of a large language model (LLM) is the maximum number of tokens the model can process at once to understand and generate text. It limits how much prior conversation or document content the model can consider when producing a response.
How it works
The context window acts like the model's short-term memory, defining how many tokens (words or pieces of words) it can "see" at once. Imagine reading a book through a small window: you can only read a limited number of words at a time. If the window is too small, you might lose track of the story's earlier parts. Similarly, an LLM with a small context window can only consider recent text, which limits its ability to reference long documents or conversations.
Technically, the context window size is the maximum sequence length of tokens the model's transformer architecture can attend to during inference. Tokens beyond this limit are truncated or ignored, so the model cannot use that information to generate responses.
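The truncation described above can be sketched in a few lines. This is a minimal, dependency-free illustration: real models tokenize into subwords with a trained tokenizer, while here tokens are approximated by whitespace-separated words.

```python
def truncate_to_window(text: str, max_tokens: int) -> str:
    """Keep only the last max_tokens (approximate) tokens of text,
    mimicking how tokens beyond the context window are dropped."""
    tokens = text.split()  # crude stand-in for real subword tokenization
    if len(tokens) <= max_tokens:
        return text
    return " ".join(tokens[-max_tokens:])  # earliest tokens are lost

long_text = "word " * 100          # 100 "tokens", far over our toy limit
short = truncate_to_window(long_text, 10)
print(len(short.split()))          # 10: only the most recent tokens survive
```

Everything before the final window is simply unavailable to the model, which is why early parts of an over-long conversation get "forgotten."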
Concrete example
For example, gpt-4o-mini has a context window of 128,000 tokens. If a request's prompt plus requested completion exceeds this limit, the OpenAI API rejects the request with an error rather than silently processing part of it; other serving stacks may instead truncate the oldest tokens.
Here is a Python example using the OpenAI SDK that sends a long repeated prompt, which still fits comfortably inside the model's context window:

```python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Repeated text to build a long prompt. "Hello world. " is roughly
# 3 tokens, so 1,000 repetitions is on the order of 3,000 tokens,
# well under gpt-4o-mini's 128,000-token window.
prompt = "Hello world. " * 1000

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print("Response:", response.choices[0].message.content)
```

If the prompt grew past the window, the API would return a context-length error instead of a response.
When to use it
Use models with larger context windows when you need to process or generate text based on long documents, multi-turn conversations, or complex instructions that require remembering earlier context. For example, summarizing a long report or maintaining context in a lengthy chat.
Do not rely on large context windows if your use case involves only short prompts or isolated queries, as smaller models with shorter context windows can be more efficient and cost-effective.
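A common way to keep a lengthy chat inside a fixed context window is to drop the oldest turns first. Below is a hedged sketch of that idea; token counts are approximated by word counts, and a real system would use the model's actual tokenizer.

```python
def fit_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest messages until the total (approximate) token
    count fits within max_tokens, keeping the most recent turns."""
    def count(msg: dict) -> int:
        return len(msg["content"].split())  # word count as a token proxy

    kept = list(messages)
    while kept and sum(count(m) for m in kept) > max_tokens:
        kept.pop(0)  # discard the oldest message first
    return kept

history = [
    {"role": "user", "content": "tell me about context windows " * 50},
    {"role": "assistant", "content": "sure, here is a summary " * 50},
    {"role": "user", "content": "thanks, now summarize it"},
]
trimmed = fit_history(history, 300)
print(len(trimmed))  # 2: the oldest turn no longer fits and is dropped
```

Dropping whole messages (rather than slicing mid-message) keeps each remaining turn coherent; production systems often summarize the dropped turns instead of discarding them outright.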
Key terms
| Term | Definition |
|---|---|
| Context window | Maximum number of tokens an LLM can process at once. |
| Token | A piece of text (word or subword) used as input to the model. |
| Transformer | Neural network architecture underlying most LLMs. |
| Truncation | Cutting off tokens beyond the context window limit. |
| Inference | The process of generating output from a model given input. |
Key Takeaways
- The context window limits how much text an LLM can consider at once, affecting coherence.
- Larger context windows enable handling longer documents and conversations effectively.
- Exceeding the context window causes truncation, losing earlier context information.