What is a context window in an LLM?
The context window of a large language model (LLM) is the maximum number of tokens the model can process at once to understand and generate text. It limits how much prior conversation or document content the model can consider when producing a response.
How it works
The context window acts like the model's short-term memory, defining how many tokens (words or pieces of words) it can "see" at once. Imagine reading a book through a small window: you can only read a limited number of words at a time. If the window is too small, you might lose track of the story's earlier parts. Similarly, an LLM with a small context window can only consider recent text, which limits its ability to reference long documents or conversations.
Technically, the context window size is the maximum sequence length of tokens the model's transformer architecture can attend to during inference. Tokens beyond this limit are truncated or ignored, so the model cannot use that information to generate responses.
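The truncation described above can be sketched in a few lines. This is a minimal, dependency-free illustration: real models tokenize into subwords with a trained tokenizer, while here tokens are approximated by whitespace-separated words.

```python
def truncate_to_window(text: str, max_tokens: int) -> str:
    """Keep only the last max_tokens (approximate) tokens of text,
    mimicking how tokens beyond the context window are dropped."""
    tokens = text.split()  # crude stand-in for real subword tokenization
    if len(tokens) <= max_tokens:
        return text
    return " ".join(tokens[-max_tokens:])  # earliest tokens are lost

long_text = "word " * 100          # 100 "tokens", far over our toy limit
short = truncate_to_window(long_text, 10)
print(len(short.split()))          # 10: only the most recent tokens survive
```

Everything before the final window is simply unavailable to the model, which is why early parts of an over-long conversation get "forgotten."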
Concrete example
For example, gpt-4o-mini has a context window of 128,000 tokens. If a request's prompt plus requested completion exceeds this limit, the OpenAI API rejects the request with an error rather than silently processing part of it; other serving stacks may instead truncate the oldest tokens.
Here is a Python example using the OpenAI SDK that sends a long repeated prompt, which still fits comfortably inside the model's context window:

```python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Repeated text to build a long prompt. "Hello world. " is roughly
# 3 tokens, so 1,000 repetitions is on the order of 3,000 tokens,
# well under gpt-4o-mini's 128,000-token window.
prompt = "Hello world. " * 1000

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print("Response:", response.choices[0].message.content)
```

If the prompt grew past the window, the API would return a context-length error instead of a response.
When to use it
Use models with larger context windows when you need to process or generate text based on long documents, multi-turn conversations, or complex instructions that require remembering earlier context. For example, summarizing a long report or maintaining context in a lengthy chat.
Do not rely on large context windows if your use case involves only short prompts or isolated queries, as smaller models with shorter context windows can be more efficient and cost-effective.
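A common way to keep a lengthy chat inside a fixed context window is to drop the oldest turns first. Below is a hedged sketch of that idea; token counts are approximated by word counts, and a real system would use the model's actual tokenizer.

```python
def fit_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest messages until the total (approximate) token
    count fits within max_tokens, keeping the most recent turns."""
    def count(msg: dict) -> int:
        return len(msg["content"].split())  # word count as a token proxy

    kept = list(messages)
    while kept and sum(count(m) for m in kept) > max_tokens:
        kept.pop(0)  # discard the oldest message first
    return kept

history = [
    {"role": "user", "content": "tell me about context windows " * 50},
    {"role": "assistant", "content": "sure, here is a summary " * 50},
    {"role": "user", "content": "thanks, now summarize it"},
]
trimmed = fit_history(history, 300)
print(len(trimmed))  # 2: the oldest turn no longer fits and is dropped
```

Dropping whole messages (rather than slicing mid-message) keeps each remaining turn coherent; production systems often summarize the dropped turns instead of discarding them outright.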
Key terms
| Term | Definition |
|---|---|
| Context window | Maximum number of tokens an LLM can process at once. |
| Token | A piece of text (word or subword) used as input to the model. |
| Transformer | Neural network architecture underlying most LLMs. |
| Truncation | Cutting off tokens beyond the context window limit. |
| Inference | The process of generating output from a model given input. |
Key Takeaways
- The context window limits how much text an LLM can consider at once, affecting coherence.
- Larger context windows enable handling longer documents and conversations effectively.
- Exceeding the context window causes truncation, losing earlier context information.