Concept · Beginner · 3 min read

What is temperature in LLMs

Quick answer
In LLMs, temperature is a parameter that controls the randomness of the generated text by scaling the probability distribution of possible next tokens. A lower temperature (close to 0) makes output more deterministic and focused, while a higher temperature increases creativity and diversity in responses.

How it works

Temperature works by rescaling the probability distribution from which the model samples the next token. Under the hood, the model's raw scores (logits) are divided by the temperature before the softmax converts them into probabilities. Imagine a bag of colored balls representing possible next words, each with a different likelihood. A low temperature sharpens the distribution, so the model picks the most likely balls almost every time, producing predictable text. A high temperature flattens the distribution, making less likely balls nearly as probable, which increases randomness and creativity.
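The scaling described above can be sketched in a few lines of plain Python. The logit values here are hypothetical, chosen only to illustrate the effect:

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by temperature, then softmax them into probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

low = apply_temperature(logits, 0.2)   # sharpened: the top token dominates
high = apply_temperature(logits, 2.0)  # flattened: probabilities move closer together

print("T=0.2:", [round(p, 3) for p in low])
print("T=2.0:", [round(p, 3) for p in high])
```

At T=0.2 the first token takes nearly all the probability mass, while at T=2.0 the three options end up much closer together, which is exactly the sharpen-vs-flatten behavior described above.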

Concrete example

Here is a Python example using the OpenAI SDK to generate text with different temperature settings:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = "Once upon a time"

# Low temperature (more deterministic)
response_low = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2
)

# High temperature (more creative)
response_high = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.9
)

print("Low temperature output:\n", response_low.choices[0].message.content)
print("\nHigh temperature output:\n", response_high.choices[0].message.content)
```

Example output (completions are non-deterministic, so yours will differ):

```text
Low temperature output:
 Once upon a time, there was a brave knight who protected the kingdom.

High temperature output:
 Once upon a time, a curious dragon danced under the shimmering moonlight, dreaming of faraway lands.
```

When to use it

Use a low temperature (0.0–0.3) when you want precise, factual, or consistent outputs, such as code generation, summarization, or instructions. Use a higher temperature (0.7–1.0) when you want creative, diverse, or exploratory text, such as storytelling, brainstorming, or poetry. Avoid very high temperatures (>1.0) as they can produce incoherent or nonsensical text.
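To make the low-vs-high trade-off concrete, here is a small self-contained simulation. It uses a toy three-token vocabulary and hypothetical scores rather than a real model, and samples 1,000 tokens at each temperature:

```python
import math
import random

def apply_temperature(logits, temperature):
    """Divide logits by temperature, then softmax them into probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)  # fixed seed so the simulation is repeatable

tokens = ["the", "a", "dragon"]   # toy vocabulary (illustrative only)
logits = [2.0, 1.0, 0.1]          # hypothetical model scores

draws_low = random.choices(tokens, weights=apply_temperature(logits, 0.2), k=1000)
draws_high = random.choices(tokens, weights=apply_temperature(logits, 1.5), k=1000)

print("T=0.2 picks 'the':", draws_low.count("the"), "times out of 1000")
print("T=1.5 picks 'the':", draws_high.count("the"), "times out of 1000")
```

At the low temperature the sampler picks the top token almost every time (consistency), while at the high temperature the other tokens appear far more often (diversity), mirroring the use cases above.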

Key terms

| Term | Definition |
| --- | --- |
| Temperature | A parameter controlling randomness in token selection during text generation. |
| Token | A piece of text (a word or subword) that the model predicts next. |
| Probability distribution | The likelihood the model assigns to each possible next token. |
| Sampling | The process of selecting the next token based on those probabilities. |
| Deterministic | Output that is predictable and consistent, with little or no randomness. |

Key takeaways

  • Temperature controls the randomness of LLM output by scaling token probabilities.
  • Lower temperature yields more focused and predictable text; higher temperature increases creativity.
  • Adjust temperature based on your use case: low for accuracy, high for diversity.
  • Temperature values typically range from 0.0 (near-deterministic) to 1.0 (creative); some APIs, including OpenAI's, accept values up to 2.0.
  • Very high temperatures can cause incoherent or irrelevant outputs.
Verified 2026-04 · gpt-4o