Grok Cheat Sheet — xAI's Real-Time Reasoning Model
A reasoning model with web access: it trades speed for intelligence.
It's like asking a researcher who can browse Wikipedia live instead of someone reciting from a printed encyclopedia. The researcher takes longer but gives you today's facts and can explain the logic.
Key Concepts
xAI Grok Comparison
| Model | Speed | Reasoning | Web Access | Cost/1M tokens | Best For |
|---|---|---|---|---|---|
| Grok-3 | 3–8s avg | Deep (o1-like) | ✓ Real-time | $5–8 | Complex reasoning + current data |
| o1-pro | 2–5s avg | Deep | ✗ Static cutoff | $20 | Pure reasoning without web |
| gpt-4o | 0.5–1s avg | Moderate | ✗ Static cutoff | $2.50 | Speed-critical + high volume |
| Claude 3.5 Sonnet | 1–2s avg | High | ✗ Static cutoff | $3 | Coding + general intelligence |
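
One way to read the table is as a routing rule: pick the fastest model that meets the request's needs. The sketch below is only an illustration. The `ModelProfile` class and `pick_model` function are hypothetical names, the latency figures are the upper ends of the "Speed" column, and only rows marked "Deep" are treated as deep-reasoning models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelProfile:
    name: str
    max_latency_s: float  # upper end of the "Speed" column in the table
    web_access: bool
    deep_reasoning: bool  # True only for rows the table marks "Deep"


# Values copied from the comparison table; treat them as this sheet's estimates.
# The list is ordered fastest first so the first match is the cheapest in latency.
PROFILES = [
    ModelProfile("gpt-4o", 1.0, False, False),
    ModelProfile("Claude 3.5 Sonnet", 2.0, False, False),
    ModelProfile("o1-pro", 5.0, False, True),
    ModelProfile("Grok-3", 8.0, True, True),
]


def pick_model(needs_web: bool, needs_deep_reasoning: bool,
               latency_budget_s: float) -> Optional[str]:
    """Return the fastest model that satisfies every requirement, or None."""
    for profile in PROFILES:
        if needs_web and not profile.web_access:
            continue
        if needs_deep_reasoning and not profile.deep_reasoning:
            continue
        if profile.max_latency_s > latency_budget_s:
            continue
        return profile.name
    return None


print(pick_model(needs_web=True, needs_deep_reasoning=False, latency_budget_s=10))
print(pick_model(needs_web=False, needs_deep_reasoning=False, latency_budget_s=1.0))
```

A `None` result means no row in the table fits the constraints, for example web access within a 2-second budget.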
xAI Grok Patterns
import os
from openai import OpenAI

# xAI exposes an OpenAI-compatible endpoint, so the OpenAI SDK is reused here
client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-3",
    messages=[
        {
            "role": "user",
            "content": "What's the current Bitcoin price and why did it move today?",
        }
    ],
    temperature=1,  # Grok requires T=1 for reasoning
    max_tokens=8192,
)
print(response.choices[0].message.content)

Grok fetches live BTC data, explains market drivers, and cites recent news.

response = client.chat.completions.create(
    model="grok-3",
    messages=[
        {
            "role": "user",
            "content": "Summarize the latest developments in quantum computing (last 48 hours). Show sources.",
        }
    ],
    temperature=1,
    max_tokens=6000,
)
# Split the answer on the "Source:" marker and print each piece separately
for chunk in response.choices[0].message.content.split("Source:"):
    print(chunk.strip())

Grok returns the latest news with timestamps and source links.

response = client.chat.completions.create(
    model="grok-3",
    messages=[
        {
            "role": "user",
            "content": "Solve this system: 3x + 2y = 7, 5x - y = 4. Show work.\n\nThen write a Python function to solve any 2x2 system generically.",
        }
    ],
    temperature=1,
    max_tokens=4000,
)
print(response.choices[0].message.content)

Grok shows the algebraic steps, then provides code for the general case. Run that code yourself before relying on it.

# Multi-turn: keep the whole conversation in one messages list
messages = [
    {"role": "user", "content": "Explain PageRank algorithm."}
]
response = client.chat.completions.create(
    model="grok-3",
    messages=messages,
    temperature=1,
    max_tokens=3000,
)
# Append the assistant's reply so the follow-up has context
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content,
})
messages.append({
    "role": "user",
    "content": "Now explain how Google uses it today in 2026. Still relevant?",
})
response = client.chat.completions.create(
    model="grok-3",
    messages=messages,
    temperature=1,
    max_tokens=2000,
)
print(response.choices[0].message.content)

Grok references the current state of SEO and web ranking.

Key Request Parameters
Grok API
| Parameter | Default | Type | Notes |
|---|---|---|---|
| model | grok-3 | string | Only grok-3 supported; grok-2 deprecated Feb 2026 |
| temperature | 1.0 | float | MUST be 1.0 for reasoning. No deviation allowed. |
| max_tokens | 8192 | int | Grok reasoning outputs avg 2000–4000; max 16384 |
| top_p | 0.95 | float | Typical range 0.9–1.0; ignored if T≠1 |
| web_search | true | bool | Enable real-time web access; disable to use cached knowledge only |
| timeout | 60s | int | Grok reasoning can take 8–10s; set timeout ≥15s |
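
The constraints in this table can be checked on the client before a request is sent. The following is a minimal sketch under the assumption that the limits listed above are accurate. The `build_grok_request` helper is a hypothetical name, not part of any SDK, and it covers only the parameters that the OpenAI SDK's `create()` accepts as named arguments.

```python
from typing import Any, Dict, List

MAX_OUTPUT_TOKENS = 16384  # upper bound taken from the max_tokens row above


def build_grok_request(messages: List[Dict[str, str]],
                       max_tokens: int = 8192,
                       temperature: float = 1.0,
                       top_p: float = 0.95) -> Dict[str, Any]:
    """Validate parameters against the table and return keyword arguments."""
    if temperature != 1.0:
        raise ValueError("temperature must be 1.0 for reasoning mode")
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in the range (0, 1]")
    return {
        "model": "grok-3",
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
    }


params = build_grok_request([{"role": "user", "content": "Explain PageRank."}])
print(params["model"], params["temperature"], params["max_tokens"])
```

The returned dictionary can be unpacked into the call, as in `client.chat.completions.create(**params)`, so a bad value fails locally instead of costing a round trip.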
Common Errors & Fixes
temperature must be 1.0 for reasoning mode
Cause: Grok enforces T=1 to ensure chain-of-thought consistency. Lowering T collapses reasoning to standard inference.
Always set temperature=1 explicitly:
response = client.chat.completions.create(
    model="grok-3",
    messages=messages,
    temperature=1.0,  # Required
    max_tokens=8000,
)

API timeout after 60 seconds
Cause: Grok reasoning plus a web fetch can exceed the 60-second timeout listed in the parameter table.
Increase timeout and add retry logic:
from tenacity import retry, stop_after_attempt, wait_exponential

# Retry up to 3 times with exponentially growing waits between attempts
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=2, min=2))
def query_grok(prompt):
    return client.chat.completions.create(
        model="grok-3",
        messages=[{"role": "user", "content": prompt}],
        temperature=1,
        max_tokens=8000,
        timeout=90,  # per-request timeout in seconds
    )

response = query_grok("Complex reasoning task...")

Rate limit: 60 req/min (free tier), 500 req/min (pro)
Cause: Web fetching is expensive; Grok has strict rate limits to manage infrastructure cost.
Throttle on the client with a fixed-window request queue, and keep the retry decorator above for exponential backoff:
import time
from collections import deque

class GrokQueue:
    """Fixed-window limiter: at most max_per_min requests per 60-second window."""

    def __init__(self, max_per_min=60):
        self.queue = deque()  # timestamps of requests sent in the current window
        self.max_per_min = max_per_min
        self.window_start = time.time()

    def submit(self, prompt):
        now = time.time()
        if now - self.window_start > 60:
            # Window expired: start a fresh one
            self.queue.clear()
            self.window_start = now
        if len(self.queue) >= self.max_per_min:
            wait_time = max(0.0, 60 - (now - self.window_start))
            print(f"Rate limited. Waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
            self.queue.clear()
            self.window_start = time.time()
        # Record this request so the window count is accurate
        self.queue.append(time.time())
        return client.chat.completions.create(
            model="grok-3",
            messages=[{"role": "user", "content": prompt}],
            temperature=1,
        )

web_search=true returns stale data (>24h old)
Cause: Web index lag; Grok caches crawled pages for performance. "Real-time" means recent, not live-streamed.
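For breaking news, state the recency requirement in the prompt:
response = client.chat.completions.create(
    model="grok-3",
    messages=[{
        "role": "user",
        "content": "Breaking: Did [EVENT] happen in the last 2 hours? Search NOW. Confirm timestamp.",
    }],
    temperature=1,
    # web_search is not a named argument of the OpenAI SDK's create(), which
    # rejects unknown keywords, so the flag is sent through extra_body.
    extra_body={"web_search": True},
)

Production Gotchas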
Grok adds 2–3s latency even for trivial questions ("What year was 1984 published?"). For simple lookups, use gpt-4o (0.5s). Reserve Grok for genuinely complex reasoning.
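Any T<1 disables the reasoning chain entirely. If the request is not rejected with the temperature error listed under Common Errors & Fixes, the model still responds but behaves like gpt-4o without the web access. That case is a silent failure with no error thrown, so check the temperature on the client side.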
Grok cites sources for web-fetched facts but doesn't always include them in a structured format. Parse markdown links from the response text; don't rely on a 'sources' field.
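
Parsing markdown links out of the response text can be done with a regular expression. This is a minimal sketch: `extract_sources` is a hypothetical helper, the sample text is made up for illustration, and the pattern assumes links of the form `[title](http...)` whose URLs contain no parentheses or spaces.

```python
import re
from typing import List, Tuple

# Matches [title](url) where url starts with http:// or https://
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def extract_sources(text: str) -> List[Tuple[str, str]]:
    """Return (title, url) pairs in order of appearance, without duplicate URLs."""
    seen = set()
    sources = []
    for title, url in MARKDOWN_LINK.findall(text):
        if url not in seen:
            seen.add(url)
            sources.append((title, url))
    return sources


sample = (
    "Prices rose today ([Reuters](https://example.com/a)). "
    "Analysts disagree ([AP](https://example.com/b)), "
    "as [Reuters](https://example.com/a) also noted."
)
for title, url in extract_sources(sample):
    print(f"{title}: {url}")
```

Bare URLs and footnote-style citations are not captured by this pattern, so extend it if the responses you see use other citation styles.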
By default, only the final answer is returned. Use the 'show_reasoning' parameter to expose the chain-of-thought (adds 500–1000 tokens to the output).
Grok flags adversarial prompts and reports them. Using Grok for red-teaming requires explicit opt-in. Heavy jailbreak attempts trigger account review.
Complex queries output 4,000+ tokens at $5–8 per million tokens, so a single 'prove P=NP' query can cost $0.02–0.03 or more. Budget for higher per-request costs than gpt-4o.
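
The per-request figure follows from simple arithmetic on the numbers above. The sketch below assumes the $5–8 per million token price from this sheet and counts output tokens only. `estimate_output_cost` is a hypothetical helper name.

```python
def estimate_output_cost(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD for a given number of output tokens."""
    return tokens * price_per_million_usd / 1_000_000


# 4,000 output tokens at the low and high ends of the quoted price range
low = estimate_output_cost(4000, 5.0)
high = estimate_output_cost(4000, 8.0)
print(f"Per request: ${low:.3f} to ${high:.3f}")

# The same query run 10,000 times
print(f"Per 10,000 requests: ${low * 10_000:.0f} to ${high * 10_000:.0f}")
```

Input tokens and any web-search charges would add to these figures, so treat the result as a lower bound.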
Complete production example: Real-time market analysis with caching and error handling
import os
from datetime import datetime, timedelta
from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_exponential
# Initialize
client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

# Simple in-memory cache
query_cache = {}
CACHE_TTL = 3600 # 1 hour

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=2, min=4))
def query_grok_with_cache(prompt: str, use_web: bool = True) -> str:
    """Query Grok with caching and retry logic."""
    # Tuples are hashable, so use the tuple itself as the key (no hash() collisions)
    cache_key = (prompt, use_web)

    # Check cache
    if cache_key in query_cache:
        cached_time, cached_response = query_cache[cache_key]
        if datetime.now() - cached_time < timedelta(seconds=CACHE_TTL):
            print(f"[CACHE HIT] {prompt[:50]}...")
            return cached_response

    print(f"[QUERY] {prompt[:50]}... (timeout: 90s)")
    try:
        response = client.chat.completions.create(
            model="grok-3",
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,  # REQUIRED for reasoning
            max_tokens=6000,
            timeout=90,
            # web_search is this sheet's parameter name; it is not a named
            # argument of the OpenAI SDK, so it is sent through extra_body.
            extra_body={"web_search": use_web},
        )
        content = response.choices[0].message.content
        # Cache result
        query_cache[cache_key] = (datetime.now(), content)
        return content
    except Exception as e:
        if "rate_limit" in str(e).lower():
            print("[RATE_LIMITED] Backing off...")
        raise  # re-raise so tenacity can retry

# Example: market analysis with three independent queries
if __name__ == "__main__":
    print("=== Grok Real-Time Market Analysis ===")

    # Query 1: Get current data
    analysis = query_grok_with_cache(
        "What's the current state of AI chip stocks (NVIDIA, Intel, AMD)? "
        "Include today's price movement and market sentiment."
    )
    print(f"\n[ANALYSIS]\n{analysis}\n")

    # Query 2: Ask about latest trends
    trends = query_grok_with_cache(
        "Based on recent AI chip developments, which sector is most likely to grow 50% in 2026?"
    )
    print(f"[TRENDS]\n{trends}\n")

    # Query 3: Same prompt as query 1, so it is served from the cache
    cached_again = query_grok_with_cache(
        "What's the current state of AI chip stocks (NVIDIA, Intel, AMD)? "
        "Include today's price movement and market sentiment."
    )
    print(f"[CACHED RESULT]\n{cached_again}")
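
The time-to-live check in the example can be exercised without any network calls by isolating it behind an injectable clock. This is a sketch only. `TTLCache` and its `clock` argument are hypothetical names introduced for illustration, not part of the example above or of any library.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[Hashable, Tuple[float, str]] = {}

    def get(self, key: Hashable) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[key]  # drop the expired entry
            return None
        return value

    def put(self, key: Hashable, value: str) -> None:
        self._store[key] = (self.clock(), value)


# Drive the cache with a fake clock instead of waiting an hour
fake_now = [0.0]
cache = TTLCache(ttl_seconds=3600, clock=lambda: fake_now[0])
cache.put(("chip stocks?", True), "cached answer")

fake_now[0] = 10.0
print(cache.get(("chip stocks?", True)))  # still fresh

fake_now[0] = 3600.0
print(cache.get(("chip stocks?", True)))  # expired
```

Using a monotonic clock by default avoids surprises when the system clock is adjusted, which `datetime.now()` does not protect against.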