Cheat Sheet intermediate · 8 min read

Perplexity API Cheat Sheet — Real-time Web Search & Citation

version 3.0

Real-time web search + AI reasoning in one API

PERPLEXITY_API_KEY

install pip install perplexity-client

core imports

python

from perplexity import Perplexity
from perplexity.models import SearchOptions

Mental model

Search engine + LLM hybrid: web results fed directly into reasoning model.

If ChatGPT is a library that answers from memory, Perplexity is a research librarian who searches the internet, reads sources, and cites them before answering.

Core Usage Patterns

01 Simple search query

Facts, current events, recent data

python

import os
from perplexity import Perplexity

client = Perplexity(api_key=os.environ["PERPLEXITY_API_KEY"])

response = client.search(
    query="What is the current Bitcoin price?",
    model="sonar-pro"
)

print(response.answer)
for citation in response.citations:
    print(f"Source: {citation.url}")

output Answer with current Bitcoin price + list of source URLs

Always check response.citations: API may return fewer citations than expected for very obscure queries. Citation order doesn't guarantee accuracy.
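
The citation check above can be sketched as a small helper. This is a minimal illustration, not part of the client library: the `Citation` dataclass is a stand-in for whatever objects `response.citations` holds (only the `url` attribute used in the example above is assumed), and `unique_citation_urls` is a hypothetical name.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Citation:
    """Stand-in for a citation object; only `url` is assumed here."""
    url: str


def unique_citation_urls(citations, min_sources=1):
    """Return de-duplicated http(s) citation URLs, preserving order.

    Raises ValueError when fewer than `min_sources` usable URLs remain,
    so the caller can decide whether to trust the answer.
    """
    seen = set()
    urls = []
    for citation in citations or []:
        url = (citation.url or "").strip()
        parts = urlsplit(url)
        # Skip entries that are not absolute http(s) URLs.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    if len(urls) < min_sources:
        raise ValueError(
            f"expected at least {min_sources} source(s), got {len(urls)}"
        )
    return urls
```

Raising on too few sources, rather than returning an empty list, makes the "fewer citations than expected" case hard to ignore by accident.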

02 Stream long responses

Real-time user feedback, large research results

python

response = client.search(
    query="Summarize recent AI breakthroughs",
    model="sonar-pro",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)

print("\nFinal citations:")
for citation in response.final_citations:
    print(f"- {citation.title}: {citation.url}")

output Streaming text chunks, then citations

stream=True returns an iterator: you must consume it fully before accessing final_citations. Breaking early loses the rest of the response.
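
To illustrate why early exit loses citations, here is a sketch that uses a stand-in stream instead of the real client. `FakeStream` and `collect_stream` are hypothetical names; the stand-in only mimics the behaviour described above (citations filled in after the last chunk) and yields plain strings rather than chunk objects.

```python
class FakeStream:
    """Hypothetical stand-in for a streaming response.

    It yields text chunks and fills `final_citations` only after the
    last chunk, mirroring the behaviour described above.
    """

    def __init__(self, chunks, citations):
        self._chunks = list(chunks)
        self._citations = list(citations)
        self.final_citations = []

    def __iter__(self):
        for text in self._chunks:
            yield text
        # Reached only when the caller consumed every chunk.
        self.final_citations = self._citations


def collect_stream(stream, on_chunk=None):
    """Consume the whole stream, then return (full_text, citations)."""
    parts = []
    for text in stream:
        parts.append(text)
        if on_chunk is not None:
            on_chunk(text)  # e.g. lambda t: print(t, end="", flush=True)
    return "".join(parts), list(stream.final_citations)
```

Wrapping consumption in one function keeps the "read everything, then read citations" order in a single place instead of relying on every call site to get it right.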

03 Multi-turn conversation

Refinement questions, related topics

python

messages = [
    {"role": "user", "content": "What are quantum computers?"},
    {"role": "assistant", "content": "Quantum computers use qubits..."},
    {"role": "user", "content": "How are they different from classical computers?"}
]

response = client.chat(
    messages=messages,
    model="sonar-pro",
    search_options={"enable_search": True}
)

print(response.answer)

output Answer to follow-up with new web search results

Each turn performs a fresh web search, and passing the entire conversation history increases token usage and API cost. Perplexity does not persist context server-side the way stateful chatbots do.
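
One way to limit the cost of long histories is to send only the most recent messages. The sketch below is an assumption about how you might do that client-side; `trim_history` is a hypothetical helper and the default of 6 messages is arbitrary.

```python
def trim_history(messages, max_messages=6):
    """Return at most `max_messages` of the most recent messages.

    Leading assistant messages left over from the cut are dropped so
    the trimmed conversation still starts with a user turn.
    """
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    recent = list(messages[-max_messages:])
    while recent and recent[0]["role"] != "user":
        recent.pop(0)
    return recent
```

Trimming trades context for cost: the model no longer sees early turns, so keep enough messages for follow-up questions to stay unambiguous.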

04 Filter results by recency/domain

Time-sensitive data, academic sources only

python

response = client.search(
    query="latest machine learning papers",
    model="sonar-pro",
    search_options={
        "enable_search": True,
        "search_freshness": "last_week",
        "search_domains": ["arxiv.org", "scholar.google.com"]
    }
)

print(response.answer)

output Search results filtered to last week, academic sources only

search_domains is a whitelist: only these domains are searched. If you list domains with no matching results, the response may be low-quality. Omit this parameter for broader coverage.
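
A small builder can make "omit this parameter" the default behaviour. The sketch below is hypothetical: `normalize_domain` and `build_search_options` are illustrative names, and it assumes the API expects bare host names such as `arxiv.org`, which this cheat sheet does not state explicitly.

```python
from urllib.parse import urlsplit


def normalize_domain(value):
    """Reduce 'https://www.arxiv.org/list' to 'arxiv.org'."""
    value = value.strip().lower()
    # urlsplit only finds the host when a '//' separator is present.
    host = urlsplit(value if "//" in value else "//" + value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def build_search_options(domains=None, freshness=None):
    """Build a search_options dict, leaving out filters that are unset."""
    options = {"enable_search": True}
    cleaned = []
    for domain in domains or []:
        host = normalize_domain(domain)
        if host and host not in cleaned:
            cleaned.append(host)
    if cleaned:
        options["search_domains"] = cleaned
    if freshness:
        options["search_freshness"] = freshness
    return options
```

Calling `build_search_options()` with no arguments yields only `enable_search`, so a missing or empty domain list never turns into an accidental filter.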

Search & Chat Parameters

perplexity-api

Parameter	Type	Default	Purpose
`model`	string	sonar-pro	Model ID: sonar-pro, sonar-medium, sonar-mini. Pro has best quality, mini fastest.
`query / messages`	string / array	required	Search query or message array with role + content. For chat, include conversation history.
`stream`	boolean	false	Enable streaming response. Returns iterator of text chunks instead of single response.
`search_freshness`	string	any_time	Filter by recency: last_day, last_week, last_month. Ignored if enable_search=false.
`search_domains`	array	[]	Whitelist of domains to search. Empty list searches all. Reduces hallucination for niche queries.
`enable_search`	boolean	true	Toggle web search. Set false to use only training data (faster, no citations).
`temperature`	float	0.7	Reasoning creativity: 0.0–2.0. Lower = factual, higher = creative. Recommended 0.5–1.0 for search.
`top_p`	float	0.9	Nucleus sampling: 0.1–1.0. Controls diversity. Leave at the default when tuning temperature for consistent behavior.

Core API Methods

Method / Property	Description	Returns
`client.search(query, model, stream, search_options)`	Execute a single search query with optional web filtering. Returns SearchResponse with answer + citations.	SearchResponse, or Iterator[SearchChunk] if stream=True
`client.chat(messages, model, search_options)`	Multi-turn conversation with web search. Messages array includes full conversation history. Each call performs fresh search.	ChatResponse with answer + citations
`response.citations`	List of source objects from the search. Each has url, title, snippet, date. May be empty for very obscure queries.	List[Citation] with url, title, snippet, date attributes
`client.list_models()`	Get available models with their token limits and pricing. Useful for fallback logic when a model is retired or quota is near.	List[Model] with model_id, max_tokens, input_cost, output_cost

Common Errors & Fixes

01 RateLimitError: Quota exceeded

Cause: API calls exceed the monthly or daily quota, or the per-minute request rate limit (usually 30 req/min for pro tier).

Fix:

python

import os
import time
from perplexity import Perplexity, RateLimitError

client = Perplexity(api_key=os.environ["PERPLEXITY_API_KEY"])
max_retries = 3

for attempt in range(max_retries):
    try:
        response = client.search(query="...", model="sonar-pro")
        break
    except RateLimitError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt  # exponential backoff
            time.sleep(wait_time)
        else:
            raise

02 InvalidAPIKeyError

Cause: PERPLEXITY_API_KEY env var not set, expired, or malformed.

Fix:

python

import os
from perplexity import Perplexity, InvalidAPIKeyError

api_key = os.environ.get("PERPLEXITY_API_KEY")
if not api_key:
    raise ValueError("PERPLEXITY_API_KEY not set. Get key from https://www.perplexity.ai/settings/api")

try:
    client = Perplexity(api_key=api_key)
    response = client.search(query="test", model="sonar-mini")
except InvalidAPIKeyError:
    print("API key invalid or revoked. Regenerate at perplexity.ai/settings/api")

03 ModelNotFoundError

Cause: Requested model doesn't exist or is retired. Perplexity rotates model names (e.g., sonar → sonar-pro).

Fix:

python

import os
from perplexity import Perplexity, ModelNotFoundError

client = Perplexity(api_key=os.environ["PERPLEXITY_API_KEY"])

# Always fall back gracefully
models = client.list_models()
available_ids = [m.model_id for m in models]

preferred_model = "sonar-pro"
model_to_use = preferred_model if preferred_model in available_ids else available_ids[0]

response = client.search(query="...", model=model_to_use)
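
The fallback above indexes `available_ids[0]`, which fails on an empty list. A slightly more defensive selection could look like the sketch below; `pick_model` is a hypothetical helper that works on plain model ID strings and does not call the API itself.

```python
def pick_model(available_ids, preferences):
    """Return the first preferred model that is available.

    Falls back to the first available model, and raises a clear error
    when the list of available models is empty.
    """
    available = list(available_ids)
    if not available:
        raise RuntimeError("no models available")
    for model_id in preferences:
        if model_id in available:
            return model_id
    return available[0]
```

For example, `pick_model(available_ids, ["sonar-pro", "sonar-medium", "sonar-mini"])` tries the models in order of preference before falling back.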

04 Empty citations in response

Cause: Query is too obscure or too recent, or it uses domain filters that exclude all sources.

Fix:

python

# First attempt restricts the search to a single domain
response = client.search(
    query="very niche topic",
    model="sonar-pro",
    search_options={"enable_search": True, "search_domains": ["example.com"]}
)

if not response.citations:
    print("Warning: No sources found. Response may be less reliable.")
    # Retry without the domain filter for broader coverage
    response = client.search(
        query="very niche topic",
        model="sonar-pro",
        search_options={"enable_search": True}
    )

Production Gotchas

⚠ Citations disappear under streaming

When stream=True, citations are buffered and only available after consuming the entire iterator. If you interrupt the stream early, final_citations will be incomplete or empty. Always fully consume streaming responses before processing citations, or use non-streaming mode for fact-critical apps.

⚠ Each chat turn triggers a fresh web search

Passing a multi-turn conversation array to client.chat() does NOT reuse previous search results. Each call performs a new search, multiplying API costs. If you're building a chatbot, consider caching search results or using lower-cost models (sonar-mini) for follow-ups.
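
Caching can be sketched with a small in-memory store that expires entries after a fixed time. `TTLCache` is a hypothetical helper; the `fetch` callable stands in for whatever function performs the real API call, and the right TTL depends on how fresh your answers need to be.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}

    def get_or_fetch(self, query, fetch):
        """Return a cached result for `query`, or call `fetch(query)`."""
        key = " ".join(query.lower().split())  # normalise case/whitespace
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        result = fetch(query)
        self._entries[key] = (now, result)
        return result
```

A short TTL suits fast-moving topics such as prices; a longer one is reasonable for reference questions that rarely change.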

⚠ Search domains whitelist is strict

search_domains=['example.com'] will ONLY search example.com. If no results exist on that domain, the response falls back to training data (old, no citations). For robustness, either omit this parameter or include 3+ diverse domains. Never assume a domain will have content on your query.

⚠ temperature vs top_p interact unpredictably

Setting both temperature > 1.0 AND top_p=0.9 can produce inconsistent factuality. Perplexity recommends leaving one at default if tuning the other. For search results, keep temperature ≤ 1.0 and top_p ≥ 0.8.
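
These guidelines can be checked before a request is sent. The sketch below is a hypothetical helper, `check_sampling`, that encodes only the ranges, defaults, and recommendations stated in this cheat sheet; it does not call the API.

```python
def check_sampling(temperature=0.7, top_p=0.9):
    """Return a list of warnings for the given sampling settings.

    Ranges and defaults are the ones listed in this cheat sheet.
    """
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0-2.0")
    if not 0.1 <= top_p <= 1.0:
        raise ValueError("top_p must be within 0.1-1.0")
    warnings = []
    if temperature > 1.0:
        warnings.append("temperature above 1.0 may hurt factuality")
    if top_p < 0.8:
        warnings.append("top_p below 0.8 is outside the suggested range")
    if temperature != 0.7 and top_p != 0.9:
        warnings.append("both values changed; consider tuning only one")
    return warnings
```

Returning warnings instead of raising lets callers log questionable settings without blocking experiments.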

⚠ API key exposed in error messages

Hardcoded API keys can leak through stack traces, logs, and version control. Always load keys from environment variables (for example via os.environ). If a key is accidentally logged, regenerate it immediately at https://www.perplexity.ai/settings/api.
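
As an extra safeguard, log output can be scrubbed before it is written. The sketch below uses the standard library `logging.Filter` hook; `RedactSecretFilter` is a hypothetical name, and the filter covers only the log message itself, not tracebacks rendered later by a formatter.

```python
import logging


class RedactSecretFilter(logging.Filter):
    """Mask a known secret wherever it appears in a log message."""

    def __init__(self, secret):
        super().__init__()
        self._secret = secret

    def filter(self, record):
        if self._secret:
            message = record.getMessage()
            if self._secret in message:
                # Store the already-formatted, masked text on the record.
                record.msg = message.replace(self._secret, "[REDACTED]")
                record.args = ()
        return True  # never drop the record, only rewrite it
```

Attach it to a handler with `handler.addFilter(RedactSecretFilter(os.environ.get("PERPLEXITY_API_KEY", "")))` so every record passing through that handler is checked.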

⚠ Rate limits are per-minute AND per-month

You might hit 30 req/min (hard limit) before hitting monthly quota. Implement circuit breaker logic and monitor both limits. The API does not tell you remaining monthly quota: track it client-side.
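
Client-side tracking of both limits can be sketched with a sliding one-minute window plus a monthly counter. `RequestBudget` is a hypothetical helper; the 30 req/min default comes from this cheat sheet, while the monthly figure is a placeholder you would replace with your plan's quota.

```python
import time
from collections import deque


class RequestBudget:
    """Client-side guard for a per-minute limit and a monthly quota."""

    def __init__(self, per_minute=30, per_month=10_000, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_month = per_month  # placeholder value, not a real quota
        self.used_this_month = 0  # reset this when the billing month rolls over
        self._clock = clock
        self._recent = deque()  # timestamps of requests in the last 60 s

    def try_acquire(self):
        """Record a request and return True if both limits allow it."""
        now = self._clock()
        # Forget requests that have left the 60-second window.
        while self._recent and now - self._recent[0] >= 60:
            self._recent.popleft()
        if len(self._recent) >= self.per_minute:
            return False
        if self.used_this_month >= self.per_month:
            return False
        self._recent.append(now)
        self.used_this_month += 1
        return True
```

Call `try_acquire()` before each request and skip or delay the call when it returns False; persisting `used_this_month` across restarts is left out of this sketch.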

Core Concepts

Grounding

The process of anchoring an LLM's response to real-time web data, reducing hallucinations by forcing the model to cite sources.

Citations

URLs and metadata (title, snippet, date) from search results that support claims in the response. Essential for audit trails in regulated industries.

Search Freshness

A parameter that filters results to data published within a time window (last_day, last_week, last_month), critical for breaking news or time-sensitive queries.

Streaming Response

Real-time token-by-token transmission of the LLM's reasoning, reducing perceived latency but deferring citations until the stream is fully consumed.

Multi-turn Conversation

A sequence of user→assistant→user messages passed as a single array. Each turn triggers a new web search; context is inferred by the model, not persisted server-side.

Sonar Models

Perplexity's proprietary reasoning models: sonar-pro (best quality, slower), sonar-medium (balanced), sonar-mini (fastest, for simple queries). All support web search.

Perplexity vs Alternatives

Feature	Perplexity API	OpenAI GPT-4o	Claude 3.5 Sonnet	Google Gemini Pro
Real-time web search	✓ (native)	✗ (requires plugin)	✗ (requires integration)	✓ (optional)
Citations included	✓ (all responses)	✗	✗	✓ (with search)
Streaming	✓	✓	✓	✓
Multi-turn context	✓ (stateless per call)	✓ (conversation history)	✓ (conversation history)	✓ (conversation history)
Cost per 1K tokens	$0.003 (sonar-mini)	$0.025 (gpt-4o)	$0.003 (input)	$0.0075 (flash)
Search domain filtering	✓	✗	✗	✗
Hallucination rate	Very low (grounded)	Low	Very low	Low
Best for	Fact-checking, research, current events	General reasoning, code	Analysis, long context	Multimodal + search

Verified 2026-04 · v3.0 · gpt-4o
