Perplexity API Cheat Sheet — Real-time Web Search & Citation

    from perplexity import Perplexity
    from perplexity.models import SearchOptions

Search engine + LLM hybrid: web results are fed directly into a reasoning model.
If ChatGPT is a library that answers from memory, Perplexity is a research librarian who searches the internet, reads sources, and cites them before answering.
Core Usage Patterns

    import os
    from perplexity import Perplexity

    client = Perplexity(api_key=os.environ["PERPLEXITY_API_KEY"])

    # Single search query; the response carries the answer and its sources
    response = client.search(
        query="What is the current Bitcoin price?",
        model="sonar-pro"
    )
    print(response.answer)
    for citation in response.citations:
        print(f"Source: {citation.url}")

Output: answer with the current Bitcoin price, plus a list of source URLs.

    # Streaming: print text chunks as they arrive
    response = client.search(
        query="Summarize recent AI breakthroughs",
        model="sonar-pro",
        stream=True
    )
    for chunk in response:
        print(chunk.text, end="", flush=True)

    # Citations are read only after the stream has been fully consumed
    print("\nFinal citations:")
    for citation in response.final_citations:
        print(f"- {citation.title}: {citation.url}")

Output: streaming text chunks, then citations.

    # Multi-turn chat: pass the full conversation history on every call
    messages = [
        {"role": "user", "content": "What are quantum computers?"},
        {"role": "assistant", "content": "Quantum computers use qubits..."},
        {"role": "user", "content": "How are they different from classical computers?"}
    ]
    response = client.chat(
        messages=messages,
        model="sonar-pro",
        search_options={"enable_search": True}
    )
    print(response.answer)

Output: answer to the follow-up, with new web search results.

    # Filtered search: restrict by recency and by domain
    response = client.search(
        query="latest machine learning papers",
        model="sonar-pro",
        search_options={
            "enable_search": True,
            "search_freshness": "last_week",
            "search_domains": ["arxiv.org", "scholar.google.com"]
        }
    )
    print(response.answer)

Output: search results filtered to the last week, academic sources only.

Search & Chat Parameters
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| model | string | sonar-pro | Model ID: sonar-pro, sonar-medium, sonar-mini. Pro has the best quality; mini is the fastest. |
| query / messages | string / array | required | Search query, or message array with role + content. For chat, include the conversation history. |
| stream | boolean | false | Enable streaming. Returns an iterator of text chunks instead of a single response. |
| search_freshness | string | any_time | Filter by recency: last_day, last_week, last_month. Ignored if enable_search=false. |
| search_domains | array | [] | Whitelist of domains to search. An empty list searches all domains. Reduces hallucination for niche queries. |
| enable_search | boolean | true | Toggle web search. Set to false to use only training data (faster, no citations). |
| temperature | float | 0.7 | Reasoning creativity: 0.0–2.0. Lower = factual, higher = creative. Recommended 0.5–1.0 for search. |
| top_p | float | 0.9 | Nucleus sampling: 0.1–1.0. Controls diversity. Tune together with temperature for consistent behavior. |
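The ranges in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the SDK: `build_request_params` is a hypothetical helper, and the shape of the returned dict (in particular, passing `temperature` and `top_p` at the top level) is an assumption.

```python
from typing import Any, Dict, List, Optional

# Values copied from the parameter table; treated as assumptions, not a verified schema.
FRESHNESS_VALUES = {"any_time", "last_day", "last_week", "last_month"}


def build_request_params(
    temperature: float = 0.7,
    top_p: float = 0.9,
    search_freshness: str = "any_time",
    search_domains: Optional[List[str]] = None,
    enable_search: bool = True,
) -> Dict[str, Any]:
    """Validate values against the documented ranges and return a params dict."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be in [0.0, 2.0], got {temperature}")
    if not 0.1 <= top_p <= 1.0:
        raise ValueError(f"top_p must be in [0.1, 1.0], got {top_p}")
    if search_freshness not in FRESHNESS_VALUES:
        raise ValueError(f"unknown search_freshness: {search_freshness!r}")

    search_options: Dict[str, Any] = {"enable_search": enable_search}
    if enable_search:
        # Freshness and domain filters only matter when web search is on.
        search_options["search_freshness"] = search_freshness
        search_options["search_domains"] = list(search_domains or [])

    return {
        "temperature": temperature,
        "top_p": top_p,
        "search_options": search_options,
    }
```

Failing early on a bad value avoids spending a request (and rate-limit budget) on a call that the server would reject anyway.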
Core API Methods
| Method / Property | Description | Returns |
|---|---|---|
| client.search(query, model, stream, search_options) | Execute a single search query with optional web filtering. Returns SearchResponse with answer + citations. | SearchResponse, or Iterator[SearchChunk] if stream=True |
| client.chat(messages, model, search_options) | Multi-turn conversation with web search. The messages array includes the full conversation history. Each call performs a fresh search. | ChatResponse with answer + citations |
| response.citations | List of source objects from the search. May be empty for very obscure queries. | List[Citation] with url, title, snippet, date attributes |
| client.list_models() | Get available models and current rate limits. Useful for fallback logic when quota is near. | List[Model] with model_id, max_tokens, input_cost, output_cost |
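As an illustration of working with the citation attributes listed above, here is a small sketch that renders a citation list as numbered references. The `Citation` dataclass is a hypothetical stand-in that mirrors the attribute names in the table; it is not the SDK's own class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Citation:
    """Hypothetical stand-in with the attributes named in the table above."""
    url: str
    title: str
    snippet: str = ""
    date: Optional[str] = None


def format_citations(citations: List[Citation]) -> str:
    """Render citations as a numbered reference list, one per line."""
    if not citations:
        # An empty list is possible for very obscure queries.
        return "No sources returned."
    lines = []
    for index, citation in enumerate(citations, start=1):
        suffix = f" ({citation.date})" if citation.date else ""
        lines.append(f"[{index}] {citation.title}{suffix}: {citation.url}")
    return "\n".join(lines)
```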
Common Errors & Fixes
RateLimitError: Quota exceeded

Cause: API calls exceed the monthly/daily limit or the request-rate threshold (usually 30 req/min for the pro tier).

    import os
    import time
    from perplexity import Perplexity, RateLimitError

    client = Perplexity(api_key=os.environ["PERPLEXITY_API_KEY"])
    max_retries = 3
    for attempt in range(max_retries):
        try:
            response = client.search(query="...", model="sonar-pro")
            break
        except RateLimitError:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # exponential backoff: 1s, 2s, ...
                time.sleep(wait_time)
            else:
                raise  # out of retries: surface the error to the caller

InvalidAPIKeyError

Cause: the PERPLEXITY_API_KEY environment variable is not set, or the key is expired or malformed.

    import os
    from perplexity import Perplexity, InvalidAPIKeyError

    api_key = os.environ.get("PERPLEXITY_API_KEY")
    if not api_key:
        raise ValueError("PERPLEXITY_API_KEY not set. Get key from https://www.perplexity.ai/settings/api")
    try:
        client = Perplexity(api_key=api_key)
        # Cheap test call to confirm the key is accepted
        response = client.search(query="test", model="sonar-mini")
    except InvalidAPIKeyError:
        print("API key invalid or revoked. Regenerate at perplexity.ai/settings/api")

ModelNotFoundError

Cause: the requested model doesn't exist or is retired. Perplexity rotates model names (e.g., sonar → sonar-pro).

    import os
    from perplexity import Perplexity, ModelNotFoundError

    client = Perplexity(api_key=os.environ["PERPLEXITY_API_KEY"])
    # Always fall back gracefully
    models = client.list_models()
    available_ids = [m.model_id for m in models]
    preferred_model = "sonar-pro"
    # Use the preferred model if it is still listed, otherwise the first available one
    model_to_use = preferred_model if preferred_model in available_ids else available_ids[0]
    response = client.search(query="...", model=model_to_use)

Empty citations in response

Cause: the query is too obscure or too recent, or its domain filters exclude all sources.

    # First attempt with a narrow domain filter (example.com is a placeholder)
    response = client.search(
        query="very niche topic",
        model="sonar-pro",
        search_options={"enable_search": True, "search_domains": ["example.com"]}
    )
    if not response.citations:
        print("Warning: No sources found. Response may be less reliable.")
        # Retry with broader search parameters
        response = client.search(
            query="very niche topic",
            model="sonar-pro",
            search_options={"enable_search": True, "search_domains": []}  # Remove domain filter
        )

Production Gotchas
When stream=True, citations are buffered and only available after consuming the entire iterator. If you interrupt the stream early, final_citations will be incomplete or empty. Always fully consume streaming responses before processing citations, or use non-streaming mode for fact-critical apps.
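The "consume fully, then read citations" rule can be wrapped in one helper. This is a sketch under assumptions: `FakeStream` is a hypothetical stand-in that imitates the buffering behavior described above (chunks with a `text` attribute, `final_citations` filled only at the end); it is not the SDK's streaming class.

```python
from types import SimpleNamespace
from typing import List, Tuple


class FakeStream:
    """Hypothetical stand-in: citations appear only after the last chunk."""

    def __init__(self, texts: List[str], citations: List[str]):
        self._texts = texts
        self._citations = citations
        self.final_citations: List[str] = []

    def __iter__(self):
        for text in self._texts:
            yield SimpleNamespace(text=text)
        # Reached only when the consumer has drained every chunk.
        self.final_citations = list(self._citations)


def collect_stream(stream) -> Tuple[str, List[str]]:
    """Drain the whole stream first, then read the buffered citations."""
    parts = [chunk.text for chunk in stream]
    return "".join(parts), list(stream.final_citations)


text, citations = collect_stream(
    FakeStream(["Hello, ", "world"], ["https://example.com"])
)
```

Keeping the drain and the citation read inside one function makes it harder for a caller to break out of the loop early and then read an empty citation list.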
Passing a multi-turn conversation array to client.chat() does NOT reuse previous search results. Each call performs a new search, multiplying API costs. If you're building a chatbot, consider caching search results or using lower-cost models (sonar-mini) for follow-ups.
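One way to cache search results is a small time-to-live cache keyed on the normalized query. The sketch below is illustrative only: `SearchCache` and its `fetch` callable are hypothetical names, and `fetch` stands in for whatever function actually calls the API.

```python
import time
from typing import Callable, Dict, Tuple


class SearchCache:
    """Cache answers per normalized query for a limited time."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch          # hypothetical function that performs the real search
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Collapse case and whitespace so trivially different queries share an entry.
        return " ".join(query.lower().split())

    def get(self, query: str) -> str:
        key = self._key(query)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        answer = self._fetch(query)
        self._entries[key] = (now, answer)
        return answer
```

A short TTL suits fast-moving topics such as prices or news; a longer one suits reference-style questions where a stale answer costs little.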
search_domains=['example.com'] will ONLY search example.com. If no results exist on that domain, the response falls back to training data (old, no citations). For robustness, either omit this parameter or include 3+ diverse domains. Never assume a domain will have content on your query.
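A possible way to stay robust is to try progressively broader domain lists and accept the first result that carries citations. In this sketch the `search` callable is a hypothetical wrapper around the real API call that returns an `(answer, citations)` pair; the tier lists are whatever domains fit your use case.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical wrapper signature: (query, domains) -> (answer, citation URLs)
SearchFn = Callable[[str, List[str]], Tuple[str, List[str]]]


def search_with_domain_fallback(
    search: SearchFn,
    query: str,
    domain_tiers: Sequence[Sequence[str]],
) -> Tuple[str, List[str], List[str]]:
    """Try each domain tier in order, then no filter at all.

    Returns (answer, citations, domains_used). If nothing produced citations,
    the last answer is returned with empty citations and domains.
    """
    answer = ""
    # An empty list as the final tier means "search all domains".
    for domains in [list(tier) for tier in domain_tiers] + [[]]:
        answer, citations = search(query, domains)
        if citations:
            return answer, citations, domains
    return answer, [], []
```

Returning the domain list that succeeded lets the caller log how often the narrow filter actually works, which helps decide whether it is worth keeping.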
Setting both temperature > 1.0 AND top_p=0.9 can produce inconsistent factuality. Perplexity recommends leaving one at default if tuning the other. For search results, keep temperature ≤ 1.0 and top_p ≥ 0.8.
Hardcoded API keys can leak through source control, logs, and stack traces. Always load keys from environment variables (for example via os.environ) and never print them. If a key is accidentally logged, regenerate it immediately at https://www.perplexity.ai/settings/api.
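A minimal sketch of the pattern: read the key from the environment and mask it before it goes anywhere near a log line. Both helpers are hypothetical names introduced for illustration, and they use only the standard library.

```python
import os


def load_api_key(env_var: str = "PERPLEXITY_API_KEY") -> str:
    """Read the key from the environment; fail loudly if it is missing."""
    key = os.environ.get(env_var, "")
    if not key:
        # The message names the variable but never echoes a key value.
        raise RuntimeError(f"{env_var} is not set")
    return key


def mask_secret(secret: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Logging `mask_secret(key)` rather than `key` still lets you tell two keys apart in logs without exposing either of them.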
You might hit 30 req/min (hard limit) before hitting monthly quota. Implement circuit breaker logic and monitor both limits. The API does not tell you remaining monthly quota: track it client-side.
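Tracking both limits client-side can be sketched as a sliding-window counter plus a running monthly total. This is a simple request budget rather than a full circuit breaker; `RequestBudget` is a hypothetical helper, the `monthly_quota` default is a placeholder, and resetting the monthly counter is left to the caller.

```python
import time
from collections import deque
from typing import Callable, Deque


class RequestBudget:
    """Allow a request only if both the per-minute and monthly budgets permit it."""

    def __init__(
        self,
        per_minute: int = 30,            # rate limit quoted above
        monthly_quota: int = 10_000,     # placeholder; set to your plan's quota
        clock: Callable[[], float] = time.monotonic,
    ):
        self._per_minute = per_minute
        self._monthly_quota = monthly_quota
        self._clock = clock
        self._recent: Deque[float] = deque()  # timestamps of requests in the last 60 s
        self.used_this_month = 0

    def try_acquire(self) -> bool:
        """Record a request and return True, or return False if a limit is reached."""
        now = self._clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self._recent and now - self._recent[0] >= 60.0:
            self._recent.popleft()
        if len(self._recent) >= self._per_minute:
            return False
        if self.used_this_month >= self._monthly_quota:
            return False
        self._recent.append(now)
        self.used_this_month += 1
        return True
```

Call `try_acquire()` before each API request; when it returns False, queue or delay the request instead of sending it and receiving a RateLimitError.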
Core Concepts
Perplexity vs Alternatives
| Feature | Perplexity API | OpenAI GPT-4o | Claude 3.5 Sonnet | Google Gemini Pro |
|---|---|---|---|---|
| Real-time web search | ✓ (native) | ✗ (requires plugin) | ✗ (requires integration) | ✓ (optional) |
| Citations included | ✓ (all responses) | ✗ | ✗ | ✓ (with search) |
| Streaming | ✓ | ✓ | ✓ | ✓ |
| Multi-turn context | ✓ (stateless per call) | ✓ (conversation history) | ✓ (conversation history) | ✓ (conversation history) |
| Cost per 1K tokens | $0.003 (sonar-mini) | $0.025 (gpt-4o) | $0.003 (input) | $0.0075 (flash) |
| Search domain filtering | ✓ | ✗ | ✗ | ✗ |
| Hallucination rate | Very low (grounded) | Low | Very low | Low |
| Best for | Fact-checking, research, current events | General reasoning, code | Analysis, long context | Multimodal + search |