Cerebras pricing

Quick answer
Cerebras pricing is not publicly detailed on their website; you must contact Cerebras sales for custom quotes. To use the Cerebras API, you typically pay based on usage, and pricing varies by model and deployment scale.

PREREQUISITES

  • Python 3.8+
  • Cerebras API key
  • pip install "openai>=1.0" (quoted so the shell does not treat >= as a redirect)

Setup

Install the openai Python package, which can talk to the Cerebras API through its OpenAI-compatible endpoint. Then set your Cerebras API key as an environment variable so it stays out of your source code.

  • Install package: pip install openai
  • Set environment variable: export CEREBRAS_API_KEY='your_api_key' (Linux/macOS) or set CEREBRAS_API_KEY=your_api_key (Windows)
bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
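
Before making any requests, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, not part of any SDK: the helper name load_cerebras_key is hypothetical, and it only assumes the CEREBRAS_API_KEY variable described above.

```python
import os


def load_cerebras_key(env=os.environ):
    """Return the key stored in CEREBRAS_API_KEY, or raise a descriptive error.

    `env` defaults to the process environment; passing a plain dict
    makes the helper easy to exercise without touching real settings.
    """
    key = env.get("CEREBRAS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CEREBRAS_API_KEY is not set; export it before running the examples."
        )
    return key
```

Calling load_cerebras_key() at the top of a script surfaces a missing key immediately, rather than as a KeyError or an authentication failure later on.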

Step by step

Point the openai SDK at the Cerebras base URL and make a chat completion call. Pricing depends on usage and model; contact Cerebras for exact rates.

python
import os
from openai import OpenAI

# Point the OpenAI client at the Cerebras endpoint instead of api.openai.com
client = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
)

# Send a single-turn chat request
response = client.chat.completions.create(
    model="llama3.3-70b",
    messages=[{"role": "user", "content": "Hello, what is Cerebras pricing?"}],
)

# The reply text is on the first choice
print(response.choices[0].message.content)
output
Cerebras pricing is customized based on your usage and deployment. Please contact sales@cerebras.net for detailed pricing information.
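
Because billing is usage-based, the token counts for a request are the natural input for a rough cost estimate. The sketch below is illustrative only: estimate_cost is a hypothetical helper, and the per-million-token rates are placeholders, not Cerebras prices. Substitute the rates from your own quote.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_usd_per_million, output_usd_per_million):
    """Estimate the cost of one request in USD from its token counts.

    The two rates are expressed in USD per one million tokens and must
    come from your own pricing agreement; none are built in.
    """
    total = (prompt_tokens * input_usd_per_million
             + completion_tokens * output_usd_per_million)
    return total / 1_000_000


# Placeholder rates chosen only to show the arithmetic (NOT real prices)
cost = estimate_cost(
    prompt_tokens=1200,
    completion_tokens=300,
    input_usd_per_million=0.50,
    output_usd_per_million=1.00,
)
print(f"estimated cost: ${cost:.6f}")
```

With a real response from the openai SDK, the token counts are available as response.usage.prompt_tokens and response.usage.completion_tokens, assuming the endpoint returns a usage object.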

Common variations

You can use different Cerebras models like llama3.1-8b or llama3.3-70b. Async calls and streaming are supported via the openai SDK. Pricing varies by model size and usage volume.

python
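import asyncio
import os

from openai import AsyncOpenAI

async def async_chat():
    # AsyncOpenAI is the awaitable client in openai>=1.0 (there is no acreate method)
    client = AsyncOpenAI(
        api_key=os.environ["CEREBRAS_API_KEY"],
        base_url="https://api.cerebras.ai/v1",
    )
    # With stream=True, create() returns an async iterator of chunks
    stream = await client.chat.completions.create(
        model="llama3.1-8b",
        messages=[{"role": "user", "content": "Tell me about Cerebras pricing."}],
        stream=True,
    )
    async for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(async_chat())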
output
Cerebras pricing depends on your usage and model choice. Contact sales@cerebras.net for details.

Troubleshooting

If you receive authentication errors, verify your CEREBRAS_API_KEY environment variable is set correctly. For connection issues, check your network and the base_url. If pricing or billing questions arise, contact Cerebras sales directly.
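
For connection problems, one way to separate a network issue from a misconfigured base_url is a quick standard-library check before involving the SDK. This is a rough sketch under stated assumptions: both function names are hypothetical, and a successful TCP connection only shows that the host is reachable, not that the API key or request is valid.

```python
import socket
from urllib.parse import urlparse


def endpoint_from_base_url(base_url):
    """Split a base URL into the (host, port) pair a client would connect to."""
    parsed = urlparse(base_url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"unexpected base_url: {base_url!r}")
    # HTTPS defaults to port 443 when the URL does not name one
    return parsed.hostname, parsed.port or 443


def can_reach(base_url, timeout=5.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_from_base_url(base_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False
```

If can_reach("https://api.cerebras.ai/v1") returns False, the problem is likely local networking, DNS, or a proxy rather than the API key.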

Key Takeaways

  • Cerebras pricing is custom and requires contacting their sales team for exact details.
  • Use the OpenAI SDK with base_url="https://api.cerebras.ai/v1" to access Cerebras models.
  • Pricing varies by model size and usage volume; monitor usage to manage costs.
  • Set your API key in the environment variable CEREBRAS_API_KEY before making requests.
Verified 2026-04 · llama3.3-70b, llama3.1-8b