
Cerebras pricing

Quick answer

Cerebras does not publish detailed pricing on its website, so exact rates require a custom quote from Cerebras sales. API access is typically billed by usage, and rates vary by model and deployment scale.

PREREQUISITES

Python 3.8+
Cerebras API key
pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)

Setup

Install the openai Python package to access the Cerebras API through OpenAI-compatible calls, then set your Cerebras API key as an environment variable.

Install package: pip install openai
Set environment variable: export CEREBRAS_API_KEY='your_api_key' (Linux/macOS) or set CEREBRAS_API_KEY=your_api_key (Windows)

bash

pip install openai

output

Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

Use the openai SDK with the Cerebras base URL to make chat completions calls. Pricing depends on usage and model; contact Cerebras for exact rates.

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["CEREBRAS_API_KEY"], base_url="https://api.cerebras.ai/v1")

response = client.chat.completions.create(
    model="llama3.3-70b",
    messages=[{"role": "user", "content": "Hello, what is Cerebras pricing?"}]
)

print(response.choices[0].message.content)

output

Cerebras pricing is customized based on your usage and deployment. Please contact sales@cerebras.net for detailed pricing information.
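Because billing is usage-based, the token counts returned with each response are the natural input to a cost estimate. The following is a minimal sketch, assuming rates are quoted per million tokens with separate input and output prices. `estimate_cost` is a hypothetical helper, and the two rate constants are placeholders for illustration only, not Cerebras prices. With the OpenAI-compatible SDK the counts are usually available as `response.usage.prompt_tokens` and `response.usage.completion_tokens`.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate_per_million: float,
                  output_rate_per_million: float) -> float:
    """Estimate the cost of one request, in the same currency as the rates."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = prompt_tokens / 1_000_000 * input_rate_per_million
    output_cost = completion_tokens / 1_000_000 * output_rate_per_million
    return input_cost + output_cost


# Placeholder rates for illustration only; substitute the figures from your quote.
EXAMPLE_INPUT_RATE = 1.00   # hypothetical price per million input tokens
EXAMPLE_OUTPUT_RATE = 2.00  # hypothetical price per million output tokens

# In real use, pass response.usage.prompt_tokens and
# response.usage.completion_tokens instead of these literals.
cost = estimate_cost(1_200, 300, EXAMPLE_INPUT_RATE, EXAMPLE_OUTPUT_RATE)
print(f"Estimated cost: {cost:.6f}")
```

Summing these estimates across requests gives a running total that can be compared against the invoice once real rates are known.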

Common variations

You can use different Cerebras models, such as llama3.1-8b or llama3.3-70b. The openai SDK supports async calls through its AsyncOpenAI client and streaming through stream=True. Pricing varies by model size and usage volume.

python

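import asyncio
import os

from openai import AsyncOpenAI

async def async_chat():
    # AsyncOpenAI is the awaitable client; the sync OpenAI client cannot be awaited
    client = AsyncOpenAI(api_key=os.environ["CEREBRAS_API_KEY"], base_url="https://api.cerebras.ai/v1")
    # With stream=True, create() returns an async iterator of chunks
    stream = await client.chat.completions.create(
        model="llama3.1-8b",
        messages=[{"role": "user", "content": "Tell me about Cerebras pricing."}],
        stream=True
    )
    async for chunk in stream:
        # Skip chunks that carry no choices; print an empty string when a delta has no text
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(async_chat())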

output

Cerebras pricing depends on your usage and model choice. Contact sales@cerebras.net for details.

Troubleshooting

If you receive authentication errors, verify your CEREBRAS_API_KEY environment variable is set correctly. For connection issues, check your network and the base_url. If pricing or billing questions arise, contact Cerebras sales directly.
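The first two checks above can be run before any request is sent. This is a rough sketch rather than an official diagnostic: `preflight_problems` is a hypothetical helper, and it only confirms that the key is present and the base URL is well-formed, not that the key is valid.

```python
import os
from urllib.parse import urlparse

# Base URL used throughout this guide.
EXPECTED_BASE_URL = "https://api.cerebras.ai/v1"


def preflight_problems(env, base_url=EXPECTED_BASE_URL):
    """Return a list of human-readable configuration problems (empty if none)."""
    problems = []

    key = env.get("CEREBRAS_API_KEY", "")
    if not key:
        problems.append("CEREBRAS_API_KEY is not set")
    elif key != key.strip():
        # Stray whitespace from copy-paste is a common cause of auth errors.
        problems.append("CEREBRAS_API_KEY has leading or trailing whitespace")

    parsed = urlparse(base_url)
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append(f"base_url does not look like an https URL: {base_url!r}")

    return problems


# Check the current process environment and report anything suspicious.
for problem in preflight_problems(os.environ):
    print("Problem:", problem)
```

An empty result only means the configuration looks plausible; a rejected key still surfaces as an authentication error from the API itself.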


Key Takeaways

Cerebras pricing is custom and requires contacting their sales team for exact details.
Use the OpenAI SDK with base_url="https://api.cerebras.ai/v1" to access Cerebras models.
Pricing varies by model size and usage volume; monitor usage to manage costs.
Set your API key in the environment variable CEREBRAS_API_KEY before making requests.

Verified 2026-04 · llama3.3-70b, llama3.1-8b
