Cerebras supported models

Cerebras serves llama3.3-70b and llama3.1-8b via its OpenAI-compatible API. Use the OpenAI Python SDK with the base_url set to https://api.cerebras.ai/v1 and specify these model names in your requests.

Prerequisites
- Python 3.8+
- CEREBRAS_API_KEY environment variable set
- pip install openai>=1.0
Setup
Install the official openai Python package (v1+) and set your Cerebras API key as an environment variable.
- Run pip install openai to install the SDK.
- Set CEREBRAS_API_KEY in your shell environment.

pip install openai
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
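Before making any requests, it can help to fail fast when the key is missing. A minimal sketch (the helper name require_api_key is our own, not part of the SDK):

```python
import os

def require_api_key() -> str:
    """Return the Cerebras API key, raising a clear error if it is unset."""
    key = os.environ.get("CEREBRAS_API_KEY")
    if not key:
        raise RuntimeError("CEREBRAS_API_KEY is not set; export it before running.")
    return key
```

Calling this once at startup turns a confusing mid-request authentication failure into an immediate, actionable error.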
Step by step
Use the OpenAI client with the Cerebras API endpoint and call supported models like llama3.3-70b. Below is a complete example to send a chat completion request.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["CEREBRAS_API_KEY"], base_url="https://api.cerebras.ai/v1")
response = client.chat.completions.create(
    model="llama3.3-70b",
    messages=[{"role": "user", "content": "Hello, Cerebras!"}]
)
print(response.choices[0].message.content)

Hello, Cerebras! How can I assist you today?
Common variations
You can switch to the smaller model llama3.1-8b by changing the model parameter. The Cerebras API is fully OpenAI-compatible, so you can use streaming or async calls with the same client pattern.
import asyncio
import os
from openai import AsyncOpenAI

async def main():
    # Use the async client; the sync OpenAI client has no awaitable methods
    client = AsyncOpenAI(api_key=os.environ["CEREBRAS_API_KEY"], base_url="https://api.cerebras.ai/v1")
    # Async streaming example: pass stream=True to the same create() method
    stream = await client.chat.completions.create(
        model="llama3.1-8b",
        messages=[{"role": "user", "content": "Stream a greeting from Cerebras."}],
        stream=True
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())

Hello from Cerebras streaming model llama3.1-8b!
Troubleshooting
If you receive authentication errors, verify your CEREBRAS_API_KEY is correctly set in your environment. For model not found errors, confirm you are using a supported model name like llama3.3-70b or llama3.1-8b. Network issues may require checking your firewall or proxy settings.
Key Takeaways
- Use the llama3.3-70b and llama3.1-8b models with Cerebras via the OpenAI-compatible API.
- Set base_url="https://api.cerebras.ai/v1" in the OpenAI client to target Cerebras endpoints.
- The Cerebras API supports streaming and async calls with the same OpenAI SDK patterns.
- Keep your CEREBRAS_API_KEY secure and set it in your environment variables.
- Model availability and API details may change; verify at https://docs.cerebras.net/api.