How-to · Beginner · 3 min read

Cerebras for AI agents

Quick answer
Use the OpenAI SDK with Cerebras' OpenAI-compatible API by setting the base_url to https://api.cerebras.ai/v1. Instantiate the client with your CEREBRAS_API_KEY and call chat.completions.create to build AI agents leveraging Cerebras models like llama3.3-70b.

Prerequisites

  • Python 3.8+
  • Cerebras API key (set as CEREBRAS_API_KEY environment variable)
  • pip install "openai>=1.0" (quote the version specifier so the shell does not interpret >= as a redirect)

Setup

Install the official openai Python package and set your Cerebras API key as an environment variable.

  • Install package: pip install openai
  • Set environment variable: export CEREBRAS_API_KEY='your_api_key' (Linux/macOS) or set CEREBRAS_API_KEY=your_api_key (Windows)
bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
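Before creating the client, it helps to fail fast when the key is missing. A minimal check, using only the standard library (the helper name is illustrative, not part of any SDK):

python
import os

def require_api_key(env) -> str:
    """Return the Cerebras API key from an environment mapping, failing loudly if absent."""
    key = env.get("CEREBRAS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CEREBRAS_API_KEY is not set; export it before creating the client."
        )
    return key

# To check the real environment (raises if the key is missing):
# require_api_key(os.environ)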

Step by step

Use the OpenAI SDK with Cerebras by specifying the base_url. This example sends a chat completion request to the llama3.3-70b model to create a simple AI agent response.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1"
)

response = client.chat.completions.create(
    model="llama3.3-70b",
    messages=[
        {"role": "system", "content": "You are an AI agent."},
        {"role": "user", "content": "What is RAG?"}
    ]
)

print("Agent response:", response.choices[0].message.content)
output
Agent response: Retrieval-Augmented Generation (RAG) is a technique that combines retrieval of relevant documents with generative models to produce accurate and context-aware responses.
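A single user turn works for a demo, but agents typically carry a system prompt plus prior turns. A minimal sketch of assembling the messages list across turns (the helper name and structure are illustrative, not part of the SDK):

python
def build_messages(system_prompt, history, user_input):
    """Assemble a chat.completions messages list: system prompt, prior turns, new input."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # alternating {"role": "user"/"assistant", ...} dicts
    messages.append({"role": "user", "content": user_input})
    return messages

history = [
    {"role": "user", "content": "What is RAG?"},
    {"role": "assistant", "content": "Retrieval-Augmented Generation ..."},
]
msgs = build_messages("You are a concise AI agent.", history, "Give one example use case.")
# msgs can be passed directly as messages= to client.chat.completions.create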

Common variations

You can switch to other Cerebras models such as llama3.1-8b by changing the model parameter. For asynchronous calls, use the AsyncOpenAI client from the same package. Streaming responses are supported by setting stream=True in the request.

python
import asyncio
import os
from openai import AsyncOpenAI

async def async_agent():
    client = AsyncOpenAI(
        api_key=os.environ["CEREBRAS_API_KEY"],
        base_url="https://api.cerebras.ai/v1"
    )

    # With stream=True, awaiting create() returns an async iterator of chunks
    stream = await client.chat.completions.create(
        model="llama3.1-8b",
        messages=[{"role": "user", "content": "Explain AI agents."}],
        stream=True
    )

    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

asyncio.run(async_agent())
output
AI agents are software entities that use artificial intelligence techniques to autonomously perform tasks, make decisions, and interact with environments or users.
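With stream=True, each chunk carries an incremental delta, and chunk.choices[0].delta.content may be None on the first and last chunks. A small helper that joins deltas into the full reply (the chunk shape assumed here matches the openai>=1.0 streaming objects):

python
def join_deltas(chunks):
    """Concatenate the text deltas from a chat-completion stream into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content or ""  # delta.content can be None
        parts.append(delta)
    return "".join(parts)

# Usage with a synchronous stream:
#   stream = client.chat.completions.create(..., stream=True)
#   text = join_deltas(stream)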

Troubleshooting

  • If you get authentication errors, verify your CEREBRAS_API_KEY environment variable is set correctly.
  • For connection issues, ensure your network allows HTTPS requests to https://api.cerebras.ai/v1.
  • If the model is not found, confirm you are using a valid Cerebras model name like llama3.3-70b.
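Transient connection failures are often worth retrying with backoff. A library-agnostic sketch (you choose which exceptions to retry, for example openai.APIConnectionError; the helper name is illustrative):

python
import time

def with_retries(fn, retries=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

# Usage sketch:
#   with_retries(lambda: client.chat.completions.create(...),
#                retry_on=(openai.APIConnectionError,))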

Key takeaways

  • Use the OpenAI SDK with base_url set to Cerebras API endpoint for AI agent integration.
  • Specify Cerebras models like llama3.3-70b to leverage powerful LLMs for agent tasks.
  • Support for async and streaming enables responsive AI agent implementations.
  • Always set your Cerebras API key in the environment variable CEREBRAS_API_KEY.
  • Check model names and network connectivity to avoid common errors.
Verified 2026-04 · llama3.3-70b, llama3.1-8b