How-to · Beginner · 3 min read

Cerebras for AI agents

Quick answer
Use the OpenAI SDK with Cerebras' OpenAI-compatible API by setting the base_url to https://api.cerebras.ai/v1. Instantiate the client with your CEREBRAS_API_KEY and call chat.completions.create to build AI agents leveraging Cerebras models like llama3.3-70b.

Prerequisites

  • Python 3.8+
  • Cerebras API key (set as CEREBRAS_API_KEY environment variable)
  • pip install "openai>=1.0" (quote the version specifier so the shell does not interpret >= as a redirect)

Setup

Install the official openai Python package and set your Cerebras API key as an environment variable.

  • Install package: pip install openai
  • Set environment variable: export CEREBRAS_API_KEY='your_api_key' (Linux/macOS) or set CEREBRAS_API_KEY=your_api_key (Windows)
bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
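Before creating the client, it helps to fail fast when the key is missing. A minimal check, using only the standard library (the helper name is illustrative, not part of any SDK):

python
import os

def require_api_key(env) -> str:
    """Return the Cerebras API key from an environment mapping, failing loudly if absent."""
    key = env.get("CEREBRAS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CEREBRAS_API_KEY is not set; export it before creating the client."
        )
    return key

# To check the real environment (raises if the key is missing):
# require_api_key(os.environ)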

Step by step

Use the OpenAI SDK with Cerebras by specifying the base_url. This example sends a chat completion request to the llama3.3-70b model to create a simple AI agent response.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1"
)

response = client.chat.completions.create(
    model="llama3.3-70b",
    messages=[
        {"role": "system", "content": "You are an AI agent."},
        {"role": "user", "content": "What is RAG?"}
    ]
)

print("Agent response:", response.choices[0].message.content)
output
Agent response: Retrieval-Augmented Generation (RAG) is a technique that combines retrieval of relevant documents with generative models to produce accurate and context-aware responses.
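A single user turn works for a demo, but agents typically carry a system prompt plus prior turns. A minimal sketch of assembling the messages list across turns (the helper name and structure are illustrative, not part of the SDK):

python
def build_messages(system_prompt, history, user_input):
    """Assemble a chat.completions messages list: system prompt, prior turns, new input."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # alternating {"role": "user"/"assistant", ...} dicts
    messages.append({"role": "user", "content": user_input})
    return messages

history = [
    {"role": "user", "content": "What is RAG?"},
    {"role": "assistant", "content": "Retrieval-Augmented Generation ..."},
]
msgs = build_messages("You are a concise AI agent.", history, "Give one example use case.")
# msgs can be passed directly as messages= to client.chat.completions.create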

Common variations

You can switch to other Cerebras models such as llama3.1-8b by changing the model parameter. For asynchronous calls, use the AsyncOpenAI client from the same package. Streaming responses are supported by setting stream=True in the request.

python
import asyncio
import os
from openai import AsyncOpenAI

async def async_agent():
    client = AsyncOpenAI(
        api_key=os.environ["CEREBRAS_API_KEY"],
        base_url="https://api.cerebras.ai/v1"
    )

    # With stream=True, awaiting create() returns an async iterator of chunks
    stream = await client.chat.completions.create(
        model="llama3.1-8b",
        messages=[{"role": "user", "content": "Explain AI agents."}],
        stream=True
    )

    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

asyncio.run(async_agent())
output
AI agents are software entities that use artificial intelligence techniques to autonomously perform tasks, make decisions, and interact with environments or users.
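With stream=True, each chunk carries an incremental delta, and chunk.choices[0].delta.content may be None on the first and last chunks. A small helper that joins deltas into the full reply (the chunk shape assumed here matches the openai>=1.0 streaming objects):

python
def join_deltas(chunks):
    """Concatenate the text deltas from a chat-completion stream into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content or ""  # delta.content can be None
        parts.append(delta)
    return "".join(parts)

# Usage with a synchronous stream:
#   stream = client.chat.completions.create(..., stream=True)
#   text = join_deltas(stream)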

Troubleshooting

  • If you get authentication errors, verify your CEREBRAS_API_KEY environment variable is set correctly.
  • For connection issues, ensure your network allows HTTPS requests to https://api.cerebras.ai/v1.
  • If the model is not found, confirm you are using a valid Cerebras model name like llama3.3-70b.
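Transient connection failures are often worth retrying with backoff. A library-agnostic sketch (you choose which exceptions to retry, for example openai.APIConnectionError; the helper name is illustrative):

python
import time

def with_retries(fn, retries=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

# Usage sketch:
#   with_retries(lambda: client.chat.completions.create(...),
#                retry_on=(openai.APIConnectionError,))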

Key takeaways

  • Use the OpenAI SDK with base_url set to Cerebras API endpoint for AI agent integration.
  • Specify Cerebras models like llama3.3-70b to leverage powerful LLMs for agent tasks.
  • Support for async and streaming enables responsive AI agent implementations.
  • Always set your Cerebras API key in the environment variable CEREBRAS_API_KEY.
  • Check model names and network connectivity to avoid common errors.
Verified 2026-04 · llama3.3-70b, llama3.1-8b