How-to · Beginner · 3 min read

How to evaluate agents with AgentOps

Quick answer
Use the agentops Python SDK to initialize AgentOps with your API key, then start and end evaluation sessions around your agent calls to track performance and metrics. This enables automatic observability and detailed evaluation of your AI agents' behavior.

PREREQUISITES

  • Python 3.8+
  • AgentOps API key
  • pip install agentops

Setup

Install the agentops package and set your API key as an environment variable to enable tracking.

bash
pip install agentops
output
Collecting agentops
  Downloading agentops-1.0.0-py3-none-any.whl (10 kB)
Installing collected packages: agentops
Successfully installed agentops-1.0.0
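Both the AgentOps and OpenAI keys are read from environment variables in the examples below. One way to set them (the values shown are placeholders, not real keys):

```shell
# Export the keys the examples read (replace the placeholders with your own).
export AGENTOPS_API_KEY="your-agentops-api-key"
export OPENAI_API_KEY="your-openai-api-key"

# Confirm they are visible to child processes such as the Python interpreter.
printenv AGENTOPS_API_KEY >/dev/null && echo "AGENTOPS_API_KEY is set"
```

Add the `export` lines to your shell profile if you want them to persist across sessions.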

Step by step

Initialize AgentOps, start a session to evaluate your agent, run the agent logic, then end the session with a status. AgentOps automatically instruments supported LLM providers (including OpenAI), so calls made between the start and end of the session are recorded along with their metrics.

python
import os
import agentops

# Initialize AgentOps with your API key
agentops.init(api_key=os.environ["AGENTOPS_API_KEY"])

# Start an evaluation session
session = agentops.start_session(tags=["agent-evaluation"])

# Simulate agent call
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is RAG?"}]
)
print("Agent response:", response.choices[0].message.content)

# End the session with success status
agentops.end_session("Success")
output
Agent response: Retrieval-Augmented Generation (RAG) is a technique that combines retrieval of documents with generative models to improve accuracy and relevance.

Common variations

You can use AgentOps with async agents: `agentops.init()`, `agentops.start_session()`, and `agentops.end_session()` are synchronous calls, so invoke them directly inside your coroutine and use an async LLM client (such as `AsyncOpenAI`) for the awaited model call. AgentOps also records streaming responses from supported providers, and you can integrate other supported LLM providers by initializing their clients after `agentops.init()`.

python
import asyncio
import os
import agentops
from openai import AsyncOpenAI

async def evaluate_agent_async():
    # AgentOps calls are synchronous -- do not await them.
    agentops.init(api_key=os.environ["AGENTOPS_API_KEY"])
    session = agentops.start_session(tags=["async-agent"])

    # Use the async OpenAI client so the completion call can be awaited.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Explain quantum computing."}]
    )
    print("Async agent response:", response.choices[0].message.content)

    agentops.end_session("Success")

asyncio.run(evaluate_agent_async())
output
Async agent response: Quantum computing leverages quantum bits, or qubits, to perform complex computations much faster than classical computers for certain problems.

Troubleshooting

  • If you see no data in the AgentOps dashboard, verify your AGENTOPS_API_KEY environment variable is set correctly.
  • If sessions do not start or end properly, ensure you call agentops.start_session() and agentops.end_session() in the same runtime context.
  • For network errors, check your internet connection and firewall settings.
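For the first item, a small helper can confirm the required variables are present before you call `agentops.init()`. This `missing_env_vars` function is a hypothetical convenience for this guide, not part of the AgentOps SDK:

```python
import os

def missing_env_vars(required=("AGENTOPS_API_KEY", "OPENAI_API_KEY")):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

missing = missing_env_vars()
if missing:
    print("Set these before initializing AgentOps:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this at the top of your evaluation script turns a silent "no data in the dashboard" failure into an immediate, actionable message.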

Key Takeaways

  • Initialize AgentOps with your API key to enable automatic agent evaluation tracking.
  • Wrap your agent calls between start_session and end_session to capture metrics and logs.
  • AgentOps supports both synchronous and asynchronous agent evaluation workflows.
  • Ensure environment variables are correctly set to avoid missing telemetry data.
  • AgentOps automatically tracks calls to supported LLM providers within sessions for comprehensive observability.
Verified 2026-04 · gpt-4o-mini