How-to · Beginner · 3 min read

How to evaluate agents with AgentOps

Quick answer
Use the agentops Python SDK to initialize AgentOps with your API key, then start and end evaluation sessions around your agent calls to track performance and metrics. This enables automatic observability and detailed evaluation of your AI agents' behavior.

PREREQUISITES

  • Python 3.8+
  • AgentOps API key
  • pip install agentops

Setup

Install the agentops package and set your API key as an environment variable to enable tracking.

bash
pip install agentops
output
Collecting agentops
  Downloading agentops-1.0.0-py3-none-any.whl (10 kB)
Installing collected packages: agentops
Successfully installed agentops-1.0.0
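Both the AgentOps and OpenAI keys are read from environment variables in the examples below. One way to set them (the values shown are placeholders, not real keys):

```shell
# Export the keys the examples read (replace the placeholders with your own).
export AGENTOPS_API_KEY="your-agentops-api-key"
export OPENAI_API_KEY="your-openai-api-key"

# Confirm they are visible to child processes such as the Python interpreter.
printenv AGENTOPS_API_KEY >/dev/null && echo "AGENTOPS_API_KEY is set"
```

Add the `export` lines to your shell profile if you want them to persist across sessions.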

Step by step

Initialize AgentOps, start a session to evaluate your agent, run the agent logic, then end the session with a status. AgentOps automatically instruments supported LLM providers (including OpenAI), so calls made between the start and end of the session are recorded along with their metrics.

python
import os
import agentops

# Initialize AgentOps with your API key
agentops.init(api_key=os.environ["AGENTOPS_API_KEY"])

# Start an evaluation session
session = agentops.start_session(tags=["agent-evaluation"])

# Simulate agent call
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is RAG?"}]
)
print("Agent response:", response.choices[0].message.content)

# End the session with success status
agentops.end_session("Success")
output
Agent response: Retrieval-Augmented Generation (RAG) is a technique that combines retrieval of documents with generative models to improve accuracy and relevance.

Common variations

You can use AgentOps with async agents: `agentops.init()`, `agentops.start_session()`, and `agentops.end_session()` are synchronous calls, so invoke them directly inside your coroutine and use an async LLM client (such as `AsyncOpenAI`) for the awaited model call. AgentOps also records streaming responses from supported providers, and you can integrate other supported LLM providers by initializing their clients after `agentops.init()`.

python
import asyncio
import os
import agentops
from openai import AsyncOpenAI

async def evaluate_agent_async():
    # AgentOps calls are synchronous -- do not await them.
    agentops.init(api_key=os.environ["AGENTOPS_API_KEY"])
    session = agentops.start_session(tags=["async-agent"])

    # Use the async OpenAI client so the completion call can be awaited.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Explain quantum computing."}]
    )
    print("Async agent response:", response.choices[0].message.content)

    agentops.end_session("Success")

asyncio.run(evaluate_agent_async())
output
Async agent response: Quantum computing leverages quantum bits, or qubits, to perform complex computations much faster than classical computers for certain problems.

Troubleshooting

  • If you see no data in the AgentOps dashboard, verify your AGENTOPS_API_KEY environment variable is set correctly.
  • If sessions do not start or end properly, ensure you call agentops.start_session() and agentops.end_session() in the same runtime context.
  • For network errors, check your internet connection and firewall settings.
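For the first item, a small helper can confirm the required variables are present before you call `agentops.init()`. This `missing_env_vars` function is a hypothetical convenience for this guide, not part of the AgentOps SDK:

```python
import os

def missing_env_vars(required=("AGENTOPS_API_KEY", "OPENAI_API_KEY")):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

missing = missing_env_vars()
if missing:
    print("Set these before initializing AgentOps:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this at the top of your evaluation script turns a silent "no data in the dashboard" failure into an immediate, actionable message.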

Key Takeaways

  • Initialize AgentOps with your API key to enable automatic agent evaluation tracking.
  • Wrap your agent calls between start_session and end_session to capture metrics and logs.
  • AgentOps supports both synchronous and asynchronous agent evaluation workflows.
  • Ensure environment variables are correctly set to avoid missing telemetry data.
  • AgentOps automatically tracks calls to supported LLM providers within sessions for comprehensive observability.
Verified 2026-04 · gpt-4o-mini