How to log and analyze LLM outputs
Quick answer
Use the chat.completions.create method from the OpenAI SDK to capture LLM outputs programmatically. Log these outputs to files or databases, then analyze them with Python tools like pandas or visualization libraries to identify patterns, errors, or biases.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0" (quoted so the shell does not treat >= as redirection)
- pip install pandas matplotlib
Setup
Install the required Python packages and set your environment variable for the OpenAI API key.
- Install OpenAI SDK and analysis libraries:
pip install openai pandas matplotlib

Step by step
This example shows how to call an LLM, log the output to a CSV file, and then analyze the logged data with pandas and matplotlib.
import os
import csv
from openai import OpenAI
import pandas as pd
import matplotlib.pyplot as plt
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Define prompt and call model
messages = [{"role": "user", "content": "Explain the benefits of logging LLM outputs."}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
# Extract text output
output_text = response.choices[0].message.content
print("LLM output:", output_text)
# Log output to CSV file
log_file = "llm_outputs.csv"
with open(log_file, mode="a", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow([messages[0]["content"], output_text])
# Analyze logged outputs
# Load CSV into pandas DataFrame
try:
    df = pd.read_csv(log_file, header=None, names=["prompt", "response"])
    print(f"Logged {len(df)} entries.")
    # Simple analysis: response length distribution
    df["response_length"] = df["response"].apply(len)
    df["response_length"].hist(bins=10)
    plt.title("Distribution of LLM response lengths")
    plt.xlabel("Response length (characters)")
    plt.ylabel("Frequency")
    plt.show()
except FileNotFoundError:
    print("No log file found for analysis.")

Output
LLM output: Logging LLM outputs helps track model behavior, debug issues, and improve performance.
Logged 1 entries.
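Beyond the length histogram above, the same DataFrame approach can surface likely failures, such as empty or truncated responses. A minimal sketch on illustrative sample rows (the prompts, responses, and the 10-character threshold are all assumptions for demonstration):

```python
import pandas as pd

# Illustrative sample of logged prompt/response pairs
df = pd.DataFrame({
    "prompt": ["Explain logging.", "Summarize this.", "Translate to French."],
    "response": ["Logging helps track model behavior and debug issues.", "", "Bonjour"],
})

# Flag empty or suspiciously short responses as potential failures
df["response_length"] = df["response"].str.len()
df["flagged"] = df["response_length"] < 10
print(df[df["flagged"]][["prompt", "response_length"]])
```

In a real workflow you would load the rows from your CSV log instead of constructing them inline, and tune the threshold to your task.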
Common variations
You can adapt logging for asynchronous calls, streaming outputs, or different models like claude-3-5-sonnet-20241022. For example, use the Anthropic SDK for Claude models or add timestamps and metadata to logs for richer analysis.
import os
from anthropic import Anthropic
import csv
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
messages = [{"role": "user", "content": "Explain the benefits of logging LLM outputs."}]
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system="You are a helpful assistant.",
    messages=messages
)
# message.content is a list of content blocks, not a string; take the first block's text
output_text = message.content[0].text
print("Claude output:", output_text)
# Append to CSV log
log_file = "claude_llm_outputs.csv"
with open(log_file, mode="a", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow([messages[0]["content"], output_text])

Output
Claude output: Logging outputs from LLMs enables better debugging, auditing, and model improvement.
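The variations above mention adding timestamps and metadata to logs. A minimal sketch of a richer CSV row, using a placeholder response string in place of a live API call (the file name and column order are illustrative):

```python
import csv
from datetime import datetime, timezone

# Placeholder values standing in for a real API call and its response
prompt = "Explain the benefits of logging LLM outputs."
output_text = "Logging helps track model behavior."  # would come from the API
model = "gpt-4o-mini"

log_file = "llm_outputs_with_metadata.csv"
with open(log_file, mode="a", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    # Each row: ISO-8601 UTC timestamp, model name, prompt, response
    writer.writerow([
        datetime.now(timezone.utc).isoformat(),
        model,
        prompt,
        output_text,
    ])
```

Recording the model name per row pays off later, when logs from several models end up in the same file and you want to compare them.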
Troubleshooting
- If you see empty or missing outputs, verify your API key and model name.
- For encoding errors when writing logs, ensure your file uses UTF-8 encoding.
- If logs grow too large, consider rotating files or using a database for storage.
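For the database option mentioned above, a minimal sketch using the standard library's sqlite3; the database file name, table name, and columns are illustrative:

```python
import sqlite3
from datetime import datetime, timezone

# Illustrative schema: one table holding timestamped prompt/response pairs
conn = sqlite3.connect("llm_logs.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS llm_outputs (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           logged_at TEXT NOT NULL,
           prompt TEXT NOT NULL,
           response TEXT NOT NULL
       )"""
)

def log_output(prompt: str, response: str) -> None:
    # Parameterized insert avoids quoting and encoding issues in logged text
    conn.execute(
        "INSERT INTO llm_outputs (logged_at, prompt, response) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), prompt, response),
    )
    conn.commit()

log_output("Explain logging.", "Logging helps debug and audit model behavior.")
count = conn.execute("SELECT COUNT(*) FROM llm_outputs").fetchone()[0]
print(f"Stored {count} entries.")
```

Unlike a flat CSV, SQLite handles concurrent readers and lets you filter or aggregate with SQL instead of loading the whole log into memory.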
Key takeaways
- Always log both prompts and LLM responses for full context during analysis.
- Use structured formats like CSV or JSON for easy parsing and querying.
- Analyze logs with Python libraries such as pandas and matplotlib to identify trends and issues.
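The takeaways above can be sketched with JSON Lines, one self-describing record per line; the file name and field names here are illustrative, and the response string is a placeholder rather than a live API result:

```python
import json
from datetime import datetime, timezone

log_file = "llm_outputs.jsonl"

def log_record(prompt: str, response: str, model: str) -> None:
    # One JSON object per line; loadable later with pandas.read_json(lines=True)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "response": response,
        "response_length": len(response),
    }
    with open(log_file, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_record("Explain logging.", "Logging aids debugging and auditing.", "gpt-4o-mini")
```

JSON Lines keeps the append-only simplicity of CSV while tolerating newlines and commas inside responses, which CSV only handles via quoting.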