Code Advanced medium · 7 min

StartEvent and StopEvent

What you will learn

Instrument your RAG pipeline with lifecycle hooks that fire when operations begin and end, enabling custom logging, monitoring, and side effects.

Why this matters

Production RAG systems need visibility into what's happening at each stage: when retrieval starts, when LLM calls complete. StartEvent and StopEvent let you hook into the internal event stream without modifying core logic, critical for observability, cost tracking, and debugging.

Skip if: you would be using event handlers for business logic that belongs in the retriever or query engine itself. Also avoid adding handlers for every single event unless you are specifically debugging: they add overhead and noise to logs.

Explanation

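What it is: In this lesson, "StartEvent" and "StopEvent" refer to the start and end events that LlamaIndex components emit as operations begin and complete. You receive them through the on_event_start and on_event_end methods of a handler class registered with the CallbackManager in llama_index.core.callbacks. Two similarly named features are separate from this: LlamaIndex Workflows define StartEvent and StopEvent classes for a workflow's entry and exit, and the newer llama_index.core.instrumentation module offers a dispatcher-based alternative to callbacks.

How it works: When you execute a query or retrieval, internal components emit one event as each operation starts and another as it stops. The framework calls your handler with the event type (a CBEventType such as retrieve, synthesize, or llm), a payload of operation-specific data, and an event_id shared by the matching start and end calls. Elapsed time is not part of the payload: the handler computes it from its own timestamps. Handlers are registered globally via Settings.callback_manager or per component by passing a callback_manager.

When to use: Use these for observability (metrics, tracing), cost accounting, audit logging, or triggering side effects (cache invalidation, notifications) based on pipeline state.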
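The flow described above can be modeled in a few lines of plain Python. This is a toy sketch of the start/end hook pattern, not LlamaIndex's implementation: ToyCallbackManager, TimingHandler, and the event() context manager are hypothetical names, and only the method names on_event_start and on_event_end mirror the handler interface used in this lesson.

```python
import time
import uuid
from contextlib import contextmanager


class TimingHandler:
    """Hypothetical handler: records how long each operation takes."""

    def __init__(self):
        self.started = {}    # event_id -> start time
        self.durations = []  # (event_type, seconds), in completion order

    def on_event_start(self, event_type, payload=None, event_id="", **kwargs):
        self.started[event_id] = time.perf_counter()
        return event_id

    def on_event_end(self, event_type, payload=None, event_id="", **kwargs):
        elapsed = time.perf_counter() - self.started.pop(event_id)
        self.durations.append((event_type, elapsed))


class ToyCallbackManager:
    """Hypothetical stand-in for a framework's callback manager."""

    def __init__(self, handlers):
        self.handlers = handlers

    @contextmanager
    def event(self, event_type, payload=None):
        # One id per operation, shared by the start and end notifications.
        event_id = str(uuid.uuid4())
        for h in self.handlers:
            h.on_event_start(event_type, payload, event_id=event_id)
        try:
            yield
        finally:
            for h in self.handlers:
                h.on_event_end(event_type, payload, event_id=event_id)


handler = TimingHandler()
manager = ToyCallbackManager([handler])

# The "core logic" stays untouched; the manager wraps it with events.
with manager.event("query"):
    with manager.event("retrieve"):
        time.sleep(0.01)
    with manager.event("synthesize"):
        time.sleep(0.01)

finished = [name for name, _ in handler.durations]
print(finished)  # ['retrieve', 'synthesize', 'query']
```

Nested operations finish before their parent, which is why query appears last in the completion order.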

Analogy

It's like adding sensors to a factory assembly line. You don't redesign the machines: you just mount sensors at the start and end of each station that report what happened (speed, duration, errors). Your monitoring system reads those reports and acts on them.

Code

Illustrative only - not runnable without a valid API key

python

import os
import time

from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, CBEventType
from llama_index.core.callbacks.base_handler import BaseCallbackHandler
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI


class QueryMetricsHandler(BaseCallbackHandler):
    """Logs every start/end event and times each operation."""

    def __init__(self):
        # Empty ignore lists: receive start and end events of every type.
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])
        self.event_log = []
        self.start_times = {}

    def on_event_start(self, event_type: CBEventType, payload=None,
                       event_id="", parent_id="", **kwargs):
        # The same event_id is passed to the matching on_event_end call.
        self.start_times[event_id] = time.time()
        self.event_log.append({
            'event': 'START',
            'type': event_type.value,
            'timestamp': time.time()
        })
        print(f"[START] {event_type.value}: {str(payload)[:80]}...")
        return event_id

    def on_event_end(self, event_type: CBEventType, payload=None,
                     event_id="", **kwargs):
        # Look up the start time by event_id, not by payload identity.
        elapsed = time.time() - self.start_times.pop(event_id, time.time())
        self.event_log.append({
            'event': 'END',
            'type': event_type.value,
            'elapsed_seconds': round(elapsed, 3),
            'timestamp': time.time()
        })
        print(f"[END] {event_type.value}: {elapsed:.3f}s")

    # The base class also requires these two methods; traces are unused here.
    def start_trace(self, trace_id=None):
        pass

    def end_trace(self, trace_id=None, trace_map=None):
        pass


# Register the handler before building the index so no events are missed.
handler = QueryMetricsHandler()
callback_manager = CallbackManager([handler])
Settings.callback_manager = callback_manager
Settings.llm = OpenAI(model='gpt-4.1', api_key=os.getenv('OPENAI_API_KEY'))
Settings.embed_model = OpenAIEmbedding(model='text-embedding-3-small', api_key=os.getenv('OPENAI_API_KEY'))

sample_docs = [
    {
        'text': 'The quick brown fox jumps over the lazy dog.',
        'metadata': {'source': 'fable'}
    },
    {
        'text': 'Machine learning models learn patterns from data.',
        'metadata': {'source': 'tutorial'}
    }
]

doc_objects = [Document(text=d['text'], metadata=d['metadata']) for d in sample_docs]
index = VectorStoreIndex.from_documents(doc_objects)

query_engine = index.as_query_engine()
print("\n=== EXECUTING QUERY ===")
response = query_engine.query('What does the fox do?')
print(f"\nResponse: {response}")
print("\n=== EVENT LOG ===")
for event in handler.event_log:
    print(event)

Output

Illustrative and abridged: events emitted while building the index are omitted, payload previews are shortened to {...}, and the exact set of events and all timings will differ on your machine.

=== EXECUTING QUERY ===
[START] query: {...}...
[START] retrieve: {...}...
[END] retrieve: 0.156s
[START] synthesize: {...}...
[START] llm: {...}...
[END] llm: 0.234s
[END] synthesize: 0.236s
[END] query: 0.393s

Response: The quick brown fox jumps over the lazy dog, as described in the fable.

=== EVENT LOG ===
{'event': 'START', 'type': 'query', 'timestamp': 1713456789.123}
{'event': 'START', 'type': 'retrieve', 'timestamp': 1713456789.124}
{'event': 'END', 'type': 'retrieve', 'elapsed_seconds': 0.156, 'timestamp': 1713456789.28}
...
{'event': 'END', 'type': 'query', 'elapsed_seconds': 0.393, 'timestamp': 1713456789.516}

What just happened?

We registered a custom callback handler that logs when any major operation (retrieval, synthesis, LLM call) starts and stops. When we executed a query, the framework automatically called our handler's on_event_start and on_event_end methods at the right moments, giving us visibility into timing and operation flow without touching the query engine code itself.

Common gotcha

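Don't match start and end events by the id() of the payload object. The start and end payloads are usually different objects, so the lookup misses and the computed elapsed time is wrong; memory addresses can also be reused once an object is freed. In production, use the event_id argument that the callback manager passes to both on_event_start and on_event_end. Also, registering handlers too late (after index creation) may miss early initialization events, because components typically pick up the callback manager when they are constructed.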
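The matching problem is easy to reproduce without any framework. The sketch below uses hypothetical on_start and on_end functions and made-up payload dictionaries to compare the two matching strategies: keying on the payload's memory address versus keying on an explicit event id.

```python
import time

start_times_by_object = {}
start_times_by_event_id = {}


def on_start(payload, event_id):
    now = time.perf_counter()
    start_times_by_object[id(payload)] = now   # fragile: keyed on memory address
    start_times_by_event_id[event_id] = now    # robust: keyed on the shared id


def on_end(payload, event_id):
    # Report which lookup strategy can find the matching start event.
    matched_by_object = id(payload) in start_times_by_object
    matched_by_event_id = event_id in start_times_by_event_id
    return matched_by_object, matched_by_event_id


# Start and end payloads are distinct objects carrying different data.
start_payload = {"query_str": "What does the fox do?"}
end_payload = {"response": "It jumps over the lazy dog."}

on_start(start_payload, event_id="evt-1")
by_object, by_event_id = on_end(end_payload, event_id="evt-1")
print(by_object, by_event_id)  # False True
```

With a fallback such as start_times.get(key, time.time()), the failed lookup does not raise: it silently reports an elapsed time near zero, which is why this bug tends to go unnoticed.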

Error recovery

TypeError: on_event_start() got an unexpected keyword argument

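You're using an old callback signature that doesn't accept the arguments the callback manager passes. Subclass BaseCallbackHandler (importable from llama_index.core.callbacks.base_handler) and implement on_event_start(self, event_type, payload=None, event_id="", parent_id="", **kwargs), keeping **kwargs to absorb unknown fields.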

AttributeError: 'NoneType' has no attribute 'elapsed'

The payload structure varies by event type, and the payload itself can be None. Don't assume a field such as elapsed exists on every event: guard against a missing payload, check the event_type first, and only access fields that exist for that type.
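A defensive way to read payloads might look like the sketch below. PayloadKey is a hypothetical stand-in for the framework's payload key enum, and the event type strings and summary fields are assumptions chosen for illustration.

```python
from enum import Enum


class PayloadKey(str, Enum):
    """Hypothetical stand-in for the framework's payload key enum."""
    NODES = "nodes"
    RESPONSE = "response"


def summarize_payload(event_type, payload):
    """Extract only the fields that make sense for this event type."""
    summary = {"type": event_type}
    if not payload:
        # Some events carry no payload at all.
        return summary
    if event_type == "retrieve":
        nodes = payload.get(PayloadKey.NODES)
        if nodes is not None:
            summary["num_nodes"] = len(nodes)
    elif event_type == "llm":
        response = payload.get(PayloadKey.RESPONSE)
        if response is not None:
            summary["response_chars"] = len(str(response))
    return summary


print(summarize_payload("retrieve", None))
print(summarize_payload("retrieve", {PayloadKey.NODES: ["n1", "n2"]}))
print(summarize_payload("llm", {PayloadKey.RESPONSE: "It jumps."}))
```

Using .get() with an explicit None check keeps the handler from raising inside the pipeline when a field is absent.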

Handler never fires on_event_end

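If you're using async query execution, make sure the coroutine is actually awaited: call await query_engine.aquery(...) inside a running event loop. A coroutine that is created but never awaited does not run, so neither its start nor its end event fires. Also check that the event type is not listed in the handler's event_ends_to_ignore.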
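The effect of awaiting can be shown with plain asyncio. In this sketch, aquery is a hypothetical coroutine that stands in for an async query call and records its own start and end.

```python
import asyncio

fired = []


async def aquery(question):
    """Hypothetical async operation that records its own start and end."""
    fired.append("start")
    await asyncio.sleep(0)
    fired.append("end")
    return f"answer to {question!r}"


async def main():
    pending = aquery("What does the fox do?")  # coroutine created, nothing ran yet
    before_await = list(fired)
    result = await pending                     # now the body runs to completion
    return before_await, result


before_await, result = asyncio.run(main())
print(before_await, fired)  # [] ['start', 'end']
```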

Experienced dev note

StartEvent/StopEvent handlers are the correct pattern as of llama-index-core 0.12.x, but many tutorials still show the old ServiceContext + Callback pattern. Don't fall into that trap. The real power move: write a single handler that emits structured logs (JSON with event_type, elapsed_time, tokens) to your observability platform (Datadog, New Relic, etc.). This gives you latency tracking and cost attribution per operation type in production, which pays for itself the first time you need to explain why query costs doubled.
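A structured-logging handler along those lines could be sketched as follows. JsonEventLogger is a hypothetical class that only mimics the on_event_start / on_event_end interface, the "tokens" payload key is an assumption, and the log lines go to an in-memory buffer rather than a real observability platform.

```python
import io
import json
import logging
import time


class JsonEventLogger:
    """Hypothetical handler-like class: one JSON log line per finished event."""

    def __init__(self, logger):
        self.logger = logger
        self._started = {}

    def on_event_start(self, event_type, payload=None, event_id="", **kwargs):
        self._started[event_id] = time.perf_counter()
        return event_id

    def on_event_end(self, event_type, payload=None, event_id="", **kwargs):
        started = self._started.pop(event_id, None)
        elapsed = None if started is None else round(time.perf_counter() - started, 6)
        record = {
            "event_type": event_type,
            "event_id": event_id,
            "elapsed_seconds": elapsed,
            "tokens": (payload or {}).get("tokens"),  # assumed payload key
        }
        self.logger.info(json.dumps(record))


# Route log lines to an in-memory buffer so the example is self-contained.
buffer = io.StringIO()
logger = logging.getLogger("rag.events")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(buffer))

events = JsonEventLogger(logger)
events.on_event_start("llm", event_id="evt-42")
events.on_event_end("llm", payload={"tokens": 128}, event_id="evt-42")

record = json.loads(buffer.getvalue().strip())
print(record["event_type"], record["tokens"])  # llm 128
```

One JSON object per line keeps the output easy to ship to a log aggregator and to group by event_type later.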

Check your understanding

You have a handler that logs every retrieval event. Why might on_event_end not fire for a retrieval if an exception occurs during synthesis, and how would you ensure cleanup happens regardless?

Answer hint

A correct answer explains that handlers are part of the operation lifecycle: if synthesis fails after retrieval, retrieval's on_event_end already fired (it's independent). To ensure cleanup on failure, you need to catch exceptions in your handler, or use a try/finally pattern in the handler itself, or rely on context managers if the callback manager supports them.
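The lifecycle described in the hint can be traced with a small stdlib sketch. The traced() context manager and run_query() are hypothetical; the point is that retrieval's end is logged before synthesis fails, and try/finally guarantees an end entry for the failing step too.

```python
from contextlib import contextmanager

log = []


@contextmanager
def traced(event_type):
    """Hypothetical wrapper that always records an end entry."""
    log.append(("start", event_type))
    try:
        yield
    finally:
        # Runs on success and on failure, so cleanup is never skipped.
        log.append(("end", event_type))


def run_query():
    with traced("retrieve"):
        nodes = ["node-1", "node-2"]  # retrieval completes normally
    with traced("synthesize"):
        raise RuntimeError("LLM call failed")


try:
    run_query()
except RuntimeError as exc:
    error = str(exc)

print(log)
# [('start', 'retrieve'), ('end', 'retrieve'), ('start', 'synthesize'), ('end', 'synthesize')]
```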

VERSION CallbackManager and event handlers were restructured in llama-index-core 0.10.0+. The old 'callback' parameter on individual operations was deprecated. Always use Settings.callback_manager for global registration in 0.12.x. Direct operation-level callbacks still work but are phase-out candidates.

Next, explore instrumentation with observability platforms: integrating StartEvent/StopEvent handlers with OpenTelemetry or Datadog for production-grade tracing and latency metrics across your RAG pipeline.
