StartEvent and StopEvent
Why this matters
Production RAG systems need visibility into what happens at each stage: when retrieval starts, when LLM calls complete, and how long each step takes. StartEvent and StopEvent let you hook into the internal event stream without modifying core logic, which is critical for observability, cost tracking, and debugging.
Explanation
What it is: StartEvent and StopEvent handlers are hooks you register with LlamaIndex's event system; they fire automatically when operations begin and complete. llama-index-core also ships a newer instrumentation module intended to replace the callback manager; the example in this section uses the callback handler interface, where the two hooks are the on_event_start and on_event_end methods.

How it works: When you execute a query or retrieval, internal components emit events as they start and stop. You define a handler that receives the event type, a payload of operation-specific metadata, and an event ID, and the framework calls your handler at the right moment. The handler measures elapsed time itself by pairing each start with its matching end. Handlers are registered globally via Settings.callback_manager or per-operation.

When to use: Use these for observability (metrics, tracing), cost accounting, audit logging, or triggering side effects (cache invalidation, notifications) based on pipeline state.
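The start/stop flow described above can be modeled in a few lines of plain Python. This is a toy sketch, not LlamaIndex's implementation: ToyEmitter, RecordingHandler, and run are hypothetical names, used only to show how a framework can call handler hooks around an operation and hand both hooks the same event ID.

```python
import uuid


class RecordingHandler:
    """Hypothetical handler: records every start and end notification."""

    def __init__(self):
        self.records = []

    def on_event_start(self, event_type, payload, event_id):
        self.records.append(("start", event_type, event_id))

    def on_event_end(self, event_type, payload, event_id):
        self.records.append(("end", event_type, event_id))


class ToyEmitter:
    """Hypothetical stand-in for a framework component that emits events."""

    def __init__(self, handlers):
        self.handlers = handlers

    def run(self, event_type, operation, payload=None):
        # One ID is generated per operation and shared by both notifications.
        event_id = str(uuid.uuid4())
        for h in self.handlers:
            h.on_event_start(event_type, payload, event_id)
        result = operation()
        for h in self.handlers:
            h.on_event_end(event_type, {"result": result}, event_id)
        return result


handler = RecordingHandler()
emitter = ToyEmitter([handler])
answer = emitter.run("retrieve", lambda: ["doc-1", "doc-2"], payload={"query": "fox"})
```

The operation itself (the lambda) knows nothing about the handler, which is the point of the pattern: observation is attached from the outside.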
Analogy
It's like adding sensors to a factory assembly line. You don't redesign the machines: you just mount sensors at the start and end of each station that report what happened (speed, duration, errors). Your monitoring system reads those reports and acts on them.
Code
import os
import time

from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager
from llama_index.core.callbacks.base_handler import BaseCallbackHandler
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI


class QueryMetricsHandler(BaseCallbackHandler):
    """Logs every event start and end, with elapsed time per event."""

    def __init__(self):
        # Empty ignore lists: report every event type.
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])
        self.event_log = []
        self.start_times = {}

    def on_event_start(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        # The callback manager passes the same event_id to the matching end call.
        now = time.time()
        self.start_times[event_id] = now
        self.event_log.append({
            'event': 'START',
            'type': event_type.value,
            'timestamp': now
        })
        print(f"[START] {event_type.value}: {str(payload)[:80]}...")
        return event_id

    def on_event_end(self, event_type, payload=None, event_id="", **kwargs):
        now = time.time()
        # Fall back to zero elapsed time if no matching start was recorded.
        elapsed = now - self.start_times.pop(event_id, now)
        self.event_log.append({
            'event': 'END',
            'type': event_type.value,
            'elapsed_seconds': round(elapsed, 3),
            'timestamp': now
        })
        print(f"[END] {event_type.value}: {elapsed:.3f}s")

    def start_trace(self, trace_id=None):
        # Required by the abstract base class; traces are not used here.
        pass

    def end_trace(self, trace_id=None, trace_map=None):
        # Required by the abstract base class; traces are not used here.
        pass


# Register the handler before building the index so indexing events are captured too.
handler = QueryMetricsHandler()
callback_manager = CallbackManager([handler])
Settings.callback_manager = callback_manager
Settings.llm = OpenAI(model='gpt-4.1', api_key=os.getenv('OPENAI_API_KEY'))
Settings.embed_model = OpenAIEmbedding(model='text-embedding-3-small', api_key=os.getenv('OPENAI_API_KEY'))

sample_docs = [
    {
        'text': 'The quick brown fox jumps over the lazy dog.',
        'metadata': {'source': 'fable'}
    },
    {
        'text': 'Machine learning models learn patterns from data.',
        'metadata': {'source': 'tutorial'}
    }
]
doc_objects = [Document(text=d['text'], metadata=d['metadata']) for d in sample_docs]
index = VectorStoreIndex.from_documents(doc_objects)
query_engine = index.as_query_engine()

print("\n=== EXECUTING QUERY ===")
response = query_engine.query('What does the fox do?')
print(f"\nResponse: {response}")
print("\n=== EVENT LOG ===")
for event in handler.event_log:
    print(event)

Example output (illustrative; event names and timings vary by run and version)
[START] llm: <LLMPredictCall ...>
[END] llm: 0.234s
[START] retriever: <RetrieverQueryCall ...>
[END] retriever: 0.156s
[START] synthesizer: <SynthesisCall ...>
[END] synthesizer: 0.089s
Response: The quick brown fox jumps over the lazy dog, as described in the fable.
=== EVENT LOG ===
{'event': 'START', 'type': 'retrieve', 'timestamp': 1713456789.123}
{'event': 'END', 'type': 'retrieve', 'elapsed_seconds': 0.156, 'timestamp': 1713456789.279}
{'event': 'START', 'type': 'synthesize', 'timestamp': 1713456789.280}
{'event': 'END', 'type': 'synthesize', 'elapsed_seconds': 0.089, 'timestamp': 1713456789.369}
What just happened?
We registered a custom callback handler that logs when any major operation (retrieval, synthesis, LLM call) starts and stops. When we executed a query, the framework automatically called our handler's on_event_start and on_event_end methods at the right moments, giving us visibility into timing and operation flow without touching the query engine code itself.
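Once the handler has collected its log, a little plain Python turns it into per-operation totals. The sketch below assumes entries shaped like the event_log dictionaries in the example; the sample entries and the helper name total_elapsed_by_type are illustrative and not taken from a real run.

```python
from collections import defaultdict


def total_elapsed_by_type(event_log):
    """Sum elapsed_seconds over END entries, grouped by event type."""
    totals = defaultdict(float)
    for entry in event_log:
        if entry["event"] == "END":
            totals[entry["type"]] += entry["elapsed_seconds"]
    return dict(totals)


# Illustrative entries in the same shape the handler appends.
sample_log = [
    {"event": "START", "type": "retrieve", "timestamp": 100.0},
    {"event": "END", "type": "retrieve", "elapsed_seconds": 0.25, "timestamp": 100.25},
    {"event": "START", "type": "llm", "timestamp": 100.25},
    {"event": "END", "type": "llm", "elapsed_seconds": 0.5, "timestamp": 100.75},
    {"event": "START", "type": "llm", "timestamp": 101.0},
    {"event": "END", "type": "llm", "elapsed_seconds": 0.25, "timestamp": 101.25},
]

totals = total_elapsed_by_type(sample_log)
```

Grouping by type rather than by individual event is what makes the log useful for questions like "how much of the query time is spent in LLM calls?".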
Common gotcha
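Do not match start and stop events with id(payload). The start and end payloads are generally different objects, so their IDs never line up, and a memory address can be reused once an object is garbage-collected, which makes mismatches likely in complex or async pipelines. In production, use the explicit event_id that the callback manager passes to on_event_start and on_event_end. Also, a handler registered too late (after index creation) misses the events emitted while the index is built, so set Settings.callback_manager before constructing the index.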
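To see why explicit IDs matter, here is a small sketch of pairing starts and ends when operations overlap. EventTimer is a hypothetical helper, and the fake clock is an assumption that keeps the example deterministic; it is not part of any LlamaIndex API.

```python
import time


class EventTimer:
    """Hypothetical helper: pairs start and end notifications by explicit ID."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._started = {}

    def start(self, event_id):
        self._started[event_id] = self._clock()

    def end(self, event_id):
        """Return elapsed time, or None if no matching start was seen."""
        started = self._started.pop(event_id, None)
        if started is None:
            return None
        return self._clock() - started


# Fake clock for a deterministic example: each call advances by one tick.
ticks = iter(range(100))
timer = EventTimer(clock=lambda: next(ticks))

timer.start("evt-a")             # tick 0
timer.start("evt-b")             # tick 1, overlaps with evt-a
elapsed_b = timer.end("evt-b")   # tick 2
elapsed_a = timer.end("evt-a")   # tick 3
unmatched = timer.end("evt-c")   # never started, so no clock read

# Start and end payloads are distinct objects, so id() cannot pair them.
start_payload, end_payload = {"query": "fox"}, {"response": "..."}
ids_differ = id(start_payload) != id(end_payload)
```

Popping the entry on end also keeps the dictionary from growing without bound in a long-running service.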
Error recovery
TypeError: on_event_start() got an unexpected keyword argument
AttributeError: 'NoneType' has no attribute 'elapsed'
Handler never fires on_event_end
Experienced dev note
StartEvent/StopEvent handlers are the correct pattern as of llama-index-core 0.12.x, but many tutorials still show the old ServiceContext + Callback pattern. Don't fall into that trap. The real power move: write a single handler that emits structured logs (JSON with event_type, elapsed_time, tokens) to your observability platform (Datadog, New Relic, etc.). This gives you latency tracking and cost attribution per operation type in production, which pays for itself the first time you need to explain why query costs doubled.
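A minimal sketch of the structured-log idea, using only the standard library. The field names follow the ones mentioned above (event_type, elapsed_time, tokens); the logger name rag.metrics and the helper functions are hypothetical, and forwarding the lines to a specific observability platform is left to whatever log shipper you already run.

```python
import json
import logging
import time

logger = logging.getLogger("rag.metrics")  # hypothetical logger name


def format_event_record(event_type, elapsed_seconds, tokens=None, now=None):
    """Build one JSON log line for a completed operation."""
    record = {
        "event_type": event_type,
        "elapsed_time": round(elapsed_seconds, 3),
        "tokens": tokens,  # None when the operation reports no token usage
        "timestamp": time.time() if now is None else now,
    }
    return json.dumps(record, sort_keys=True)


def emit_event(event_type, elapsed_seconds, tokens=None):
    """Send the record through the standard logging pipeline."""
    logger.info(format_event_record(event_type, elapsed_seconds, tokens))


# Fixed timestamp so the example is reproducible.
line = format_event_record("llm", 0.2344, tokens=128, now=1700000000.0)
```

A handler's end hook would call emit_event once per completed operation; keeping the formatting in a pure function makes the record shape easy to unit-test.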
Check your understanding
You have a handler that logs every retrieval event. Why might on_event_end not fire for a retrieval if an exception occurs during synthesis, and how would you ensure cleanup happens regardless?
Answer hint
A correct answer explains that handlers are part of the operation lifecycle: if synthesis fails after retrieval, retrieval's on_event_end already fired (it's independent). To ensure cleanup on failure, you need to catch exceptions in your handler, or use a try/finally pattern in the handler itself, or rely on context managers if the callback manager supports them.
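The try/finally idea can be sketched with a small context manager. This is an illustration in plain Python, not the callback manager's own mechanism; tracked is a hypothetical wrapper that records an END entry whether or not the wrapped step raises.

```python
from contextlib import contextmanager


@contextmanager
def tracked(event_type, log):
    """Hypothetical wrapper: always records END, even if the body raises."""
    log.append(("START", event_type))
    try:
        yield
    finally:
        log.append(("END", event_type))


log = []
error_seen = None
try:
    with tracked("retrieve", log):
        pass  # retrieval succeeds and its END is recorded immediately
    with tracked("synthesize", log):
        raise RuntimeError("synthesis failed")
except RuntimeError as exc:
    error_seen = str(exc)
```

Retrieval's END entry is already in the log before synthesis starts, and the finally clause guarantees a synthesis END entry even though the step failed.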