Observability for LlamaIndex with Opik

LlamaIndex is a flexible “data framework” for building LLM applications. It provides the following tools:

  • Data connectors to ingest your existing data sources and formats (APIs, PDFs, docs, SQL, etc.).
  • Ways to structure your data (indices, graphs) so that it can be easily used with LLMs.
  • An advanced retrieval/query interface over your data: feed in any LLM input prompt, get back retrieved context and a knowledge-augmented output.
  • Easy integrations with your outer application framework (e.g. LangChain, Flask, Docker, ChatGPT, anything else).

Account Setup

Comet provides a hosted version of the Opik platform. Simply create an account and grab your API key.

You can also run the Opik platform locally, see the installation guide for more information.

Getting Started

Installation

To use the Opik integration with LlamaIndex, you’ll need to have both the opik and llama_index packages installed. You can install them using pip:

$ pip install opik llama-index llama-index-agent-openai llama-index-llms-openai llama-index-callbacks-opik

Configuring Opik

Configure the Opik Python SDK for your deployment type. See the Python SDK Configuration guide for detailed instructions on:

  • CLI configuration: opik configure
  • Code configuration: opik.configure()
  • Self-hosted vs Cloud vs Enterprise setup
  • Configuration files and environment variables
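As a complement to the CLI and `opik.configure()` options above, the SDK can also be configured through environment variables. A minimal sketch (the values below are placeholders; consult the Python SDK Configuration guide for the exact options that apply to your deployment):

```python
import os

# Placeholder values; replace with your own credentials.
os.environ["OPIK_API_KEY"] = "YOUR_OPIK_API_KEY"      # Comet-hosted deployments
os.environ["OPIK_WORKSPACE"] = "your-workspace-name"  # Comet-hosted deployments

# For a self-hosted deployment, point the SDK at your own instance instead:
# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api"
```

Setting these before the first Opik call avoids the interactive `opik configure` prompt, which is convenient in CI environments.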

Configuring LlamaIndex

In order to use LlamaIndex, you will need to configure your LLM provider API keys. For this example, we’ll use OpenAI; you can find or create your API key in the OpenAI dashboard.

You can set them as environment variables:

$ export OPENAI_API_KEY="YOUR_API_KEY"

Or set them programmatically:

import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

Using the Opik integration

To use the Opik integration with LlamaIndex, you can use the set_global_handler function from the LlamaIndex package to set the global tracer:

from llama_index.core import global_handler, set_global_handler

set_global_handler("opik")
opik_callback_handler = global_handler

Now that the integration is set up, all LlamaIndex runs will be traced and logged to Opik.

Example

To showcase the integration, we will create a query engine that uses Paul Graham’s essays as the data source.

First step: Configure the Opik integration:

import os
from llama_index.core import global_handler, set_global_handler

# Set project name for better organization
os.environ["OPIK_PROJECT_NAME"] = "llamaindex-integration-demo"

set_global_handler("opik")
opik_callback_handler = global_handler

Second step: Download the example data:

import os
import requests

# Create the data directory if it doesn't exist
os.makedirs("./data/paul_graham/", exist_ok=True)

# Download the example essay using requests
url = "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
response = requests.get(url)
response.raise_for_status()  # Fail loudly instead of silently writing an error page
with open("./data/paul_graham/paul_graham_essay.txt", "wb") as f:
    f.write(response.content)

Third step: Configure the OpenAI API key:

import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

Fourth step: We can now load the data, create an index and query engine:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("What did the author do growing up?")
print(response)

Given that the integration with Opik has been set up, all the traces are logged to the Opik platform.

Using with the @track Decorator

The LlamaIndex integration seamlessly works with Opik’s @track decorator. When you call LlamaIndex operations inside a tracked function, the LlamaIndex traces will automatically be attached as child spans to your existing trace.

import opik
from llama_index.core import global_handler, set_global_handler
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

# Configure Opik integration
set_global_handler("opik")
opik_callback_handler = global_handler

@opik.track()
def my_llm_application(user_query: str):
    """Process user query with LlamaIndex"""
    llm = OpenAI(model="gpt-3.5-turbo")
    messages = [
        ChatMessage(role="system", content="You are a helpful assistant."),
        ChatMessage(role="user", content=user_query),
    ]

    response = llm.chat(messages)
    return response.message.content

# Call the tracked function
result = my_llm_application("What is the capital of France?")
print(result)

In this example, Opik will create a trace for the my_llm_application function, and all LlamaIndex operations (like the LLM chat call) will appear as nested spans within this trace, giving you a complete view of your application’s execution.

Using with Manual Trace Creation

You can also manually create traces using opik.start_as_current_trace() and have LlamaIndex operations nested within:

import opik
from llama_index.core import global_handler, set_global_handler
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

# Configure Opik integration
set_global_handler("opik")
opik_callback_handler = global_handler

# Create a manual trace
with opik.start_as_current_trace(name="user_query_processing"):
    llm = OpenAI(model="gpt-3.5-turbo")
    messages = [
        ChatMessage(role="user", content="Explain quantum computing in simple terms"),
    ]

    response = llm.chat(messages)
    print(response.message.content)

This approach is useful when you want more control over trace naming and want to group multiple LlamaIndex operations under a single trace.

Token Usage in Streaming Responses

When using streaming chat responses with OpenAI models (e.g., llm.stream_chat()), you need to explicitly enable token usage tracking by configuring the stream_options parameter:

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from llama_index.core import global_handler, set_global_handler

# Configure Opik integration
set_global_handler("opik")

# Configure OpenAI LLM with stream_options to include usage information
llm = OpenAI(
    model="gpt-3.5-turbo",
    additional_kwargs={
        "stream_options": {"include_usage": True}
    }
)

messages = [
    ChatMessage(role="user", content="Tell me a short joke")
]

# Token usage will now be tracked in streaming responses
response = llm.stream_chat(messages)
for chunk in response:
    print(chunk.delta, end="", flush=True)

Without setting stream_options={'include_usage': True}, streaming responses from OpenAI models will not include token usage information in Opik traces. This is a requirement of OpenAI’s streaming API.

Cost Tracking

The Opik integration with LlamaIndex automatically tracks token usage and cost for all supported LLM models used within LlamaIndex applications.

Cost information is automatically captured and displayed in the Opik UI, including:

  • Token usage details
  • Cost per request based on model pricing
  • Total trace cost
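To make the “cost per request based on model pricing” calculation concrete, here is a minimal sketch of how such a cost is derived from token counts. The price table below is purely hypothetical for illustration; it is not Opik’s actual pricing data:

```python
# Hypothetical per-1M-token prices in USD; illustrative only, not Opik's data.
PRICES = {"gpt-3.5-turbo": {"prompt": 0.50, "completion": 1.50}}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost = tokens / 1e6 * per-million-token price, summed over prompt and completion."""
    price = PRICES[model]
    return (prompt_tokens / 1e6) * price["prompt"] + (completion_tokens / 1e6) * price["completion"]

# A request with 1,000 prompt tokens and 500 completion tokens:
print(estimate_cost("gpt-3.5-turbo", 1_000, 500))  # → 0.00125
```

Opik performs this kind of calculation automatically using its own model pricing table, then aggregates per-request costs into the total trace cost shown in the UI.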

View the complete list of supported models and providers on the Supported Models page.