Production monitoring

Describes how to monitor your LLM applications in production using Opik

Opik has been designed from the ground up to support high volumes of traces making it the ideal tool for monitoring your production LLM applications.

You can use the Opik dashboard to review your feedback scores, trace count and tokens over time at both a daily and hourly granularity.

In addition to viewing scores over time, you can also view the average feedback scores for all the traces in your project from the traces table.

Logging feedback scores

To monitor the performance of your LLM application, you can log feedback scores using the Python SDK and through the UI.

Defining online evaluation metrics

You can define LLM as a Judge metrics in the Opik platform that will automatically score all, or a subset, of your production traces. You can find more information about how to define LLM as a Judge metrics in the Online evaluation section.

Once a rule is defined, Opik will score all the traces in the project and allow you to track these feedback scores over time.

In addition to allowing you to define LLM as a Judge metrics, Opik will soon allow you to define Python metrics to give you even more control over the feedback scores.

Manually logging feedback scores alongside traces

Feedback scores can be logged while you are logging traces:

1from opik import track, opik_context
2
3@track
4def llm_chain(input_text):
5 # LLM chain code
6 # ...
7
8 # Update the trace
9 opik_context.update_current_trace(
10 feedback_scores=[
11 {"name": "user_feedback", "value": 1.0, "reason": "The response was helpful and accurate."}
12 ]
13 )

Updating traces with feedback scores

You can also update traces with feedback scores after they have been logged. For this we are first going to fetch all the traces using the search API and then update the feedback scores for the traces we want to annotate.

Fetching traces using the search API

You can use the Opik.search_traces method to fetch all the traces you want to annotate.

1import opik
2
3opik_client = opik.Opik()
4
5traces = opik_client.search_traces(
6 project_name="Default Project"
7)

The search_traces method allows you to fetch traces based on any of trace attributes, you can learn more about the different search parameters in the search traces documentation.

Updating feedback scores

Once you have fetched the traces you want to annotate, you can update the feedback scores using the Opik.log_traces_feedback_scores method.

1for trace in traces:
2 opik_client.log_traces_feedback_scores(
3 project_name="Default Project",
4 feedback_scores=[{"id": trace.id, "name": "user_feedback", "value": 1.0, "reason": "The response was helpful and accurate."}],
5 )

You will now be able to see the feedback scores in the Opik dashboard and track the changes over time.

Updating trace content

Get trace content

You can view the content of your traces using Opik.get_trace_content(id: str), to look up your trace by id. Trace ids can be found using the Opik.search_traces() method or by looking at the ID column within the Projects > ‘My-project’ view.

1from opik import Opik
2
3TRACE_ID = 'EXAMPLE-ID' # UUIDv7 Identifier
4
5opik_client = Opik()
6trace_content = opik_client.get_trace_content(id = TRACE_ID)

This will return a TracePublic object, a pydantic model object with all the data associated with the trace found.

Update trace by ID

You can update a given trace by first re-instantiating the trace using opik.Opik.trace() and then updating any one of the trace attributes using Trace.update(). See above section for guidance on how to retrieve trace ids.

1from opik import Opik
2
3TRACE_ID = 'EXAMPLE-ID' # UUIDv7 Identifier
4
5opik_client = Opik()
6trace = opik_client.trace(id = TRACE_ID)
7trace.update(output = updated_output)

The trace attributes that can be used as parameters are as follows:

  • end_time: The end time of the trace.
  • metadata: Additional metadata to be associated with the trace.
  • input: The input data for the trace.
  • output: The output data for the trace.
  • tags: A list of tags to be associated with the trace.
  • error_info: The dictionary with error information (typically used when the trace function has failed).
  • thread_id: Used to group multiple traces into a thread. The identifier is user-defined and has to be unique per project.