Evaluate threads
Step-by-step guide on how to evaluate conversation threads
When you run multi-turn conversations with frameworks that support LLM agents, the Opik integration automatically groups related traces into conversation threads, using the grouping parameters appropriate to each framework.
This guide walks you through evaluating and optimizing conversation threads in Opik using the evaluate_threads function in the Python SDK.
Using the Python SDK
The Python SDK provides a simple and efficient way to evaluate and optimize conversation threads using the evaluate_threads function. This function lets you specify a filter string that selects the threads to evaluate and a list of metrics to apply to each thread. It returns a ThreadsEvaluationResult object containing the evaluation results and feedback scores.
To run a threads evaluation, call evaluate_threads with a filter string and the list of metrics to apply.
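The sketch below shows one possible threads evaluation. It rests on several assumptions that you should check against the SDK reference for your installed version. It assumes the SDK exposes evaluate_threads from opik.evaluation and the conversational metrics ConversationalCoherenceMetric and UserFrustrationMetric from opik.evaluation.metrics. It also assumes evaluate_threads accepts the project_name, filter_string, eval_project_name, metrics, trace_input_transform and trace_output_transform keyword arguments. The wrapper name run_threads_evaluation and the "input"/"output" keys read by the transforms are hypothetical.

```python
from typing import Any, Dict


def trace_input_transform(trace_input: Dict[str, Any]) -> Any:
    # Hypothetical trace layout: the user message is stored under the "input" key.
    return trace_input["input"]


def trace_output_transform(trace_output: Dict[str, Any]) -> Any:
    # Hypothetical trace layout: the assistant reply is stored under the "output" key.
    return trace_output["output"]


def run_threads_evaluation(project_name: str, filter_string: str, eval_project_name: str):
    # Imported lazily so this module can be loaded where Opik is not installed or configured.
    from opik.evaluation import evaluate_threads
    from opik.evaluation.metrics import (
        ConversationalCoherenceMetric,
        UserFrustrationMetric,
    )

    # Scores every thread matched by filter_string with each metric and
    # returns a ThreadsEvaluationResult holding the feedback scores.
    return evaluate_threads(
        project_name=project_name,
        filter_string=filter_string,
        eval_project_name=eval_project_name,
        metrics=[ConversationalCoherenceMetric(), UserFrustrationMetric()],
        trace_input_transform=trace_input_transform,
        trace_output_transform=trace_output_transform,
    )
```

With a configured Opik client, you could then call something like run_threads_evaluation("ai_team", 'id = "0197ad2a"', "ai_team_evaluation"). The project names and thread ID in that call are placeholders.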
Using a filter string
The evaluate_threads function takes a filter string as an argument. This string selects the threads to be evaluated. For example, you can restrict the evaluation to a single thread by filtering on its ID.
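A small helper can build the ID filter. This sketch assumes Opik's filter syntax is a field name, an operator, and a double-quoted string value, as in id = "0197ad2a". The helper thread_id_filter and the thread ID are hypothetical.

```python
def thread_id_filter(thread_id: str) -> str:
    """Build a filter string that selects a single thread by its ID."""
    # Assumed syntax: <field> <operator> "<value>", with string values in double quotes.
    if '"' in thread_id:
        raise ValueError("thread ID must not contain double quotes")
    return f'id = "{thread_id}"'


filter_string = thread_id_filter("0197ad2a")  # hypothetical thread ID
print(filter_string)  # id = "0197ad2a"
```

The resulting string is what you would pass as filter_string to evaluate_threads.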
You can combine multiple filter conditions using the AND operator, for example to evaluate only threads that have a specific ID and a specific status.
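Combining conditions amounts to joining them with AND. The sketch below shows one way to do that. The helper combine_filters, the thread ID, and the status value "inactive" are hypothetical. It also assumes status is compared against a double-quoted string, in the same way as id.

```python
from typing import Iterable


def combine_filters(conditions: Iterable[str]) -> str:
    """Join individual filter conditions into one filter string with AND."""
    # Skip empty conditions so an unused optional filter does not leave a dangling AND.
    parts = [c.strip() for c in conditions if c and c.strip()]
    if not parts:
        raise ValueError("at least one filter condition is required")
    return " AND ".join(parts)


# Hypothetical thread ID and status value.
filter_string = combine_filters(['id = "0197ad2a"', 'status = "inactive"'])
print(filter_string)  # id = "0197ad2a" AND status = "inactive"
```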
Supported filter fields and operators
The evaluate_threads function supports the following filter fields in the filter_string, along with the operators that can be applied to each field:
The feedback_scores field is a dictionary where the keys are the metric names and the values are the metric values. You can use it to filter threads based on their feedback scores. For example, to evaluate only threads with a particular user frustration score, filter on feedback_scores.user_frustration_score. Here, user_frustration_score is the name of the user frustration metric, and you compare it against a threshold value such as 0.5.
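A sketch of building such a condition follows. It assumes the dotted feedback_scores.<metric name> form and an unquoted numeric threshold. The helper feedback_score_filter, the choice of >=, and the operator set it accepts are assumptions, so check them against the supported fields and operators for your Opik version.

```python
def feedback_score_filter(metric_name: str, operator: str, threshold: float) -> str:
    """Build a filter condition on one entry of the feedback_scores dictionary."""
    # Assumed subset of comparison operators for numeric feedback scores.
    allowed = {"=", "!=", ">", ">=", "<", "<="}
    if operator not in allowed:
        raise ValueError(f"unsupported operator: {operator}")
    # Assumed syntax: feedback_scores.<metric_name> <operator> <number>.
    return f"feedback_scores.{metric_name} {operator} {threshold}"


filter_string = feedback_score_filter("user_frustration_score", ">=", 0.5)
print(filter_string)  # feedback_scores.user_frustration_score >= 0.5
```

You can join a condition like this with other conditions using AND, in the same way as the ID and status filters.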
Using the Opik UI to view results
Once the evaluation is complete, you can access the evaluation results in the Opik UI.

Next steps
For more details on what metrics can be used to score conversational threads, refer to the conversational metrics page.