Multi-Metric Optimization

When optimizing AI agents, you often need to balance multiple quality dimensions simultaneously. Multi-metric optimization allows you to combine several evaluation metrics with customizable weights to create a composite objective function.

Why Use Multi-Metric Optimization?

While you can implement metric combinations within a custom metric function, using Opik Optimizer’s MultiMetricObjective API provides additional benefits:

  • Automatic logging of all component metrics to the Opik platform
  • Individual tracking of each sub-metric alongside the composite score
  • Detailed visibility into how each metric contributes to optimization
  • Trial-level insights for both aggregate and individual trace performance

This visibility helps you understand trade-offs between different quality dimensions during optimization.

Quickstart

You can use MultiMetricObjective to create a composite metric from multiple metrics:

```python
from opik_optimizer import MultiMetricObjective

multi_metric_objective = MultiMetricObjective(
    weights=[0.4, 0.6],
    metrics=[metric_1, metric_2],
)
```

End-to-End Example

In this guide, we’ll demonstrate multi-metric optimization with a simple question-answering task. The example optimizes a basic Q&A agent to balance both accuracy and relevance without requiring complex tool usage.

To use multi-metric optimization, you need to:

  1. Define multiple metric functions
  2. Create a MultiMetricObjective class instance using your functions and weights
  3. Pass it to your optimizer as the metric to optimize for

1. Define Your Metrics

Create individual metric functions that evaluate different aspects of your agent’s output:

```python
from typing import Any, Dict

from opik.evaluation.metrics import AnswerRelevance, LevenshteinRatio
from opik.evaluation.metrics.score_result import ScoreResult


def levenshtein_ratio(dataset_item: Dict[str, Any], llm_output: str) -> ScoreResult:
    """Measures string similarity between output and reference answer."""
    metric = LevenshteinRatio()
    return metric.score(reference=dataset_item["answer"], output=llm_output)


def answer_relevance_score(dataset_item: Dict[str, Any], llm_output: str) -> ScoreResult:
    """Evaluates how relevant the answer is to the question and context."""
    metric = AnswerRelevance()
    return metric.score(
        context=[dataset_item["answer"]],
        output=llm_output,
        input=dataset_item["question"],
    )
```
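To build intuition for what the first metric measures, here is a minimal, dependency-free sketch of a Levenshtein-based similarity ratio. This is only illustrative: the exact normalization used by Opik's LevenshteinRatio may differ.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def similarity_ratio(reference: str, output: str) -> float:
    """Normalize edit distance into a 0..1 similarity score (1.0 = identical)."""
    if not reference and not output:
        return 1.0
    return 1.0 - levenshtein_distance(reference, output) / max(len(reference), len(output))
```

For example, `similarity_ratio("Paris", "Paris")` is 1.0, while a single-character difference such as `"paris"` scores 0.8 under this normalization.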

2. Create a Multi-Metric Objective

Combine your metrics with weights using MultiMetricObjective:

```python
import opik_optimizer

multi_metric_objective = opik_optimizer.MultiMetricObjective(
    weights=[0.4, 0.6],
    metrics=[levenshtein_ratio, answer_relevance_score],
    name="my_composite_metric",
)
```

Understanding Weights:

The weights parameter controls the relative importance of each metric:

  • weights=[0.4, 0.6] → First metric contributes 40%, second contributes 60%
  • Higher weights emphasize those metrics during optimization
  • Weights don’t need to sum to 1; use any values that represent your priorities
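Conceptually, the combination is just a weighted average. The sketch below illustrates the idea under the assumption that scores are normalized by the weight total (which is why weights need not sum to 1); the actual aggregation inside MultiMetricObjective is an implementation detail of the SDK.

```python
from typing import Sequence


def composite_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-metric scores.

    Dividing by the weight total means weights need not sum to 1;
    only their relative magnitudes matter.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

With `weights=[0.4, 0.6]` and per-metric scores of 0.8 and 0.6, the composite is 0.4 × 0.8 + 0.6 × 0.6 = 0.68.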

3. Use with Optimizer

```python
import opik
import opik_optimizer
from opik_optimizer import ChatPrompt
from opik_optimizer.gepa_optimizer import GepaOptimizer

# Create a simple dataset
client = opik.Opik()
dataset = client.get_or_create_dataset(name="multi_metric_example")
dataset.insert([
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is 2+2?", "answer": "4"},
])

# Define a simple prompt
prompt = ChatPrompt(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "{question}"},
    ],
    model="gpt-4o-mini",
)

optimizer = GepaOptimizer(
    model="openai/gpt-4o-mini",
    reflection_model="openai/gpt-4o",
    project_name="Multi-Metric-Example",
    temperature=0.7,
    max_tokens=100,
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=multi_metric_objective,  # Use the composite metric
    n_samples=6,
)

result.display()
```

4. View Results

You can view the results of the optimization in the Opik dashboard:

[Screenshot: multi-metric optimization results in the Opik dashboard]

What you’ll see:

  • Composite metric (my_composite_metric) — The weighted combination of all metrics
  • Individual metrics (levenshtein_ratio, answer_relevance_score) — Each component tracked separately
  • Trial progression — Metric evolution over time

This lets you see not just overall optimization progress, but how each metric contributes to the final score.

Next Steps