Multi-Metric Optimization

When optimizing AI agents, you often need to balance multiple quality dimensions simultaneously. Multi-metric optimization allows you to combine several evaluation metrics with customizable weights to create a composite objective function.

Why Use Multi-Metric Optimization?

While you can implement metric combinations within a custom metric function, using Opik Optimizer’s MultiMetricObjective API provides additional benefits:

  • Automatic logging of all component metrics to the Opik platform
  • Individual tracking of each sub-metric alongside the composite score
  • Detailed visibility into how each metric contributes to optimization
  • Trial-level insights for both aggregate and individual trace performance

This visibility helps you understand trade-offs between different quality dimensions during optimization.

Quickstart

You can use MultiMetricObjective to create a composite metric from multiple metrics:

```python
from opik_optimizer import MultiMetricObjective

multi_metric_objective = MultiMetricObjective(
    weights=[0.4, 0.6],
    metrics=[metric_1, metric_2],
)
```

End-to-End Example

In this guide, we’ll demonstrate multi-metric optimization with a simple question-answering task. The example optimizes a basic Q&A agent to balance both accuracy and relevance without requiring complex tool usage.

To use multi-metric optimization, you need to:

  1. Define multiple metric functions
  2. Create a MultiMetricObjective class instance using your functions and weights
  3. Pass it to your optimizer as the metric to optimize for

1. Define Your Metrics

Create individual metric functions that evaluate different aspects of your agent’s output:

```python
from typing import Any, Dict

from opik.evaluation.metrics import AnswerRelevance, LevenshteinRatio
from opik.evaluation.metrics.score_result import ScoreResult


def levenshtein_ratio(dataset_item: Dict[str, Any], llm_output: str) -> ScoreResult:
    """Measures string similarity between output and reference answer."""
    metric = LevenshteinRatio()
    return metric.score(reference=dataset_item["answer"], output=llm_output)


def answer_relevance_score(dataset_item: Dict[str, Any], llm_output: str) -> ScoreResult:
    """Evaluates how relevant the answer is to the question and context."""
    metric = AnswerRelevance()
    return metric.score(
        context=[dataset_item["answer"]],
        output=llm_output,
        input=dataset_item["question"],
    )
```
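To build intuition for what the first metric measures, here is a minimal, dependency-free sketch of a Levenshtein-based similarity ratio. This is only illustrative: the exact normalization used by Opik's LevenshteinRatio may differ.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def similarity_ratio(reference: str, output: str) -> float:
    """Normalize edit distance into a 0..1 similarity score (1.0 = identical)."""
    if not reference and not output:
        return 1.0
    return 1.0 - levenshtein_distance(reference, output) / max(len(reference), len(output))
```

For example, `similarity_ratio("Paris", "Paris")` is 1.0, while a single-character difference such as `"paris"` scores 0.8 under this normalization.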

2. Create a Multi-Metric Objective

Combine your metrics with weights using MultiMetricObjective:

```python
import opik_optimizer

multi_metric_objective = opik_optimizer.MultiMetricObjective(
    weights=[0.4, 0.6],
    metrics=[levenshtein_ratio, answer_relevance_score],
    name="my_composite_metric",
)
```

Understanding Weights:

The weights parameter controls the relative importance of each metric:

  • weights=[0.4, 0.6] → First metric contributes 40%, second contributes 60%
  • Higher weights emphasize those metrics during optimization
  • Weights don’t need to sum to 1; use any values that represent your priorities
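Conceptually, the combination is just a weighted average. The sketch below illustrates the idea under the assumption that scores are normalized by the weight total (which is why weights need not sum to 1); the actual aggregation inside MultiMetricObjective is an implementation detail of the SDK.

```python
from typing import Sequence


def composite_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-metric scores.

    Dividing by the weight total means weights need not sum to 1;
    only their relative magnitudes matter.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

With `weights=[0.4, 0.6]` and per-metric scores of 0.8 and 0.6, the composite is 0.4 × 0.8 + 0.6 × 0.6 = 0.68.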

3. Use with Optimizer

```python
import opik
import opik_optimizer
from opik_optimizer import ChatPrompt
from opik_optimizer.gepa_optimizer import GepaOptimizer

# Create a simple dataset
client = opik.Opik()
dataset = client.get_or_create_dataset(name="multi_metric_example")
dataset.insert([
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is 2+2?", "answer": "4"},
])

# Define a simple prompt
prompt = ChatPrompt(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "{question}"},
    ],
    model="gpt-4o-mini",
)

optimizer = GepaOptimizer(
    model="openai/gpt-4o-mini",
    reflection_model="openai/gpt-4o",
    project_name="Multi-Metric-Example",
    temperature=0.7,
    max_tokens=100,
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=multi_metric_objective,  # Use the composite metric
    n_samples=6,
)

result.display()
```

4. View Results

You can view the results of the optimization in the Opik dashboard:

[Screenshot: multi-metric optimization results in the Opik dashboard]

What you’ll see:

  • Composite metric (my_composite_metric) — The weighted combination of all metrics
  • Individual metrics (levenshtein_ratio, answer_relevance_score) — Each component tracked separately
  • Trial progression — Metric evolution over time

This lets you see not just overall optimization progress, but how each metric contributes to the final score.

Next Steps