Opik Agent Optimizer API Reference

Technical SDK reference guide

The Opik Agent Optimizer SDK provides a comprehensive set of tools for optimizing LLM prompts and agents. This reference guide documents the standardized API that all optimizers follow, ensuring consistency and interoperability across different optimization algorithms.

Key Features

  • Standardized API: All optimizers expose the same interface through a shared optimize_prompt() method
  • Multiple Algorithms: Support for various optimization strategies including evolutionary, few-shot, meta-prompt, and GEPA
  • MCP Support: Built-in support for Model Context Protocol tool calling
  • Consistent Results: All optimizers return standardized OptimizationResult objects
  • Counter Tracking: Built-in LLM and tool call counters for monitoring usage
  • Backward Compatibility: All original parameters preserved through kwargs extraction
  • Deprecation Warnings: Clear warnings for deprecated parameters with migration guidance

Core Classes

The SDK provides several optimizer classes that all inherit from BaseOptimizer and implement the same standardized interface:

  • ParameterOptimizer: Optimizes LLM call parameters (temperature, top_p, etc.) using Bayesian optimization
  • FewShotBayesianOptimizer: Uses few-shot learning with Bayesian optimization
  • MetaPromptOptimizer: Employs meta-prompting techniques for optimization
  • EvolutionaryOptimizer: Uses genetic algorithms for prompt evolution
  • GepaOptimizer: Leverages GEPA (Genetic-Pareto) optimization approach
  • HierarchicalReflectiveOptimizer: Uses hierarchical root cause analysis for targeted prompt refinement
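
All of these optimizers are designed to be interchangeable. A minimal import sketch is shown below; it assumes the classes are exported from the opik_optimizer package top level, which is how they are commonly imported (adjust if your installed version differs):

from opik_optimizer import (
    ChatPrompt,
    EvolutionaryOptimizer,
    FewShotBayesianOptimizer,
    GepaOptimizer,
    MetaPromptOptimizer,
)
# ParameterOptimizer and HierarchicalReflectiveOptimizer follow the same
# import pattern in recent versions of the package.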

Standardized Method Signatures

All optimizers implement the following core method with an identical signature:

optimize_prompt()

def optimize_prompt(
    self,
    prompt: ChatPrompt | dict[str, ChatPrompt],
    dataset: Dataset,
    metric: MetricFunction,
    agent: OptimizableAgent | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    project_name: str | None = None,
    optimization_id: str | None = None,
    validation_dataset: Dataset | None = None,
    max_trials: int = 10,
    allow_tool_use: bool = True,
    **kwargs: Any,
) -> OptimizationResult
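
As a minimal usage sketch (the dataset name, dataset fields, and metric below are illustrative, not part of the SDK):

from opik import Opik
from opik_optimizer import ChatPrompt, MetaPromptOptimizer

def exact_match(dataset_item, llm_output):
    # Metric contract required by the SDK: (dataset_item, llm_output) -> float
    return float(llm_output.strip() == dataset_item["answer"].strip())

dataset = Opik().get_dataset("my-dataset")  # illustrative dataset name

prompt = ChatPrompt(
    system="You are a concise assistant.",
    user="{question}",  # placeholder resolved from each dataset item
)

optimizer = MetaPromptOptimizer(model="gpt-4o-mini")
result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    n_samples=50,
    max_trials=10,
)
result.display()  # prints a summary of the returned OptimizationResult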

Deprecation Warnings

The following parameters are deprecated and will be removed in future versions:

Constructor Parameters

  • num_threads in optimizer constructors: Use n_threads instead

Example Migration

# ❌ Deprecated
optimizer = FewShotBayesianOptimizer(
    model="gpt-4o-mini",
    num_threads=16,  # Deprecated
)

# ✅ Correct
optimizer = FewShotBayesianOptimizer(
    model="gpt-4o-mini",
    n_threads=16,  # Use n_threads instead
)

FewShotBayesianOptimizer

FewShotBayesianOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    min_examples: int = 2,
    max_examples: int = 8,
    n_threads: int = 12,
    verbose: int = 1,
    seed: int = 42,
    name: str | None = None,
    enable_columnar_selection: bool = True,
    enable_diversity: bool = True,
    enable_multivariate_tpe: bool = True,
    enable_optuna_pruning: bool = True,
    prompt_overrides: dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None = None,
    skip_perfect_score: bool = True,
    perfect_score: float = 0.95
)

Parameters:

model
strDefaults to gpt-4o
LiteLLM model name for optimizer’s internal reasoning (generating few-shot templates)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
min_examples
intDefaults to 2
Minimum number of examples to include in the prompt
max_examples
intDefaults to 8
Maximum number of examples to include in the prompt
n_threads
intDefaults to 12
Number of threads for parallel evaluation
verbose
intDefaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
intDefaults to 42
Random seed for reproducibility
name
str | None
enable_columnar_selection
boolDefaults to True
Toggle column-aware example grouping (categorical Optuna params)
enable_diversity
boolDefaults to True
enable_multivariate_tpe
boolDefaults to True
Enable Optuna’s multivariate TPE sampler
enable_optuna_pruning
boolDefaults to True
Enable the Optuna pruner for early stopping
prompt_overrides
dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None
Optional dict or callable to override/customize prompt templates. If a dict, keys should match DEFAULT_PROMPTS keys. If a callable, receives the PromptLibrary instance for in-place modification.
skip_perfect_score
boolDefaults to True
perfect_score
floatDefaults to 0.95
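
A hedged construction sketch for this optimizer (values are illustrative; the prompt_overrides callable form receives the internal PromptLibrary instance for in-place modification, as described above):

from opik_optimizer import FewShotBayesianOptimizer

def tweak_prompts(library):
    # Illustrative override hook: inspect or modify the optimizer's internal
    # prompt templates in place. Valid keys depend on DEFAULT_PROMPTS in your
    # installed version, so treat this as a sketch rather than exact usage.
    pass

optimizer = FewShotBayesianOptimizer(
    model="gpt-4o-mini",
    model_parameters={"temperature": 0.2, "max_tokens": 1024},
    min_examples=2,
    max_examples=6,
    n_threads=8,
    seed=42,
    prompt_overrides=tweak_prompts,  # or a dict keyed by DEFAULT_PROMPTS keys
)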

Methods

begin_round

1begin_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

cleanup

1cleanup()

evaluate

1evaluate(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 sampling_tag: str | None = None
6)

Parameters:

context
OptimizationContext
Optimization context for this run.
prompts
dict
Dict of named prompts to evaluate (e.g., {“main”: ChatPrompt(…)}). Single-prompt optimizations use a dict with one entry.
experiment_config
dict[str, typing.Any] | None
Optional experiment configuration.
sampling_tag
str | None
Optional sampling tag for deterministic subsampling per candidate.

evaluate_prompt

1evaluate_prompt(
2 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
3 dataset: Dataset,
4 metric: MetricFunction,
5 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
6 n_threads: int | None = None,
7 verbose: int = 1,
8 dataset_item_ids: list[str] | None = None,
9 experiment_config: dict | None = None,
10 n_samples: int | float | str | None = None,
11 n_samples_strategy: str | None = None,
12 seed: int | None = None,
13 return_evaluation_result: bool = False,
14 allow_tool_use: bool = False,
15 use_evaluate_on_dict_items: bool | None = None,
16 sampling_tag: str | None = None
17)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
dataset
Dataset
metric
MetricFunction
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
n_threads
int | None
verbose
intDefaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | float | str | None
n_samples_strategy
str | None
seed
int | None
return_evaluation_result
boolDefaults to False
allow_tool_use
boolDefaults to False
use_evaluate_on_dict_items
bool | None
sampling_tag
str | None

evaluate_with_result

1evaluate_with_result(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 empty_score: float | None = None,
6 n_samples: int | float | str | None = None,
7 n_samples_strategy: str | None = None,
8 sampling_tag: str | None = None
9)

Parameters:

context
OptimizationContext
prompts
dict
experiment_config
dict[str, typing.Any] | None
empty_score
float | None
n_samples
int | float | str | None
n_samples_strategy
str | None
sampling_tag
str | None

finish_candidate

1finish_candidate(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

finish_round

1finish_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

get_config

1get_config(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_default_prompt

1get_default_prompt(
2 key: str
3)

Parameters:

key
str
The prompt key to retrieve

get_history_entries

1get_history_entries()

get_history_rounds

1get_history_rounds()

get_metadata

1get_metadata(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_optimizer_metadata

1get_optimizer_metadata()

get_prompt

1get_prompt(
2 key: str,
3 fmt: Any
4)

Parameters:

key
str
The prompt key to retrieve
fmt
Any

list_prompts

1list_prompts()

on_trial

1on_trial(
2 context: OptimizationContext,
3 prompts: dict,
4 score: float,
5 prev_best_score: float | None = None
6)

Parameters:

context
OptimizationContext
prompts
dict
score
float
prev_best_score
float | None

optimize_prompt

optimize_prompt(
    prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
    dataset: Dataset,
    metric: MetricFunction,
    agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
    experiment_config: dict | None = None,
    n_samples: int | float | str | None = None,
    n_samples_minibatch: int | None = None,
    n_samples_strategy: str | None = None,
    auto_continue: bool = False,
    project_name: str | None = None,
    optimization_id: str | None = None,
    validation_dataset: opik.api_objects.dataset.dataset.Dataset | None = None,
    max_trials: int = 10,
    allow_tool_use: bool = True,
    optimize_prompt: bool | str | list[str] | None = 'system',
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
The prompt to optimize (single ChatPrompt or dict of prompts)
dataset
Dataset
Opik dataset used as the training set (provides feedback/context during optimization). Note: this parameter is expected to be superseded by dataset_training in a future release.
metric
MetricFunction
A metric function with signature (dataset_item, llm_output) -> float (see the sketch after this parameter list)
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
Optional agent for prompt execution (defaults to LiteLLMAgent)
experiment_config
dict | None
Optional configuration for the experiment
n_samples
int | float | str | None
Number of samples to use for evaluation
n_samples_minibatch
int | None
Optional number of samples for inner-loop minibatches
n_samples_strategy
str | None
Sampling strategy name (default “random_sorted”)
auto_continue
boolDefaults to False
Whether to continue optimization automatically
project_name
str | None
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or “Optimization”)
optimization_id
str | None
Optional ID to use when creating the Opik optimization run
validation_dataset
opik.api_objects.dataset.dataset.Dataset | None
Optional validation dataset for ranking candidates
max_trials
intDefaults to 10
Maximum number of optimization trials
allow_tool_use
boolDefaults to True
Whether tools may be executed during evaluation (default True)
optimize_prompt
bool | str | list[str] | NoneDefaults to system
Which prompt roles to allow for optimization
args
Any
kwargs
Any
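
A short sketch of the metric contract and of the multi-prompt form of the prompt argument, reusing the optimizer, dataset, and ChatPrompt set up in the earlier sketches (field names and prompt keys are illustrative):

def answer_overlap(dataset_item, llm_output):
    # Illustrative metric: fraction of expected-answer tokens found in the output.
    expected = dataset_item["answer"].lower().split()
    produced = llm_output.lower().split()
    return sum(token in produced for token in expected) / len(expected) if expected else 0.0

prompts = {
    "router": ChatPrompt(system="Decide which tool to call.", user="{question}"),
    "answer": ChatPrompt(system="Answer using the tool result.", user="{question}"),
}

result = optimizer.optimize_prompt(
    prompt=prompts,          # dict[str, ChatPrompt] optimizes several named prompts together
    dataset=dataset,
    metric=answer_overlap,   # any callable with signature (dataset_item, llm_output) -> float
    max_trials=20,
)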

post_baseline

1post_baseline(
2 context: OptimizationContext,
3 score: float
4)

Parameters:

context
OptimizationContext
score
float

post_optimize

1post_optimize(
2 context: OptimizationContext,
3 result: OptimizationResult
4)

Parameters:

context
OptimizationContext
result
OptimizationResult

post_round

1post_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

post_trial

1post_trial(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

pre_baseline

1pre_baseline(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

pre_optimize

1pre_optimize(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context

pre_round

1pre_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

pre_trial

1pre_trial(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

record_candidate_entry

1record_candidate_entry(
2 prompt_or_payload: Any,
3 score: float | None = None,
4 id: str | None = None,
5 metrics: dict[str, typing.Any] | None = None,
6 notes: str | None = None,
7 extra: dict[str, typing.Any] | None = None,
8 context: opik_optimizer.core.state.OptimizationContext | None = None
9)

Parameters:

prompt_or_payload
Any
score
float | None
id
str | None
metrics
dict[str, typing.Any] | None
notes
str | None
extra
dict[str, typing.Any] | None
context
opik_optimizer.core.state.OptimizationContext | None

run_optimization

1run_optimization(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context with prompts, dataset, metric, etc.

set_default_dataset_split

1set_default_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

set_pareto_front

1set_pareto_front(
2 pareto_front: list[dict[str, typing.Any]] | None
3)

Parameters:

pareto_front
list[dict[str, typing.Any]] | None

set_selection_meta

1set_selection_meta(
2 selection_meta: dict[str, typing.Any] | None
3)

Parameters:

selection_meta
dict[str, typing.Any] | None

start_candidate

1start_candidate(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

with_dataset_split

1with_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

GepaOptimizer

GepaOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    n_threads: int = 12,
    verbose: int = 1,
    seed: int = 42,
    name: str | None = None,
    skip_perfect_score: bool = True,
    perfect_score: float = 0.95,
    prompt_overrides: dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None = None
)

Parameters:

model
strDefaults to gpt-4o
LiteLLM model name for the optimization algorithm
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
n_threads
intDefaults to 12
Number of parallel threads for evaluation
verbose
intDefaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
intDefaults to 42
Random seed for reproducibility
name
str | None
skip_perfect_score
boolDefaults to True
perfect_score
floatDefaults to 0.95
prompt_overrides
dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None
Accepted for API parity, but ignored (GEPA does not expose prompt hooks).
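
A minimal construction sketch (illustrative values; the import path is assumed to match the other optimizers):

from opik_optimizer import GepaOptimizer

optimizer = GepaOptimizer(
    model="gpt-4o-mini",
    model_parameters={"temperature": 0.3},
    n_threads=8,
    seed=7,
    # prompt_overrides is accepted for API parity but ignored by GEPA.
)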

Methods

begin_round

1begin_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

cleanup

1cleanup()

evaluate

1evaluate(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 sampling_tag: str | None = None
6)

Parameters:

context
OptimizationContext
Optimization context for this run.
prompts
dict
Dict of named prompts to evaluate (e.g., {“main”: ChatPrompt(…)}). Single-prompt optimizations use a dict with one entry.
experiment_config
dict[str, typing.Any] | None
Optional experiment configuration.
sampling_tag
str | None
Optional sampling tag for deterministic subsampling per candidate.

evaluate_prompt

1evaluate_prompt(
2 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
3 dataset: Dataset,
4 metric: MetricFunction,
5 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
6 n_threads: int | None = None,
7 verbose: int = 1,
8 dataset_item_ids: list[str] | None = None,
9 experiment_config: dict | None = None,
10 n_samples: int | float | str | None = None,
11 n_samples_strategy: str | None = None,
12 seed: int | None = None,
13 return_evaluation_result: bool = False,
14 allow_tool_use: bool | None = None,
15 use_evaluate_on_dict_items: bool | None = None,
16 sampling_tag: str | None = None
17)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
dataset
Dataset
metric
MetricFunction
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
n_threads
int | None
verbose
intDefaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | float | str | None
n_samples_strategy
str | None
seed
int | None
return_evaluation_result
boolDefaults to False
allow_tool_use
bool | None
use_evaluate_on_dict_items
bool | None
sampling_tag
str | None

evaluate_with_result

1evaluate_with_result(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 empty_score: float | None = None,
6 n_samples: int | float | str | None = None,
7 n_samples_strategy: str | None = None,
8 sampling_tag: str | None = None
9)

Parameters:

context
OptimizationContext
prompts
dict
experiment_config
dict[str, typing.Any] | None
empty_score
float | None
n_samples
int | float | str | None
n_samples_strategy
str | None
sampling_tag
str | None

finish_candidate

1finish_candidate(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

finish_round

1finish_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

get_config

1get_config(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_default_prompt

1get_default_prompt(
2 key: str
3)

Parameters:

key
str
The prompt key to retrieve

get_history_entries

1get_history_entries()

get_history_rounds

1get_history_rounds()

get_metadata

1get_metadata(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_optimizer_metadata

1get_optimizer_metadata()

get_prompt

1get_prompt(
2 key: str,
3 fmt: Any
4)

Parameters:

key
str
The prompt key to retrieve
fmt
Any

list_prompts

1list_prompts()

on_trial

1on_trial(
2 context: OptimizationContext,
3 prompts: dict,
4 score: float,
5 prev_best_score: float | None = None
6)

Parameters:

context
OptimizationContext
prompts
dict
score
float
prev_best_score
float | None

optimize_prompt

optimize_prompt(
    prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
    dataset: Dataset,
    metric: MetricFunction,
    agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
    experiment_config: dict | None = None,
    n_samples: int | float | str | None = None,
    n_samples_minibatch: int | None = None,
    n_samples_strategy: str | None = None,
    auto_continue: bool = False,
    project_name: str | None = None,
    optimization_id: str | None = None,
    validation_dataset: opik.api_objects.dataset.dataset.Dataset | None = None,
    max_trials: int = 10,
    allow_tool_use: bool = True,
    optimize_prompt: bool | str | list[str] | None = 'system',
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
The prompt to optimize (single ChatPrompt or dict of prompts)
dataset
Dataset
Opik dataset used as the training set (provides feedback/context during optimization). Note: this parameter is expected to be superseded by dataset_training in a future release.
metric
MetricFunction
A metric function with signature (dataset_item, llm_output) -> float
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
Optional agent for prompt execution (defaults to LiteLLMAgent)
experiment_config
dict | None
Optional configuration for the experiment
n_samples
int | float | str | None
Number of samples to use for evaluation
n_samples_minibatch
int | None
Optional number of samples for inner-loop minibatches
n_samples_strategy
str | None
Sampling strategy name (default “random_sorted”)
auto_continue
boolDefaults to False
Whether to continue optimization automatically
project_name
str | None
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or “Optimization”)
optimization_id
str | None
Optional ID to use when creating the Opik optimization run
validation_dataset
opik.api_objects.dataset.dataset.Dataset | None
Optional validation dataset for ranking candidates
max_trials
intDefaults to 10
Maximum number of optimization trials
allow_tool_use
boolDefaults to True
Whether tools may be executed during evaluation (default True)
optimize_prompt
bool | str | list[str] | NoneDefaults to system
Which prompt roles to allow for optimization
args
Any
kwargs
Any

post_baseline

1post_baseline(
2 context: OptimizationContext,
3 score: float
4)

Parameters:

context
OptimizationContext
score
float

post_optimize

1post_optimize(
2 context: OptimizationContext,
3 result: OptimizationResult
4)

Parameters:

context
OptimizationContext
result
OptimizationResult

post_round

1post_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

post_trial

1post_trial(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

pre_baseline

1pre_baseline(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

pre_optimize

1pre_optimize(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

pre_round

1pre_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

pre_trial

1pre_trial(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

record_candidate_entry

1record_candidate_entry(
2 prompt_or_payload: Any,
3 score: float | None = None,
4 id: str | None = None,
5 metrics: dict[str, typing.Any] | None = None,
6 notes: str | None = None,
7 extra: dict[str, typing.Any] | None = None,
8 context: opik_optimizer.core.state.OptimizationContext | None = None
9)

Parameters:

prompt_or_payload
Any
score
float | None
id
str | None
metrics
dict[str, typing.Any] | None
notes
str | None
extra
dict[str, typing.Any] | None
context
opik_optimizer.core.state.OptimizationContext | None

run_optimization

1run_optimization(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context with prompts, dataset, metric, etc.

set_default_dataset_split

1set_default_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

set_pareto_front

1set_pareto_front(
2 pareto_front: list[dict[str, typing.Any]] | None
3)

Parameters:

pareto_front
list[dict[str, typing.Any]] | None

set_selection_meta

1set_selection_meta(
2 selection_meta: dict[str, typing.Any] | None
3)

Parameters:

selection_meta
dict[str, typing.Any] | None

start_candidate

1start_candidate(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

with_dataset_split

1with_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

MetaPromptOptimizer

MetaPromptOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    prompts_per_round: int = 4,
    enable_context: bool = True,
    num_task_examples: int = 5,
    task_context_columns: list[str] | None = None,
    n_threads: int = 12,
    verbose: int = 1,
    seed: int = 42,
    name: str | None = None,
    use_hall_of_fame: bool = True,
    prompt_overrides: dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None = None,
    skip_perfect_score: bool = True,
    perfect_score: float = 0.95
)

Parameters:

model
strDefaults to gpt-4o
LiteLLM model name for optimizer’s internal reasoning/generation calls
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
prompts_per_round
intDefaults to 4
Number of candidate prompts to generate per optimization round
enable_context
boolDefaults to True
Whether to include task-specific context learning when reasoning
num_task_examples
intDefaults to 5
Number of dataset examples to show in the task context
task_context_columns
list[str] | None
Specific dataset columns to include in context (None = all input columns)
n_threads
intDefaults to 12
Number of parallel threads for prompt evaluation
verbose
intDefaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
intDefaults to 42
Random seed for reproducibility
name
str | None
use_hall_of_fame
boolDefaults to True
Enable Hall of Fame pattern extraction and re-injection
prompt_overrides
dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None
Optional dict or callable to customize internal prompts.
skip_perfect_score
boolDefaults to True
perfect_score
floatDefaults to 0.95
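
A minimal construction sketch (column names are illustrative and must exist in your dataset):

from opik_optimizer import MetaPromptOptimizer

optimizer = MetaPromptOptimizer(
    model="gpt-4o",
    prompts_per_round=4,
    enable_context=True,
    num_task_examples=5,
    task_context_columns=["question", "answer"],  # illustrative dataset columns
    use_hall_of_fame=True,
    n_threads=8,
)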

Methods

begin_round

1begin_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

cleanup

1cleanup()

evaluate

1evaluate(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 sampling_tag: str | None = None
6)

Parameters:

context
OptimizationContext
Optimization context for this run.
prompts
dict
Dict of named prompts to evaluate (e.g., {“main”: ChatPrompt(…)}). Single-prompt optimizations use a dict with one entry.
experiment_config
dict[str, typing.Any] | None
Optional experiment configuration.
sampling_tag
str | None
Optional sampling tag for deterministic subsampling per candidate.

evaluate_prompt

1evaluate_prompt(
2 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
3 dataset: Dataset,
4 metric: MetricFunction,
5 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
6 n_threads: int | None = None,
7 verbose: int = 1,
8 dataset_item_ids: list[str] | None = None,
9 experiment_config: dict | None = None,
10 n_samples: int | float | str | None = None,
11 n_samples_strategy: str | None = None,
12 seed: int | None = None,
13 return_evaluation_result: bool = False,
14 allow_tool_use: bool | None = None,
15 use_evaluate_on_dict_items: bool | None = None,
16 sampling_tag: str | None = None
17)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
dataset
Dataset
metric
MetricFunction
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
n_threads
int | None
verbose
intDefaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | float | str | None
n_samples_strategy
str | None
seed
int | None
return_evaluation_result
boolDefaults to False
allow_tool_use
bool | None
use_evaluate_on_dict_items
bool | None
sampling_tag
str | None

evaluate_with_result

1evaluate_with_result(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 empty_score: float | None = None,
6 n_samples: int | float | str | None = None,
7 n_samples_strategy: str | None = None,
8 sampling_tag: str | None = None
9)

Parameters:

context
OptimizationContext
prompts
dict
experiment_config
dict[str, typing.Any] | None
empty_score
float | None
n_samples
int | float | str | None
n_samples_strategy
str | None
sampling_tag
str | None

finish_candidate

1finish_candidate(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

finish_round

1finish_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

get_config

1get_config(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_default_prompt

1get_default_prompt(
2 key: str
3)

Parameters:

key
str
The prompt key to retrieve

get_history_entries

1get_history_entries()

get_history_rounds

1get_history_rounds()

get_metadata

1get_metadata(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_optimizer_metadata

1get_optimizer_metadata()

get_prompt

1get_prompt(
2 key: str,
3 fmt: Any
4)

Parameters:

key
str
The prompt key to retrieve
fmt
Any

list_prompts

1list_prompts()

on_trial

1on_trial(
2 context: OptimizationContext,
3 prompts: dict,
4 score: float,
5 prev_best_score: float | None = None
6)

Parameters:

context
OptimizationContext
prompts
dict
score
float
prev_best_score
float | None

optimize_prompt

optimize_prompt(
    prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
    dataset: Dataset,
    metric: MetricFunction,
    agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
    experiment_config: dict | None = None,
    n_samples: int | float | str | None = None,
    n_samples_minibatch: int | None = None,
    n_samples_strategy: str | None = None,
    auto_continue: bool = False,
    project_name: str | None = None,
    optimization_id: str | None = None,
    validation_dataset: opik.api_objects.dataset.dataset.Dataset | None = None,
    max_trials: int = 10,
    allow_tool_use: bool = True,
    optimize_prompt: bool | str | list[str] | None = 'system',
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
The prompt to optimize (single ChatPrompt or dict of prompts)
dataset
Dataset
Opik dataset used as the training set (provides feedback/context during optimization). Note: this parameter is expected to be superseded by dataset_training in a future release.
metric
MetricFunction
A metric function with signature (dataset_item, llm_output) -> float
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
Optional agent for prompt execution (defaults to LiteLLMAgent)
experiment_config
dict | None
Optional configuration for the experiment
n_samples
int | float | str | None
Number of samples to use for evaluation
n_samples_minibatch
int | None
Optional number of samples for inner-loop minibatches
n_samples_strategy
str | None
Sampling strategy name (default “random_sorted”)
auto_continue
boolDefaults to False
Whether to continue optimization automatically
project_name
str | None
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or “Optimization”)
optimization_id
str | None
Optional ID to use when creating the Opik optimization run
validation_dataset
opik.api_objects.dataset.dataset.Dataset | None
Optional validation dataset for ranking candidates
max_trials
intDefaults to 10
Maximum number of optimization trials
allow_tool_use
boolDefaults to True
Whether tools may be executed during evaluation (default True)
optimize_prompt
bool | str | list[str] | NoneDefaults to system
Which prompt roles to allow for optimization
args
Any
kwargs
Any

post_baseline

1post_baseline(
2 context: OptimizationContext,
3 score: float
4)

Parameters:

context
OptimizationContext
score
float

post_optimize

1post_optimize(
2 context: OptimizationContext,
3 result: OptimizationResult
4)

Parameters:

context
OptimizationContext
result
OptimizationResult

post_round

1post_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

post_trial

1post_trial(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

pre_baseline

1pre_baseline(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

pre_optimize

1pre_optimize(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context

pre_round

1pre_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

pre_trial

1pre_trial(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

record_candidate_entry

1record_candidate_entry(
2 prompt_or_payload: Any,
3 score: float | None = None,
4 id: str | None = None,
5 metrics: dict[str, typing.Any] | None = None,
6 notes: str | None = None,
7 extra: dict[str, typing.Any] | None = None,
8 context: opik_optimizer.core.state.OptimizationContext | None = None
9)

Parameters:

prompt_or_payload
Any
score
float | None
id
str | None
metrics
dict[str, typing.Any] | None
notes
str | None
extra
dict[str, typing.Any] | None
context
opik_optimizer.core.state.OptimizationContext | None

run_optimization

1run_optimization(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context with prompts, dataset, metric, etc.

set_default_dataset_split

1set_default_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

set_pareto_front

1set_pareto_front(
2 pareto_front: list[dict[str, typing.Any]] | None
3)

Parameters:

pareto_front
list[dict[str, typing.Any]] | None

set_selection_meta

1set_selection_meta(
2 selection_meta: dict[str, typing.Any] | None
3)

Parameters:

selection_meta
dict[str, typing.Any] | None

start_candidate

1start_candidate(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

with_dataset_split

1with_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

EvolutionaryOptimizer

EvolutionaryOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    population_size: int = 30,
    num_generations: int = 15,
    mutation_rate: float = 0.2,
    crossover_rate: float = 0.8,
    tournament_size: int = 4,
    elitism_size: int = 3,
    adaptive_mutation: bool = True,
    enable_moo: bool = True,
    enable_llm_crossover: bool = True,
    enable_semantic_crossover: bool = False,
    output_style_guidance: str | None = None,
    infer_output_style: bool = False,
    n_threads: int = 12,
    verbose: int = 1,
    seed: int = 42,
    name: str | None = None,
    prompt_overrides: dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None = None,
    skip_perfect_score: bool = True,
    perfect_score: float = 0.95
)

Parameters:

model
strDefaults to gpt-4o
LiteLLM model name for optimizer’s internal operations (mutations, crossover, etc.)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
population_size
intDefaults to 30
Number of prompts in the population
num_generations
intDefaults to 15
Number of generations to run
mutation_rate
floatDefaults to 0.2
Mutation rate for genetic operations
crossover_rate
floatDefaults to 0.8
Crossover rate for genetic operations
tournament_size
intDefaults to 4
Tournament size for selection
elitism_size
intDefaults to 3
Number of elite prompts to preserve across generations
adaptive_mutation
boolDefaults to True
Whether to use adaptive mutation that adjusts based on population diversity
enable_moo
boolDefaults to True
Whether to enable multi-objective optimization (optimizes metric and prompt length)
enable_llm_crossover
boolDefaults to True
Whether to enable LLM-based crossover operations
enable_semantic_crossover
boolDefaults to False
Whether to use semantic crossover before standard LLM crossover
output_style_guidance
str | None
Optional guidance for output style in generated prompts
infer_output_style
boolDefaults to False
Whether to automatically infer output style from the dataset
n_threads
intDefaults to 12
Number of threads for parallel evaluation
verbose
intDefaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
intDefaults to 42
Random seed for reproducibility
name
str | None
prompt_overrides
dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None
Optional dict or callable to customize internal prompts.
skip_perfect_score
boolDefaults to True
perfect_score
floatDefaults to 0.95
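
A minimal construction sketch (illustrative values):

from opik_optimizer import EvolutionaryOptimizer

optimizer = EvolutionaryOptimizer(
    model="gpt-4o-mini",
    population_size=20,
    num_generations=10,
    mutation_rate=0.2,
    crossover_rate=0.8,
    elitism_size=3,
    enable_moo=True,             # also tracks prompt length as a second objective
    enable_llm_crossover=True,
    n_threads=8,
)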

Methods

begin_round

1begin_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

cleanup

1cleanup()

evaluate

1evaluate(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 sampling_tag: str | None = None
6)

Parameters:

context
OptimizationContext
Optimization context for this run.
prompts
dict
Dict of named prompts to evaluate (e.g., {“main”: ChatPrompt(…)}). Single-prompt optimizations use a dict with one entry.
experiment_config
dict[str, typing.Any] | None
Optional experiment configuration.
sampling_tag
str | None
Optional sampling tag for deterministic subsampling per candidate.

evaluate_prompt

1evaluate_prompt(
2 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
3 dataset: Dataset,
4 metric: MetricFunction,
5 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
6 n_threads: int | None = None,
7 verbose: int = 1,
8 dataset_item_ids: list[str] | None = None,
9 experiment_config: dict | None = None,
10 n_samples: int | float | str | None = None,
11 n_samples_strategy: str | None = None,
12 seed: int | None = None,
13 return_evaluation_result: bool = False,
14 allow_tool_use: bool | None = None,
15 use_evaluate_on_dict_items: bool | None = None,
16 sampling_tag: str | None = None
17)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
dataset
Dataset
metric
MetricFunction
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
n_threads
int | None
verbose
intDefaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | float | str | None
n_samples_strategy
str | None
seed
int | None
return_evaluation_result
boolDefaults to False
allow_tool_use
bool | None
use_evaluate_on_dict_items
bool | None
sampling_tag
str | None

evaluate_with_result

1evaluate_with_result(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 empty_score: float | None = None,
6 n_samples: int | float | str | None = None,
7 n_samples_strategy: str | None = None,
8 sampling_tag: str | None = None
9)

Parameters:

context
OptimizationContext
prompts
dict
experiment_config
dict[str, typing.Any] | None
empty_score
float | None
n_samples
int | float | str | None
n_samples_strategy
str | None
sampling_tag
str | None

finish_candidate

1finish_candidate(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

finish_round

1finish_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

get_config

1get_config(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_default_prompt

1get_default_prompt(
2 key: str
3)

Parameters:

key
str
The prompt key to retrieve

get_history_entries

1get_history_entries()

get_history_rounds

1get_history_rounds()

get_metadata

1get_metadata(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_optimizer_metadata

1get_optimizer_metadata()

get_prompt

1get_prompt(
2 key: str,
3 fmt: Any
4)

Parameters:

key
str
The prompt key to retrieve
fmt
Any

list_prompts

1list_prompts()

on_trial

1on_trial(
2 context: OptimizationContext,
3 prompts: dict,
4 score: float,
5 prev_best_score: float | None = None
6)

Parameters:

context
OptimizationContext
prompts
dict
score
float
prev_best_score
float | None

optimize_prompt

optimize_prompt(
    prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
    dataset: Dataset,
    metric: MetricFunction,
    agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
    experiment_config: dict | None = None,
    n_samples: int | float | str | None = None,
    n_samples_minibatch: int | None = None,
    n_samples_strategy: str | None = None,
    auto_continue: bool = False,
    project_name: str | None = None,
    optimization_id: str | None = None,
    validation_dataset: opik.api_objects.dataset.dataset.Dataset | None = None,
    max_trials: int = 10,
    allow_tool_use: bool = True,
    optimize_prompt: bool | str | list[str] | None = 'system',
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
The prompt to optimize (single ChatPrompt or dict of prompts)
dataset
Dataset
Opik dataset used as the training set (provides feedback/context during optimization). Note: this parameter is expected to be superseded by dataset_training in a future release.
metric
MetricFunction
A metric function with signature (dataset_item, llm_output) -> float
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
Optional agent for prompt execution (defaults to LiteLLMAgent)
experiment_config
dict | None
Optional configuration for the experiment
n_samples
int | float | str | None
Number of samples to use for evaluation
n_samples_minibatch
int | None
Optional number of samples for inner-loop minibatches
n_samples_strategy
str | None
Sampling strategy name (default “random_sorted”)
auto_continue
boolDefaults to False
Whether to continue optimization automatically
project_name
str | None
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or “Optimization”)
optimization_id
str | None
Optional ID to use when creating the Opik optimization run
validation_dataset
opik.api_objects.dataset.dataset.Dataset | None
Optional validation dataset for ranking candidates
max_trials
intDefaults to 10
Maximum number of optimization trials
allow_tool_use
boolDefaults to True
Whether tools may be executed during evaluation (default True)
optimize_prompt
bool | str | list[str] | NoneDefaults to system
Which prompt roles to allow for optimization
args
Any
kwargs
Any

post_baseline

1post_baseline(
2 context: OptimizationContext,
3 score: float
4)

Parameters:

context
OptimizationContext
score
float

post_optimize

1post_optimize(
2 context: OptimizationContext,
3 result: OptimizationResult
4)

Parameters:

context
OptimizationContext
result
OptimizationResult

post_round

1post_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

post_trial

1post_trial(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

pre_baseline

1pre_baseline(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

pre_optimize

1pre_optimize(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

pre_round

1pre_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

pre_trial

1pre_trial(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

record_candidate_entry

1record_candidate_entry(
2 prompt_or_payload: Any,
3 score: float | None = None,
4 id: str | None = None,
5 metrics: dict[str, typing.Any] | None = None,
6 notes: str | None = None,
7 extra: dict[str, typing.Any] | None = None,
8 context: opik_optimizer.core.state.OptimizationContext | None = None
9)

Parameters:

prompt_or_payload
Any
score
float | None
id
str | None
metrics
dict[str, typing.Any] | None
notes
str | None
extra
dict[str, typing.Any] | None
context
opik_optimizer.core.state.OptimizationContext | None

run_optimization

1run_optimization(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context with prompts, dataset, metric, etc.

set_default_dataset_split

1set_default_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

set_pareto_front

1set_pareto_front(
2 pareto_front: list[dict[str, typing.Any]] | None
3)

Parameters:

pareto_front
list[dict[str, typing.Any]] | None

set_selection_meta

1set_selection_meta(
2 selection_meta: dict[str, typing.Any] | None
3)

Parameters:

selection_meta
dict[str, typing.Any] | None

start_candidate

1start_candidate(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

with_dataset_split

1with_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

HierarchicalReflectiveOptimizer

1HierarchicalReflectiveOptimizer(
2 model: str = 'gpt-4o',
3 model_parameters: dict[str, typing.Any] | None = None,
4 reasoning_model: str | None = None,
5 reasoning_model_parameters: dict[str, typing.Any] | None = None,
6 max_parallel_batches: int = 5,
7 batch_size: int = 25,
8 convergence_threshold: float = 0.01,
9 n_threads: int = 12,
10 verbose: int = 1,
11 seed: int = 42,
12 name: str | None = None,
13 prompt_overrides: dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None = None,
14 skip_perfect_score: bool = True,
15 perfect_score: float = 0.95
16)

Parameters:

model
strDefaults to gpt-4o
LiteLLM model name for the optimization algorithm (reasoning and analysis)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
reasoning_model
str | None
reasoning_model_parameters
dict[str, typing.Any] | None
max_parallel_batches
intDefaults to 5
Maximum number of batches to process concurrently during hierarchical root cause analysis
batch_size
intDefaults to 25
Number of test cases per batch for root cause analysis
convergence_threshold
floatDefaults to 0.01
Stop if relative improvement is below this threshold
n_threads
intDefaults to 12
Number of parallel threads for evaluation
verbose
intDefaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
intDefaults to 42
Random seed for reproducibility
name
str | None
prompt_overrides
dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None
Optional dict or callable to override/customize prompt templates. If a dict, keys should match DEFAULT_PROMPTS keys. If a callable, receives the PromptLibrary instance for in-place modification.
skip_perfect_score
boolDefaults to True
perfect_score
floatDefaults to 0.95

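For orientation, a minimal construction sketch using only the constructor arguments documented above (values are illustrative, and the top-level import is assumed to mirror the other optimizers):

1from opik_optimizer import HierarchicalReflectiveOptimizer
2
3optimizer = HierarchicalReflectiveOptimizer(
4 model="gpt-4o", # model used for reasoning and analysis
5 max_parallel_batches=5, # concurrent batches during root cause analysis
6 batch_size=25, # test cases per analysis batch
7 convergence_threshold=0.01, # stop when relative improvement drops below this
8 n_threads=12,
9 seed=42,
10)
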
Methods

begin_round

1begin_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

cleanup

1cleanup()

evaluate

1evaluate(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 sampling_tag: str | None = None
6)

Parameters:

context
OptimizationContext
Optimization context for this run.
prompts
dict
Dict of named prompts to evaluate (e.g., {“main”: ChatPrompt(…)}). Single-prompt optimizations use a dict with one entry.
experiment_config
dict[str, typing.Any] | None
Optional experiment configuration.
sampling_tag
str | None
Optional sampling tag for deterministic subsampling per candidate.

evaluate_prompt

1evaluate_prompt(
2 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
3 dataset: Dataset,
4 metric: MetricFunction,
5 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
6 n_threads: int | None = None,
7 verbose: int = 1,
8 dataset_item_ids: list[str] | None = None,
9 experiment_config: dict | None = None,
10 n_samples: int | float | str | None = None,
11 n_samples_strategy: str | None = None,
12 seed: int | None = None,
13 return_evaluation_result: bool = False,
14 allow_tool_use: bool | None = None,
15 use_evaluate_on_dict_items: bool | None = None,
16 sampling_tag: str | None = None
17)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
dataset
Dataset
metric
MetricFunction
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
n_threads
int | None
verbose
intDefaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | float | str | None
n_samples_strategy
str | None
seed
int | None
return_evaluation_result
boolDefaults to False
allow_tool_use
bool | None
use_evaluate_on_dict_items
bool | None
sampling_tag
str | None

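A brief usage sketch; my_dataset and my_metric are placeholders for objects you would supply, and judging by the return_evaluation_result flag the default return is an aggregate score:

1score = optimizer.evaluate_prompt(
2 prompt=prompt, # a ChatPrompt or dict of ChatPrompts
3 dataset=my_dataset, # an Opik Dataset
4 metric=my_metric, # (dataset_item, llm_output) -> float
5 n_samples=50, # evaluate on a subset of items
6)
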
evaluate_with_result

1evaluate_with_result(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 empty_score: float | None = None,
6 n_samples: int | float | str | None = None,
7 n_samples_strategy: str | None = None,
8 sampling_tag: str | None = None
9)

Parameters:

context
OptimizationContext
prompts
dict
experiment_config
dict[str, typing.Any] | None
empty_score
float | None
n_samples
int | float | str | None
n_samples_strategy
str | None
sampling_tag
str | None

finish_candidate

1finish_candidate(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

finish_round

1finish_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

get_config

1get_config(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_default_prompt

1get_default_prompt(
2 key: str
3)

Parameters:

key
str
The prompt key to retrieve

get_history_entries

1get_history_entries()

get_history_rounds

1get_history_rounds()

get_metadata

1get_metadata(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_optimizer_metadata

1get_optimizer_metadata()

get_prompt

1get_prompt(
2 key: str,
3 fmt: Any
4)

Parameters:

key
str
The prompt key to retrieve
fmt
Any

list_prompts

1list_prompts()

on_trial

1on_trial(
2 context: OptimizationContext,
3 prompts: dict,
4 score: float,
5 prev_best_score: float | None = None
6)

Parameters:

context
OptimizationContext
prompts
dict
score
float
prev_best_score
float | None

optimize_prompt

1optimize_prompt(
2 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
3 dataset: Dataset,
4 metric: MetricFunction,
5 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
6 experiment_config: dict | None = None,
7 n_samples: int | float | str | None = None,
8 n_samples_minibatch: int | None = None,
9 n_samples_strategy: str | None = None,
10 auto_continue: bool = False,
11 project_name: str | None = None,
12 optimization_id: str | None = None,
13 validation_dataset: opik.api_objects.dataset.dataset.Dataset | None = None,
14 max_trials: int = 10,
15 allow_tool_use: bool = True,
16 optimize_prompt: bool | str | list[str] | None = 'system',
17 *args: Any,
18 **kwargs: Any
19)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
The prompt to optimize (single ChatPrompt or dict of prompts)
dataset
Dataset
Opik dataset (training set, used for feedback/context). Note: this parameter is expected to be deprecated in favor of dataset_training; for now it serves as the training dataset.
metric
MetricFunction
A metric function with signature (dataset_item, llm_output) -> float
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
Optional agent for prompt execution (defaults to LiteLLMAgent)
experiment_config
dict | None
Optional configuration for the experiment
n_samples
int | float | str | None
Number of samples to use for evaluation
n_samples_minibatch
int | None
Optional number of samples for inner-loop minibatches
n_samples_strategy
str | None
Sampling strategy name (default “random_sorted”)
auto_continue
boolDefaults to False
Whether to continue optimization automatically
project_name
str | None
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or “Optimization”)
optimization_id
str | None
Optional ID to use when creating the Opik optimization run
validation_dataset
opik.api_objects.dataset.dataset.Dataset | None
Optional validation dataset for ranking candidates
max_trials
intDefaults to 10
Maximum number of optimization trials
allow_tool_use
boolDefaults to True
Whether tools may be executed during evaluation (default True)
optimize_prompt
bool | str | list[str] | NoneDefaults to system
Which prompt roles to allow for optimization
args
Any
kwargs
Any

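An end-to-end sketch of optimize_prompt(); my_dataset and my_metric are placeholders, {question} is an assumed dataset field, and the top-level imports are assumed:

1from opik_optimizer import ChatPrompt, HierarchicalReflectiveOptimizer
2
3prompt = ChatPrompt(
4 system="You are a helpful assistant.",
5 user="{question}", # filled from each dataset item
6)
7
8optimizer = HierarchicalReflectiveOptimizer(model="gpt-4o")
9result = optimizer.optimize_prompt(
10 prompt=prompt,
11 dataset=my_dataset, # placeholder Opik Dataset
12 metric=my_metric, # (dataset_item, llm_output) -> float
13 max_trials=10,
14)
15print(result.score, result.prompt)
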
post_baseline

1post_baseline(
2 context: OptimizationContext,
3 score: float
4)

Parameters:

context
OptimizationContext
score
float

post_optimize

1post_optimize(
2 context: OptimizationContext,
3 result: OptimizationResult
4)

Parameters:

context
OptimizationContext
result
OptimizationResult

post_round

1post_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

post_trial

1post_trial(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

pre_baseline

1pre_baseline(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

pre_optimize

1pre_optimize(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context

pre_round

1pre_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

pre_trial

1pre_trial(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

record_candidate_entry

1record_candidate_entry(
2 prompt_or_payload: Any,
3 score: float | None = None,
4 id: str | None = None,
5 metrics: dict[str, typing.Any] | None = None,
6 notes: str | None = None,
7 extra: dict[str, typing.Any] | None = None,
8 context: opik_optimizer.core.state.OptimizationContext | None = None
9)

Parameters:

prompt_or_payload
Any
score
float | None
id
str | None
metrics
dict[str, typing.Any] | None
notes
str | None
extra
dict[str, typing.Any] | None
context
opik_optimizer.core.state.OptimizationContext | None

run_optimization

1run_optimization(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context with prompts, dataset, metric, etc.

set_default_dataset_split

1set_default_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

set_pareto_front

1set_pareto_front(
2 pareto_front: list[dict[str, typing.Any]] | None
3)

Parameters:

pareto_front
list[dict[str, typing.Any]] | None

set_selection_meta

1set_selection_meta(
2 selection_meta: dict[str, typing.Any] | None
3)

Parameters:

selection_meta
dict[str, typing.Any] | None

start_candidate

1start_candidate(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

with_dataset_split

1with_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

ParameterOptimizer

1ParameterOptimizer(
2 model: str = 'gpt-4o',
3 model_parameters: dict[str, typing.Any] | None = None,
4 default_n_trials: int = 20,
5 local_search_ratio: float = 0.3,
6 local_search_scale: float = 0.2,
7 n_threads: int = 12,
8 verbose: int = 1,
9 seed: int = 42,
10 name: str | None = None,
11 skip_perfect_score: bool = True,
12 perfect_score: float = 0.95
13)

Parameters:

model
strDefaults to gpt-4o
LiteLLM model name (used for metadata, not for optimization calls)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
default_n_trials
intDefaults to 20
Default number of optimization trials to run
local_search_ratio
floatDefaults to 0.3
Ratio of trials to dedicate to local search refinement (0.0-1.0)
local_search_scale
floatDefaults to 0.2
Scale factor for narrowing search space during local search
n_threads
intDefaults to 12
Number of parallel threads for evaluation
verbose
intDefaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
intDefaults to 42
Random seed for reproducibility
name
str | None
skip_perfect_score
boolDefaults to True
perfect_score
floatDefaults to 0.95

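A minimal construction sketch with illustrative values:

1from opik_optimizer import ParameterOptimizer
2
3optimizer = ParameterOptimizer(
4 model="gpt-4o", # used for metadata, not for optimization calls
5 default_n_trials=20,
6 local_search_ratio=0.3, # fraction of trials spent on local refinement
7 local_search_scale=0.2, # how much the search space narrows during local search
8 n_threads=12,
9 seed=42,
10)
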
Methods

begin_round

1begin_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

cleanup

1cleanup()

evaluate

1evaluate(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 sampling_tag: str | None = None
6)

Parameters:

context
OptimizationContext
Optimization context for this run.
prompts
dict
Dict of named prompts to evaluate (e.g., {“main”: ChatPrompt(…)}). Single-prompt optimizations use a dict with one entry.
experiment_config
dict[str, typing.Any] | None
Optional experiment configuration.
sampling_tag
str | None
Optional sampling tag for deterministic subsampling per candidate.

evaluate_prompt

1evaluate_prompt(
2 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
3 dataset: Dataset,
4 metric: MetricFunction,
5 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
6 n_threads: int | None = None,
7 verbose: int = 1,
8 dataset_item_ids: list[str] | None = None,
9 experiment_config: dict | None = None,
10 n_samples: int | float | str | None = None,
11 n_samples_strategy: str | None = None,
12 seed: int | None = None,
13 return_evaluation_result: bool = False,
14 allow_tool_use: bool | None = None,
15 use_evaluate_on_dict_items: bool | None = None,
16 sampling_tag: str | None = None
17)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
dataset
Dataset
metric
MetricFunction
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
n_threads
int | None
verbose
intDefaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | float | str | None
n_samples_strategy
str | None
seed
int | None
return_evaluation_result
boolDefaults to False
allow_tool_use
bool | None
use_evaluate_on_dict_items
bool | None
sampling_tag
str | None

evaluate_with_result

1evaluate_with_result(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 empty_score: float | None = None,
6 n_samples: int | float | str | None = None,
7 n_samples_strategy: str | None = None,
8 sampling_tag: str | None = None
9)

Parameters:

context
OptimizationContext
prompts
dict
experiment_config
dict[str, typing.Any] | None
empty_score
float | None
n_samples
int | float | str | None
n_samples_strategy
str | None
sampling_tag
str | None

finish_candidate

1finish_candidate(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

finish_round

1finish_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

get_config

1get_config(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_default_prompt

1get_default_prompt(
2 key: str
3)

Parameters:

key
str
The prompt key to retrieve

get_history_entries

1get_history_entries()

get_history_rounds

1get_history_rounds()

get_metadata

1get_metadata(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_optimizer_metadata

1get_optimizer_metadata()

get_prompt

1get_prompt(
2 key: str,
3 fmt: Any
4)

Parameters:

key
str
The prompt key to retrieve
fmt
Any

list_prompts

1list_prompts()

on_trial

1on_trial(
2 context: OptimizationContext,
3 prompts: dict,
4 score: float,
5 prev_best_score: float | None = None
6)

Parameters:

context
OptimizationContext
prompts
dict
score
float
prev_best_score
float | None

optimize_parameter

1optimize_parameter(
2 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
3 dataset: Dataset,
4 metric: MetricFunction,
5 parameter_space: opik_optimizer.algorithms.parameter_optimizer.ops.search_ops.ParameterSearchSpace | collections.abc.Mapping[str, typing.Any],
6 validation_dataset: opik.api_objects.dataset.dataset.Dataset | None = None,
7 experiment_config: dict | None = None,
8 max_trials: int | None = None,
9 n_samples: int | float | str | None = None,
10 n_samples_minibatch: int | None = None,
11 n_samples_strategy: str | None = None,
12 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
13 project_name: str = 'Optimization',
14 sampler: optuna.samplers._base.BaseSampler | None = None,
15 callbacks: list[collections.abc.Callable[[optuna.study.study.Study, optuna.trial._frozen.FrozenTrial], None]] | None = None,
16 timeout: float | None = None,
17 local_trials: int | None = None,
18 local_search_scale: float | None = None,
19 optimization_id: str | None = None
20)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
The prompt or dict of prompts to evaluate with tuned parameters. When a dict is provided, parameters are optimized independently for each prompt.
dataset
Dataset
Dataset providing evaluation examples
metric
MetricFunction
Objective function to maximize
parameter_space
opik_optimizer.algorithms.parameter_optimizer.ops.search_ops.ParameterSearchSpace | collections.abc.Mapping[str, typing.Any]
Definition of the search space for tunable parameters. For multi-prompt, params without a prefix are expanded per prompt. Params already prefixed (e.g., ‘analyze.temperature’) are kept as-is.
validation_dataset
opik.api_objects.dataset.dataset.Dataset | None
Optional validation dataset. Note: Due to the internal implementation of ParameterOptimizer, this parameter is currently not fully utilized and we recommend not using it for this optimizer.
experiment_config
dict | None
Optional experiment metadata
max_trials
int | None
Total number of trials (if None, uses default_n_trials)
n_samples
int | float | str | None
Number of dataset samples to evaluate per trial (None for all)
n_samples_minibatch
int | None
Optional number of samples for inner-loop minibatches
n_samples_strategy
str | None
Sampling strategy name (default “random_sorted”)
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
Optional custom agent instance to execute evaluations
project_name
strDefaults to Optimization
Opik project name for logging traces (default: “Optimization”)
sampler
optuna.samplers._base.BaseSampler | None
Optuna sampler to use (default: TPESampler with seed)
callbacks
list[collections.abc.Callable[[optuna.study.study.Study, optuna.trial._frozen.FrozenTrial], None]] | None
List of callback functions for Optuna study
timeout
float | None
Maximum time in seconds for optimization
local_trials
int | None
Number of trials for local search (overrides local_search_ratio)
local_search_scale
float | None
Scale factor for local search narrowing (0.0-1.0)
optimization_id
str | None
Optional ID to use when creating the Opik optimization run; when provided it must be a valid UUIDv7 string.

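A hedged call sketch; search_space stands in for a ParameterSearchSpace (or plain mapping) you build yourself, and my_dataset/my_metric are placeholders:

1result = optimizer.optimize_parameter(
2 prompt=prompt, # ChatPrompt or dict of ChatPrompts
3 dataset=my_dataset, # evaluation examples
4 metric=my_metric, # objective to maximize
5 parameter_space=search_space, # see ParameterSearchSpace / ParameterSpec below
6 max_trials=30,
7 n_samples=50,
8)
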
post_baseline

1post_baseline(
2 context: OptimizationContext,
3 score: float
4)

Parameters:

context
OptimizationContext
score
float

post_optimize

1post_optimize(
2 context: OptimizationContext,
3 result: OptimizationResult
4)

Parameters:

context
OptimizationContext
result
OptimizationResult

post_round

1post_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

post_trial

1post_trial(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

pre_baseline

1pre_baseline(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

pre_optimize

1pre_optimize(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context

pre_round

1pre_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

pre_trial

1pre_trial(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

record_candidate_entry

1record_candidate_entry(
2 prompt_or_payload: Any,
3 score: float | None = None,
4 id: str | None = None,
5 metrics: dict[str, typing.Any] | None = None,
6 notes: str | None = None,
7 extra: dict[str, typing.Any] | None = None,
8 context: opik_optimizer.core.state.OptimizationContext | None = None
9)

Parameters:

prompt_or_payload
Any
score
float | None
id
str | None
metrics
dict[str, typing.Any] | None
notes
str | None
extra
dict[str, typing.Any] | None
context
opik_optimizer.core.state.OptimizationContext | None

set_default_dataset_split

1set_default_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

set_pareto_front

1set_pareto_front(
2 pareto_front: list[dict[str, typing.Any]] | None
3)

Parameters:

pareto_front
list[dict[str, typing.Any]] | None

set_selection_meta

1set_selection_meta(
2 selection_meta: dict[str, typing.Any] | None
3)

Parameters:

selection_meta
dict[str, typing.Any] | None

start_candidate

1start_candidate(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

with_dataset_split

1with_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

ParameterSearchSpace

1ParameterSearchSpace(
2 parameters: list[opik_optimizer.algorithms.parameter_optimizer.ops.search_ops.ParameterSpec]
3)

Parameters:

parameters
list[opik_optimizer.algorithms.parameter_optimizer.ops.search_ops.ParameterSpec]

ParameterSpec

1ParameterSpec(
2 name: str,
3 description: str | None = None,
4 distribution: ParameterType,
5 low: float | None = None,
6 high: float | None = None,
7 step: float | None = None,
8 scale: Literal['linear', 'log'] = 'linear',
9 choices: list[Any] | None = None,
10 target: str | collections.abc.Sequence[str] | None = None,
11 default: Any | None = None
12)

Parameters:

name
str
description
str | None
distribution
ParameterType
low
float | None
high
float | None
step
float | None
scale
Literal['linear', 'log']Defaults to linear
choices
list[Any] | None
target
str | collections.abc.Sequence[str] | None
default
Any | None

ParameterType

1ParameterType(
2 *args: Any,
3 **kwds: Any
4)

Parameters:

args
Any
kwds
Any

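A construction sketch tying these three classes together. ParameterType member names are not listed in this reference, so ParameterType.FLOAT below is an assumption, and the dotted target path is illustrative:

1# ParameterType.FLOAT is assumed; check the enum members in your installed version.
2spec = ParameterSpec(
3 name="temperature",
4 distribution=ParameterType.FLOAT,
5 low=0.0,
6 high=1.0,
7 scale="linear",
8 target="model_parameters.temperature", # illustrative target path
9)
10search_space = ParameterSearchSpace(parameters=[spec])
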
BaseOptimizer

1BaseOptimizer(
2 model: str,
3 verbose: int = 1,
4 seed: int = 42,
5 model_parameters: dict[str, typing.Any] | None = None,
6 reasoning_model: str | None = None,
7 reasoning_model_parameters: dict[str, typing.Any] | None = None,
8 name: str | None = None,
9 skip_perfect_score: bool = True,
10 perfect_score: float = 0.95,
11 prompt_overrides: dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None = None,
12 display: opik_optimizer.utils.display.run.RunDisplay | None = None
13)

Parameters:

model
str
verbose
intDefaults to 1
seed
intDefaults to 42
model_parameters
dict[str, typing.Any] | None
reasoning_model
str | None
reasoning_model_parameters
dict[str, typing.Any] | None
name
str | None
skip_perfect_score
boolDefaults to True
perfect_score
floatDefaults to 0.95
prompt_overrides
dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None
display
opik_optimizer.utils.display.run.RunDisplay | None

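BaseOptimizer is not instantiated directly; the concrete optimizers above inherit these constructor arguments, so the shared options can be passed to any of them:

1# Shared BaseOptimizer arguments, passed through a concrete subclass.
2optimizer = ParameterOptimizer(
3 model="gpt-4o",
4 verbose=1,
5 seed=42,
6 skip_perfect_score=True,
7 perfect_score=0.95,
8)
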
Methods

begin_round

1begin_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

cleanup

1cleanup()

evaluate

1evaluate(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 sampling_tag: str | None = None
6)

Parameters:

context
OptimizationContext
Optimization context for this run.
prompts
dict
Dict of named prompts to evaluate (e.g., {“main”: ChatPrompt(…)}). Single-prompt optimizations use a dict with one entry.
experiment_config
dict[str, typing.Any] | None
Optional experiment configuration.
sampling_tag
str | None
Optional sampling tag for deterministic subsampling per candidate.

evaluate_prompt

1evaluate_prompt(
2 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
3 dataset: Dataset,
4 metric: MetricFunction,
5 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
6 n_threads: int | None = None,
7 verbose: int = 1,
8 dataset_item_ids: list[str] | None = None,
9 experiment_config: dict | None = None,
10 n_samples: int | float | str | None = None,
11 n_samples_strategy: str | None = None,
12 seed: int | None = None,
13 return_evaluation_result: bool = False,
14 allow_tool_use: bool | None = None,
15 use_evaluate_on_dict_items: bool | None = None,
16 sampling_tag: str | None = None
17)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
dataset
Dataset
metric
MetricFunction
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
n_threads
int | None
verbose
intDefaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | float | str | None
n_samples_strategy
str | None
seed
int | None
return_evaluation_result
boolDefaults to False
allow_tool_use
bool | None
use_evaluate_on_dict_items
bool | None
sampling_tag
str | None

evaluate_with_result

1evaluate_with_result(
2 context: OptimizationContext,
3 prompts: dict,
4 experiment_config: dict[str, typing.Any] | None = None,
5 empty_score: float | None = None,
6 n_samples: int | float | str | None = None,
7 n_samples_strategy: str | None = None,
8 sampling_tag: str | None = None
9)

Parameters:

context
OptimizationContext
prompts
dict
experiment_config
dict[str, typing.Any] | None
empty_score
float | None
n_samples
int | float | str | None
n_samples_strategy
str | None
sampling_tag
str | None

finish_candidate

1finish_candidate(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

finish_round

1finish_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

get_config

1get_config(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_default_prompt

1get_default_prompt(
2 key: str
3)

Parameters:

key
str
The prompt key to retrieve

get_history_entries

1get_history_entries()

get_history_rounds

1get_history_rounds()

get_metadata

1get_metadata(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

get_prompt

1get_prompt(
2 key: str,
3 fmt: Any
4)

Parameters:

key
str
The prompt key to retrieve
fmt
Any

list_prompts

1list_prompts()

on_trial

1on_trial(
2 context: OptimizationContext,
3 prompts: dict,
4 score: float,
5 prev_best_score: float | None = None
6)

Parameters:

context
OptimizationContext
prompts
dict
score
float
prev_best_score
float | None

optimize_prompt

1optimize_prompt(
2 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
3 dataset: Dataset,
4 metric: MetricFunction,
5 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None = None,
6 experiment_config: dict | None = None,
7 n_samples: int | float | str | None = None,
8 n_samples_minibatch: int | None = None,
9 n_samples_strategy: str | None = None,
10 auto_continue: bool = False,
11 project_name: str | None = None,
12 optimization_id: str | None = None,
13 validation_dataset: opik.api_objects.dataset.dataset.Dataset | None = None,
14 max_trials: int = 10,
15 allow_tool_use: bool = True,
16 optimize_prompt: bool | str | list[str] | None = 'system',
17 *args: Any,
18 **kwargs: Any
19)

Parameters:

prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
The prompt to optimize (single ChatPrompt or dict of prompts)
dataset
Dataset
Opik dataset (training set, used for feedback/context). Note: this parameter is expected to be deprecated in favor of dataset_training; for now it serves as the training dataset.
metric
MetricFunction
A metric function with signature (dataset_item, llm_output) -> float
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
Optional agent for prompt execution (defaults to LiteLLMAgent)
experiment_config
dict | None
Optional configuration for the experiment
n_samples
int | float | str | None
Number of samples to use for evaluation
n_samples_minibatch
int | None
Optional number of samples for inner-loop minibatches
n_samples_strategy
str | None
Sampling strategy name (default “random_sorted”)
auto_continue
boolDefaults to False
Whether to continue optimization automatically
project_name
str | None
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or “Optimization”)
optimization_id
str | None
Optional ID to use when creating the Opik optimization run
validation_dataset
opik.api_objects.dataset.dataset.Dataset | None
Optional validation dataset for ranking candidates
max_trials
intDefaults to 10
Maximum number of optimization trials
allow_tool_use
boolDefaults to True
Whether tools may be executed during evaluation (default True)
optimize_prompt
bool | str | list[str] | NoneDefaults to system
Which prompt roles to allow for optimization
args
Any
kwargs
Any

post_baseline

1post_baseline(
2 context: OptimizationContext,
3 score: float
4)

Parameters:

context
OptimizationContext
score
float

post_optimize

1post_optimize(
2 context: OptimizationContext,
3 result: OptimizationResult
4)

Parameters:

context
OptimizationContext
result
OptimizationResult

post_round

1post_round(
2 round_handle: Any,
3 context: opik_optimizer.core.state.OptimizationContext | None = None,
4 best_score: float | None = None,
5 best_candidate: typing.Any | None = None,
6 best_prompt: typing.Any | None = None,
7 stop_reason: str | None = None,
8 extras: dict[str, typing.Any] | None = None,
9 candidates: list[dict[str, typing.Any]] | None = None,
10 timestamp: str | None = None,
11 dataset_split: str | None = None,
12 pareto_front: list[dict[str, typing.Any]] | None = None,
13 selection_meta: dict[str, typing.Any] | None = None
14)

Parameters:

round_handle
Any
context
opik_optimizer.core.state.OptimizationContext | None
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
dataset_split
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None

post_trial

1post_trial(
2 context: OptimizationContext,
3 candidate_handle: Any,
4 score: float | None,
5 metrics: dict[str, typing.Any] | None = None,
6 extras: dict[str, typing.Any] | None = None,
7 candidates: list[dict[str, typing.Any]] | None = None,
8 dataset: str | None = None,
9 dataset_split: str | None = None,
10 trial_index: int | None = None,
11 timestamp: str | None = None,
12 round_handle: typing.Any | None = None
13)

Parameters:

context
OptimizationContext
candidate_handle
Any
score
float | None
metrics
dict[str, typing.Any] | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
dataset
str | None
dataset_split
str | None
trial_index
int | None
timestamp
str | None
round_handle
typing.Any | None

pre_baseline

1pre_baseline(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext

pre_optimize

1pre_optimize(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context

pre_round

1pre_round(
2 context: OptimizationContext,
3 extras: Any
4)

Parameters:

context
OptimizationContext
extras
Any

pre_trial

1pre_trial(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

record_candidate_entry

1record_candidate_entry(
2 prompt_or_payload: Any,
3 score: float | None = None,
4 id: str | None = None,
5 metrics: dict[str, typing.Any] | None = None,
6 notes: str | None = None,
7 extra: dict[str, typing.Any] | None = None,
8 context: opik_optimizer.core.state.OptimizationContext | None = None
9)

Parameters:

prompt_or_payload
Any
score
float | None
id
str | None
metrics
dict[str, typing.Any] | None
notes
str | None
extra
dict[str, typing.Any] | None
context
opik_optimizer.core.state.OptimizationContext | None

run_optimization

1run_optimization(
2 context: OptimizationContext
3)

Parameters:

context
OptimizationContext
The optimization context with prompts, dataset, metric, etc.

set_default_dataset_split

1set_default_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

set_pareto_front

1set_pareto_front(
2 pareto_front: list[dict[str, typing.Any]] | None
3)

Parameters:

pareto_front
list[dict[str, typing.Any]] | None

set_selection_meta

1set_selection_meta(
2 selection_meta: dict[str, typing.Any] | None
3)

Parameters:

selection_meta
dict[str, typing.Any] | None

start_candidate

1start_candidate(
2 context: OptimizationContext,
3 candidate: Any,
4 round_handle: typing.Any | None = None
5)

Parameters:

context
OptimizationContext
candidate
Any
round_handle
typing.Any | None

with_dataset_split

1with_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

ChatPrompt

1ChatPrompt(
2 name: str = 'chat-prompt',
3 system: str | None = None,
4 user: str | None = None,
5 messages: list[dict[str, typing.Any]] | None = None,
6 tools: list[dict[str, typing.Any]] | None = None,
7 function_map: dict[str, collections.abc.Callable] | None = None,
8 model: str = 'gpt-4o-mini',
9 model_parameters: dict[str, typing.Any] | None = None,
10 model_kwargs: dict[str, typing.Any] | None = None,
11 **kwargs: Any
12)

Parameters:

name
strDefaults to chat-prompt
system
str | None
the system prompt
user
str | None
messages
list[dict[str, typing.Any]] | None
a list of role/content message dictionaries; message content may contain {input-dataset-field} placeholders that are filled from the dataset item
tools
list[dict[str, typing.Any]] | None
function_map
dict[str, collections.abc.Callable] | None
model
strDefaults to gpt-4o-mini
model_parameters
dict[str, typing.Any] | None
model_kwargs
dict[str, typing.Any] | None
kwargs
Any

Methods

copy

1copy()

get_messages

1get_messages(
2 dataset_item: dict[str, typing.Any] | None = None
3)

Parameters:

dataset_item
dict[str, typing.Any] | None

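A small sketch of constructing a prompt and rendering its messages; the {question} field name is illustrative:

1prompt = ChatPrompt(
2 name="qa-prompt",
3 system="Answer concisely.",
4 user="{question}", # filled from the dataset item
5 model="gpt-4o-mini",
6)
7
8# Render messages for one dataset item (field name is illustrative).
9messages = prompt.get_messages({"question": "What is Opik?"})
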
replace_in_messages

1replace_in_messages(
2 messages: list,
3 label: str,
4 value: str
5)

Parameters:

messages
list
label
str
value
str

set_messages

1set_messages(
2 messages: list
3)

Parameters:

messages
list

to_dict

1to_dict()

AlgorithmResult

1AlgorithmResult(
2 best_prompts: dict,
3 best_score: float,
4 history: Sequence = <factory>,
5 metadata: dict = <factory>
6)

Parameters:

best_prompts
dict
best_score
float
history
SequenceDefaults to <factory>
metadata
dictDefaults to <factory>

OptimizationResult

1OptimizationResult(
2 schema_version: str = 'v1',
3 details_version: str = 'v1',
4 optimizer: str = 'Optimizer',
5 prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt],
6 score: float,
7 metric_name: str,
8 optimization_id: str | None = None,
9 dataset_id: str | None = None,
10 initial_prompt: opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt] | None = None,
11 initial_score: float | None = None,
12 details: dict[str, Any],
13 history: list[dict[str, Any]] = [],
14 llm_calls: int | None = None,
15 llm_calls_tools: int | None = None,
16 llm_cost_total: float | None = None,
17 llm_token_usage_total: dict[str, int] | None = None
18)

Parameters:

schema_version
strDefaults to v1
details_version
strDefaults to v1
optimizer
strDefaults to Optimizer
prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt]
score
float
metric_name
str
optimization_id
str | None
dataset_id
str | None
initial_prompt
opik_optimizer.api_objects.chat_prompt.ChatPrompt | dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt] | None
initial_score
float | None
details
dict[str, Any]
history
list[dict[str, Any]]Defaults to []
llm_calls
int | None
llm_calls_tools
int | None
llm_cost_total
float | None
llm_token_usage_total
dict[str, int] | None

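These fields can be read directly from the object returned by optimize_prompt(); a brief sketch (the optimizer call itself is abbreviated):

1result = optimizer.optimize_prompt(prompt=prompt, dataset=my_dataset, metric=my_metric)
2
3print(result.score, result.metric_name) # best score achieved
4print(result.initial_score) # baseline score, if recorded
5print(result.llm_calls, result.llm_cost_total) # usage counters
6best_prompt = result.prompt # ChatPrompt or dict of ChatPrompts
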
OptimizationContext

1OptimizationContext(
2 prompts: dict,
3 initial_prompts: dict,
4 is_single_prompt_optimization: bool,
5 dataset: Dataset,
6 evaluation_dataset: Dataset,
7 validation_dataset: opik.api_objects.dataset.dataset.Dataset | None,
8 metric: MetricFunction,
9 agent: opik_optimizer.agents.optimizable_agent.OptimizableAgent | None,
10 optimization: opik.api_objects.optimization.optimization.Optimization | None,
11 optimization_id: str | None,
12 experiment_config: dict[str, typing.Any] | None,
13 n_samples: int | float | str | None,
14 max_trials: int,
15 project_name: str,
16 n_samples_minibatch: int | None = None,
17 n_samples_strategy: str = 'random_sorted',
18 allow_tool_use: bool = True,
19 baseline_score: float | None = None,
20 extra_params: dict = <factory>,
21 trials_completed: int = 0,
22 should_stop: bool = False,
23 finish_reason: Optional = None,
24 current_best_score: float | None = None,
25 current_best_prompt: dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt] | None = None,
26 dataset_split: str | None = None
27)

Parameters:

prompts
dict
initial_prompts
dict
is_single_prompt_optimization
bool
dataset
Dataset
evaluation_dataset
Dataset
validation_dataset
opik.api_objects.dataset.dataset.Dataset | None
metric
MetricFunction
agent
opik_optimizer.agents.optimizable_agent.OptimizableAgent | None
optimization
opik.api_objects.optimization.optimization.Optimization | None
optimization_id
str | None
experiment_config
dict[str, typing.Any] | None
n_samples
int | float | str | None
max_trials
int
project_name
str
n_samples_minibatch
int | None
n_samples_strategy
strDefaults to random_sorted
allow_tool_use
boolDefaults to True
baseline_score
float | None
extra_params
dictDefaults to <factory>
trials_completed
intDefaults to 0
should_stop
boolDefaults to False
finish_reason
Optional
current_best_score
float | None
current_best_prompt
dict[str, opik_optimizer.api_objects.chat_prompt.ChatPrompt] | None
dataset_split
str | None

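The context is the object that the lifecycle hooks above (pre_optimize, pre_trial, post_trial, etc.) receive; a simplified sketch of reading a few documented fields inside a custom hook (the subclass shown is hypothetical):

1class MyOptimizer(ParameterOptimizer):
2 def pre_optimize(self, context):
3  print(context.project_name, context.max_trials, context.n_samples)
4  if context.validation_dataset is None:
5   print("no validation dataset; candidates ranked on the training data")
6  super().pre_optimize(context)
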
OptimizationHistoryState

1OptimizationHistoryState(
2 context: Any = None
3)

Parameters:

context
Any

Methods

clear

1clear()

end_round

1end_round(
2 round_handle: Any,
3 best_score: float | None = None,
4 best_candidate: typing.Any | None = None,
5 best_prompt: typing.Any | None = None,
6 stop_reason: str | None = None,
7 extras: dict[str, typing.Any] | None = None,
8 candidates: list[dict[str, typing.Any]] | None = None,
9 timestamp: str | None = None,
10 pareto_front: list[dict[str, typing.Any]] | None = None,
11 selection_meta: dict[str, typing.Any] | None = None,
12 dataset_split: str | None = None
13)

Parameters:

round_handle
Any
best_score
float | None
best_candidate
typing.Any | None
best_prompt
typing.Any | None
stop_reason
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
pareto_front
list[dict[str, typing.Any]] | None
selection_meta
dict[str, typing.Any] | None
dataset_split
str | None

finalize_stop

1finalize_stop(
2 stop_reason: str | None = None
3)

Parameters:

stop_reason
str | None

get_entries

1get_entries()

get_rounds

1get_rounds()

record_trial

1record_trial(
2 round_handle: Any,
3 score: float | None,
4 candidate: typing.Any | None = None,
5 trial_index: int | None = None,
6 metrics: dict[str, typing.Any] | None = None,
7 dataset: str | None = None,
8 dataset_split: str | None = None,
9 extras: dict[str, typing.Any] | None = None,
10 candidates: list[dict[str, typing.Any]] | None = None,
11 timestamp: str | None = None,
12 stop_reason: str | None = None,
13 candidate_id_prefix: str | None = None
14)

Parameters:

round_handle
Any
score
float | None
candidate
typing.Any | None
trial_index
int | None
metrics
dict[str, typing.Any] | None
dataset
str | None
dataset_split
str | None
extras
dict[str, typing.Any] | None
candidates
list[dict[str, typing.Any]] | None
timestamp
str | None
stop_reason
str | None
candidate_id_prefix
str | None

set_context

1set_context(
2 context: Any
3)

Parameters:

context
Any

set_default_dataset_split

1set_default_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None

set_pareto_front

1set_pareto_front(
2 pareto_front: list[dict[str, typing.Any]] | None
3)

Parameters:

pareto_front
list[dict[str, typing.Any]] | None

set_selection_meta

1set_selection_meta(
2 selection_meta: dict[str, typing.Any] | None
3)

Parameters:

selection_meta
dict[str, typing.Any] | None

start_round

1start_round(
2 round_index: int | None = None,
3 extras: dict[str, typing.Any] | None = None,
4 timestamp: str | None = None
5)

Parameters:

round_index
int | None
extras
dict[str, typing.Any] | None
timestamp
str | None

with_dataset_split

1with_dataset_split(
2 dataset_split: str | None
3)

Parameters:

dataset_split
str | None
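Taken together, the methods above support a simple round/trial bookkeeping loop; an illustrative sketch (start_round is assumed to return the handle passed to the later calls, and the score/candidate values are placeholders):

1history = OptimizationHistoryState()
2round_handle = history.start_round(round_index=0)
3history.record_trial(round_handle, score=0.72, candidate=prompt, trial_index=0)
4history.end_round(round_handle, best_score=0.72, best_prompt=prompt)
5rounds = history.get_rounds()
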

OptimizationRound

1OptimizationRound(
2 round_index: int,
3 trials: list = <factory>,
4 best_score: float | None = None,
5 best_so_far: float | None = None,
6 best_prompt: typing.Any | None = None,
7 best_candidate: typing.Any | None = None,
8 candidates: list[dict[str, typing.Any]] | None = None,
9 generated_prompts: list[dict[str, typing.Any]] | None = None,
10 stop_reason: str | None = None,
11 stopped: bool | None = None,
12 dataset_split: str | None = None,
13 extras: dict[str, typing.Any] | None = None,
14 timestamp: str = <factory>
15)

Parameters:

round_index
int
trials
listDefaults to <factory>
best_score
float | None
best_so_far
float | None
best_prompt
typing.Any | None
best_candidate
typing.Any | None
candidates
list[dict[str, typing.Any]] | None
generated_prompts
list[dict[str, typing.Any]] | None
stop_reason
str | None
stopped
bool | None
dataset_split
str | None
extras
dict[str, typing.Any] | None
timestamp
strDefaults to <factory>

Methods

to_dict

1to_dict()

OptimizationTrial

1OptimizationTrial(
2 trial_index: int | None,
3 score: float | None,
4 candidate: Any,
5 metrics: dict[str, typing.Any] | None = None,
6 dataset: str | None = None,
7 dataset_split: str | None = None,
8 candidate_id: str | None = None,
9 extras: dict[str, typing.Any] | None = None,
10 timestamp: str = <factory>
11)

Parameters:

trial_index
int | None
score
float | None
candidate
Any
metrics
dict[str, typing.Any] | None
dataset
str | None
dataset_split
str | None
candidate_id
str | None
extras
dict[str, typing.Any] | None
timestamp
strDefaults to <factory>

Methods

to_dict

1to_dict()

OptimizableAgent

1OptimizableAgent(
2 prompt: Any = None,
3 project_name: Any = None,
4 **kwargs: Any
5)

Parameters:

prompt
Any
project_name
Any
kwargs
Any

Methods

init_agent

1init_agent(
2 prompt: Any
3)

Parameters:

prompt
Any

init_llm

1init_llm()

invoke

1invoke(
2 messages: list,
3 seed: int | None = None
4)

Parameters:

messages
list
List of message dictionaries
seed
int | None
Optional seed for reproducibility

invoke_agent

1invoke_agent(
2 prompts: Any,
3 dataset_item: Any,
4 allow_tool_use: Any = False,
5 seed: Any = None
6)

Parameters:

prompts
Any
dataset_item
Any
allow_tool_use
AnyDefaults to False
seed
Any

invoke_agent_candidates

1invoke_agent_candidates(
2 prompts: Any,
3 dataset_item: Any,
4 allow_tool_use: Any = False,
5 seed: Any = None
6)

Parameters:

prompts
Any
Mapping of prompt name to ChatPrompt.
dataset_item
Any
Dataset row used to render the prompt messages.
allow_tool_use
AnyDefaults to False
Whether tool execution is allowed in this invocation.
seed
Any
Optional seed for reproducibility.

invoke_dataset_item

1invoke_dataset_item(
2 dataset_item: dict
3)

Parameters:

dataset_item
dict

invoke_prompt

1invoke_prompt(
2 prompt: Any,
3 dataset_item: Any,
4 allow_tool_use: Any = False,
5 seed: Any = None
6)

Parameters:

prompt
Any
dataset_item
Any
allow_tool_use
AnyDefaults to False
seed
Any

llm_invoke

1llm_invoke(
2 query: str | None = None,
3 messages: list[dict[str, str]] | None = None,
4 seed: int | None = None,
5 allow_tool_use: bool | None = False
6)

Parameters:

query
str | None
messages
list[dict[str, str]] | None
seed
int | None
allow_tool_use
bool | NoneDefaults to False

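Custom agents are supplied by subclassing; a rough sketch that only overrides invoke() (your version may also require implementing init_llm/init_agent, and my_llm_client is a hypothetical client):

1class MyAgent(OptimizableAgent):
2 def invoke(self, messages, seed=None):
3  # Call your own model stack and return its text output.
4  return my_llm_client.complete(messages)
5
6agent = MyAgent(prompt=prompt, project_name="Optimization")
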
MultiMetricObjective

1MultiMetricObjective(
2 metrics: list,
3 weights: list[float] | None = None,
4 name: str = 'multi_metric_objective',
5 reason: str | None = None,
6 reason_builder: collections.abc.Callable[[list[opik.evaluation.metrics.score_result.ScoreResult], list[float], float], str | None] | None = None
7)

Parameters:

metrics
list
weights
list[float] | None
name
strDefaults to multi_metric_objective
reason
str | None
reason_builder
collections.abc.Callable[[list[opik.evaluation.metrics.score_result.ScoreResult], list[float], float], str | None] | None

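A combination sketch; metric_a and metric_b stand in for your own metric functions:

1objective = MultiMetricObjective(
2 metrics=[metric_a, metric_b], # placeholder metric functions
3 weights=[0.7, 0.3], # relative weight of each metric
4 name="quality_and_brevity",
5)
6# Intended to be passed wherever a single metric is expected, e.g. metric=objective.
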
PromptLibrary

1PromptLibrary(
2 defaults: dict,
3 overrides: dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None = None
4)

Parameters:

defaults
dict
Dictionary of default prompt templates
overrides
dict[str, str] | collections.abc.Callable[[opik_optimizer.utils.prompt_library.PromptLibrary], None] | None
Optional dict or callable to customize prompts

Methods

get

1get(
2 key: str,
3 fmt: object
4)

Parameters:

key
str
The prompt key to retrieve
fmt
object

get_default

1get_default(
2 key: str
3)

Parameters:

key
str
The prompt key to retrieve

keys

1keys()

set

1set(
2 key: str,
3 value: str
4)

Parameters:

key
str
The prompt key to set
value
str
The new prompt template

update

1update(
2 overrides: dict
3)

Parameters:

overrides
dict
Dictionary of key-value pairs to update
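
The prompt_overrides argument on the optimizer constructors above is backed by this class; a sketch of both override forms, using the methods listed above (the dict key shown is hypothetical and must match a DEFAULT_PROMPTS key in your installed version):

1# Dict form: keys must match DEFAULT_PROMPTS keys ("analysis_prompt" is hypothetical).
2optimizer = HierarchicalReflectiveOptimizer(
3 model="gpt-4o",
4 prompt_overrides={"analysis_prompt": "Focus on factual errors first."},
5)
6
7# Callable form: receives the PromptLibrary instance for in-place modification.
8def customize(library):
9 for key in library.keys():
10  library.set(key, library.get_default(key) + "\nBe concise.")
11
12optimizer = HierarchicalReflectiveOptimizer(model="gpt-4o", prompt_overrides=customize)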