Opik Agent Optimizer API Reference

Technical SDK reference guide

The Opik Agent Optimizer SDK provides a comprehensive set of tools for optimizing LLM prompts and agents. This reference guide documents the standardized API that all optimizers follow, ensuring consistency and interoperability across different optimization algorithms.

Key Features

  • Standardized API: All optimizers follow the same interface for optimize_prompt() and optimize_mcp() methods
  • Multiple Algorithms: Support for various optimization strategies including evolutionary, few-shot, meta-prompt, MIPRO, and GEPA
  • MCP Support: Built-in support for Model Context Protocol tool calling
  • Consistent Results: All optimizers return standardized OptimizationResult objects
  • Counter Tracking: Built-in LLM and tool call counters for monitoring usage
  • Backward Compatibility: All original parameters preserved through kwargs extraction
  • Deprecation Warnings: Clear warnings for deprecated parameters with migration guidance

Core Classes

The SDK provides several optimizer classes that all inherit from BaseOptimizer and implement the same standardized interface:

  • ParameterOptimizer: Optimizes LLM call parameters (temperature, top_p, etc.) using Bayesian optimization
  • FewShotBayesianOptimizer: Uses few-shot learning with Bayesian optimization
  • MetaPromptOptimizer: Employs meta-prompting techniques for optimization
  • EvolutionaryOptimizer: Uses genetic algorithms for prompt evolution
  • GepaOptimizer: Leverages GEPA (Genetic-Pareto) optimization approach
  • HierarchicalReflectiveOptimizer: Uses hierarchical root cause analysis for targeted prompt refinement

Standardized Method Signatures

All optimizers implement these core methods with identical signatures:

optimize_prompt()

def optimize_prompt(
    self,
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[OptimizableAgent] | None = None,
    **kwargs: Any,
) -> OptimizationResult
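
For orientation, a minimal usage sketch follows. The dataset name, the metric, and the prompt are illustrative placeholders, not part of the SDK; any optimizer class can stand in for MetaPromptOptimizer, since they all share this signature.

from opik import Opik
from opik_optimizer import ChatPrompt, MetaPromptOptimizer

optimizer = MetaPromptOptimizer(model="gpt-4o")

# Hypothetical dataset name; replace with one of your own Opik datasets.
dataset = Opik().get_dataset("my-qa-dataset")

# A metric takes (dataset_item, llm_output) and returns a float score.
def exact_match(dataset_item: dict, llm_output: str) -> float:
    return 1.0 if llm_output.strip() == dataset_item["answer"] else 0.0

prompt = ChatPrompt(
    system="Answer the question concisely.",
    user="{question}",  # filled from the dataset item's "question" field
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    n_samples=50,  # evaluate on a subset to keep runs fast
)
print(result.score, result.prompt)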

optimize_mcp()

def optimize_mcp(
    self,
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    *,
    tool_name: str,
    second_pass: Any,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[OptimizableAgent] | None = None,
    fallback_invoker: Callable[[dict[str, Any]], str] | None = None,
    fallback_arguments: Callable[[Any], dict[str, Any]] | None = None,
    allow_tool_use_on_second_pass: bool = False,
    **kwargs: Any,
) -> OptimizationResult
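
Continuing the sketch above, the call below shows the shape of an optimize_mcp invocation. The "search" tool name and the second_pass coordinator are placeholders; constructing an MCPSecondPassCoordinator is not shown here.

# Structural sketch only; `second_pass` must be an MCPSecondPassCoordinator
# built for your MCP server, and "search" is a hypothetical tool name.
def fallback_arguments(dataset_item) -> dict:
    # Derive tool arguments from the dataset item if the model emits none.
    return {"query": dataset_item["question"]}

def fallback_invoker(arguments: dict) -> str:
    # Used when the MCP tool itself cannot be invoked.
    return "No tool result available for: " + arguments["query"]

result = optimizer.optimize_mcp(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    tool_name="search",
    second_pass=second_pass,
    fallback_arguments=fallback_arguments,
    fallback_invoker=fallback_invoker,
)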

Deprecation Warnings

The following parameters are deprecated and will be removed in future versions:

Constructor Parameters

  • project_name in optimizer constructors: Set project_name in the ChatPrompt instead
  • num_threads in optimizer constructors: Use n_threads instead

Example Migration

# ❌ Deprecated
optimizer = FewShotBayesianOptimizer(
    model="gpt-4o-mini",
    project_name="my-project",  # Deprecated
    num_threads=16,  # Deprecated
)

# ✅ Correct
optimizer = FewShotBayesianOptimizer(
    model="gpt-4o-mini",
    n_threads=16,  # Use n_threads instead
)

prompt = ChatPrompt(
    project_name="my-project",  # Set here instead
    messages=[...]
)

ParameterOptimizer

ParameterOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    default_n_trials: int = 20,
    local_search_ratio: float = 0.3,
    local_search_scale: float = 0.2,
    n_threads: int = 4,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name (used for metadata, not for optimization calls)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
default_n_trials
int, defaults to 20
Default number of optimization trials to run
local_search_ratio
float, defaults to 0.3
Ratio of trials to dedicate to local search refinement (0.0-1.0)
local_search_scale
float, defaults to 0.2
Scale factor for narrowing the search space during local search
n_threads
int, defaults to 4
Number of parallel threads for evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_parameter

optimize_parameter(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    parameter_space: opik_optimizer.parameter_optimizer.parameter_search_space.ParameterSearchSpace | collections.abc.Mapping[str, typing.Any],
    experiment_config: dict | None = None,
    max_trials: int | None = None,
    n_samples: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    sampler: optuna.samplers._base.BaseSampler | None = None,
    callbacks: list[collections.abc.Callable[[optuna.study.study.Study, optuna.trial._frozen.FrozenTrial], None]] | None = None,
    timeout: float | None = None,
    local_trials: int | None = None,
    local_search_scale: float | None = None
)

Parameters:

prompt
ChatPrompt
The prompt to evaluate with tuned parameters
dataset
Dataset
Dataset providing evaluation examples
metric
Callable
Objective function to maximize
parameter_space
opik_optimizer.parameter_optimizer.parameter_search_space.ParameterSearchSpace | collections.abc.Mapping[str, typing.Any]
Definition of the search space for tunable parameters
experiment_config
dict | None
Optional experiment metadata
max_trials
int | None
Total number of trials (if None, uses default_n_trials)
n_samples
int | None
Number of dataset samples to evaluate per trial (None for all)
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
Optional custom agent class to execute evaluations
sampler
optuna.samplers._base.BaseSampler | None
Optuna sampler to use (default: TPESampler with seed)
callbacks
list[collections.abc.Callable[[optuna.study.study.Study, optuna.trial._frozen.FrozenTrial], None]] | None
List of callback functions for Optuna study
timeout
float | None
Maximum time in seconds for optimization
local_trials
int | None
Number of trials for local search (overrides local_search_ratio)
local_search_scale
float | None
Scale factor for local search narrowing (0.0-1.0)
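
A hedged example of a parameter search follows, reusing the prompt, dataset, and metric from the earlier sketch. The mapping-based search-space schema shown is an assumption (consult ParameterSearchSpace for the exact format it accepts); the sampler argument uses Optuna's real TPESampler.

import optuna
from opik_optimizer import ParameterOptimizer

optimizer = ParameterOptimizer(model="gpt-4o-mini", default_n_trials=20)

# Illustrative search space; the exact mapping schema accepted by
# ParameterSearchSpace may differ from what is shown here.
search_space = {
    "temperature": {"type": "float", "low": 0.0, "high": 1.2},
    "top_p": {"type": "float", "low": 0.5, "high": 1.0},
}

result = optimizer.optimize_parameter(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    parameter_space=search_space,
    max_trials=30,
    sampler=optuna.samplers.TPESampler(seed=42),  # seeded TPE, as by default
)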

FewShotBayesianOptimizer

FewShotBayesianOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    min_examples: int = 2,
    max_examples: int = 8,
    n_threads: int = 8,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name for optimizer’s internal reasoning (generating few-shot templates)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
min_examples
int, defaults to 2
Minimum number of examples to include in the prompt
max_examples
int, defaults to 8
Maximum number of examples to include in the prompt
n_threads
int, defaults to 8
Number of threads for parallel evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_prompt

optimize_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    project_name: str = 'Optimization',
    max_trials: int = 10,
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
The prompt to optimize
dataset
Dataset
Opik Dataset to optimize on
metric
Callable
Metric function to evaluate on
experiment_config
dict | None
Optional configuration for the experiment, useful to log additional metadata
n_samples
int | None
Optional number of items to test in the dataset
auto_continue
bool, defaults to False
Whether to auto-continue optimization
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
Optional agent class to use
project_name
str, defaults to 'Optimization'
Opik project name for logging traces
max_trials
int, defaults to 10
Number of trials for Bayesian optimization
*args
Any
**kwargs
Any
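
A typical call, reusing the prompt, dataset, and metric from the earlier sketch:

from opik_optimizer import FewShotBayesianOptimizer

optimizer = FewShotBayesianOptimizer(
    model="gpt-4o-mini",
    min_examples=2,
    max_examples=6,  # cap the number of few-shot examples tried
    n_threads=8,
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    max_trials=20,   # Bayesian trials over few-shot example selections
    n_samples=100,
)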

MetaPromptOptimizer

MetaPromptOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    prompts_per_round: int = 4,
    enable_context: bool = True,
    n_threads: int = 12,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name for optimizer’s internal reasoning/generation calls
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
prompts_per_round
int, defaults to 4
Number of candidate prompts to generate per optimization round
enable_context
bool, defaults to True
Whether to include task-specific context when reasoning about improvements
n_threads
int, defaults to 12
Number of parallel threads for prompt evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_mcp

optimize_mcp(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    tool_name: str,
    second_pass: MCPSecondPassCoordinator,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    fallback_invoker: collections.abc.Callable[[dict[str, typing.Any]], str] | None = None,
    fallback_arguments: collections.abc.Callable[[typing.Any], dict[str, typing.Any]] | None = None,
    allow_tool_use_on_second_pass: bool = False,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
tool_name
str
second_pass
MCPSecondPassCoordinator
experiment_config
dict | None
n_samples
int | None
auto_continue
bool, defaults to False
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
fallback_invoker
collections.abc.Callable[[dict[str, typing.Any]], str] | None
fallback_arguments
collections.abc.Callable[[typing.Any], dict[str, typing.Any]] | None
allow_tool_use_on_second_pass
bool, defaults to False
**kwargs
Any

optimize_prompt

optimize_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    project_name: str = 'Optimization',
    max_trials: int = 10,
    mcp_config: opik_optimizer.mcp_utils.mcp_workflow.MCPExecutionConfig | None = None,
    candidate_generator: collections.abc.Callable[..., list[opik_optimizer.optimization_config.chat_prompt.ChatPrompt]] | None = None,
    candidate_generator_kwargs: dict[str, typing.Any] | None = None,
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
The ChatPrompt to optimize. Can include system/user/assistant messages, tools, and model configuration.
dataset
Dataset
Opik Dataset containing evaluation examples. Each item is passed to the prompt during evaluation.
metric
Callable
Evaluation function that takes (dataset_item, llm_output) and returns a score (float). Higher scores indicate better performance.
experiment_config
dict | None
Optional metadata dictionary to log with Opik experiments. Useful for tracking experiment parameters and context.
n_samples
int | None
Number of dataset items to use per evaluation. If None, uses full dataset. Lower values speed up optimization but may be less reliable.
auto_continue
bool, defaults to False
If True, the optimizer may continue beyond max_trials if improvements are still being found.
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
Custom agent class for prompt execution. If None, uses the default LiteLLM-based agent. Must inherit from OptimizableAgent.
project_name
str, defaults to 'Optimization'
Opik project name for logging traces and experiments.
max_trials
int, defaults to 10
Maximum total number of prompts to evaluate across all rounds. The optimizer stops when this limit is reached.
mcp_config
opik_optimizer.mcp_utils.mcp_workflow.MCPExecutionConfig | None
Optional MCP (Model Context Protocol) execution configuration for prompts that use external tools. Enables tool-calling workflows.
candidate_generator
collections.abc.Callable[..., list[opik_optimizer.optimization_config.chat_prompt.ChatPrompt]] | None
Optional custom function to generate candidate prompts. Overrides the default meta-reasoning generator. Should return list[ChatPrompt].
candidate_generator_kwargs
dict[str, typing.Any] | None
Optional kwargs to pass to candidate_generator.
*args
Any
**kwargs
Any
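
A typical call, reusing the objects from the earlier sketch; the experiment_config contents are free-form metadata:

from opik_optimizer import MetaPromptOptimizer

optimizer = MetaPromptOptimizer(model="gpt-4o", prompts_per_round=4)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    max_trials=12,  # total candidate prompts across all rounds
    n_samples=50,
    experiment_config={"stage": "meta-prompt-v1"},  # logged with experiments
)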

EvolutionaryOptimizer

EvolutionaryOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    population_size: int = 30,
    num_generations: int = 15,
    mutation_rate: float = 0.2,
    crossover_rate: float = 0.8,
    tournament_size: int = 4,
    elitism_size: int = 3,
    adaptive_mutation: bool = True,
    enable_moo: bool = True,
    enable_llm_crossover: bool = True,
    output_style_guidance: str | None = None,
    infer_output_style: bool = False,
    n_threads: int = 12,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name for optimizer’s internal operations (mutations, crossover, etc.)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
population_size
int, defaults to 30
Number of prompts in the population
num_generations
int, defaults to 15
Number of generations to run
mutation_rate
float, defaults to 0.2
Mutation rate for genetic operations
crossover_rate
float, defaults to 0.8
Crossover rate for genetic operations
tournament_size
int, defaults to 4
Tournament size for selection
elitism_size
int, defaults to 3
Number of elite prompts to preserve across generations
adaptive_mutation
bool, defaults to True
Whether to use adaptive mutation that adjusts based on population diversity
enable_moo
bool, defaults to True
Whether to enable multi-objective optimization (optimizes metric and prompt length)
enable_llm_crossover
bool, defaults to True
Whether to enable LLM-based crossover operations
output_style_guidance
str | None
Optional guidance for output style in generated prompts
infer_output_style
bool, defaults to False
Whether to automatically infer output style from the dataset
n_threads
int, defaults to 12
Number of threads for parallel evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility
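
An illustrative configuration, highlighting the multi-objective and LLM-crossover switches:

from opik_optimizer import EvolutionaryOptimizer

optimizer = EvolutionaryOptimizer(
    model="gpt-4o",
    population_size=20,
    num_generations=10,
    enable_moo=True,            # optimize metric and prompt length together
    enable_llm_crossover=True,  # LLM-guided recombination of parent prompts
    n_threads=12,
)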

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_mcp

optimize_mcp(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    tool_name: str,
    second_pass: MCPSecondPassCoordinator,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    fallback_invoker: collections.abc.Callable[[dict[str, typing.Any]], str] | None = None,
    fallback_arguments: collections.abc.Callable[[typing.Any], dict[str, typing.Any]] | None = None,
    allow_tool_use_on_second_pass: bool = False,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
tool_name
str
second_pass
MCPSecondPassCoordinator
experiment_config
dict | None
n_samples
int | None
auto_continue
bool, defaults to False
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
fallback_invoker
collections.abc.Callable[[dict[str, typing.Any]], str] | None
fallback_arguments
collections.abc.Callable[[typing.Any], dict[str, typing.Any]] | None
allow_tool_use_on_second_pass
bool, defaults to False
**kwargs
Any

optimize_prompt

optimize_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    project_name: str = 'Optimization',
    max_trials: int = 10,
    mcp_config: opik_optimizer.mcp_utils.mcp_workflow.MCPExecutionConfig | None = None,
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
The prompt to optimize
dataset
Dataset
The dataset to use for evaluation
metric
Callable
Metric function to optimize with, should have the arguments dataset_item and llm_output
experiment_config
dict | None
Optional experiment configuration
n_samples
int | None
Optional number of samples to use
auto_continue
bool, defaults to False
Whether to automatically continue optimization
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
Optional agent class to use
project_name
str, defaults to 'Optimization'
Opik project name for logging traces
max_trials
int, defaults to 10
mcp_config
opik_optimizer.mcp_utils.mcp_workflow.MCPExecutionConfig | None
MCP tool calling configuration
*args
Any
**kwargs
Any

GepaOptimizer

GepaOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    n_threads: int = 6,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name for the optimization algorithm
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
n_threads
int, defaults to 6
Number of parallel threads for evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_prompt

optimize_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    project_name: str = 'Optimization',
    max_trials: int = 10,
    reflection_minibatch_size: int = 3,
    candidate_selection_strategy: str = 'pareto',
    skip_perfect_score: bool = True,
    perfect_score: float = 1.0,
    use_merge: bool = False,
    max_merge_invocations: int = 5,
    run_dir: str | None = None,
    track_best_outputs: bool = False,
    display_progress_bar: bool = False,
    seed: int = 42,
    raise_on_exception: bool = True
)

Parameters:

prompt
ChatPrompt
The prompt to optimize
dataset
Dataset
Opik Dataset to optimize on
metric
Callable
Metric function to evaluate on
experiment_config
dict | None
Optional configuration for the experiment
n_samples
int | None
Optional number of items to test in the dataset
auto_continue
bool, defaults to False
Whether to auto-continue optimization
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
Optional agent class to use
project_name
str, defaults to 'Optimization'
Opik project name for logging traces
max_trials
int, defaults to 10
Maximum number of different prompts to test
reflection_minibatch_size
int, defaults to 3
Size of reflection minibatches
candidate_selection_strategy
str, defaults to 'pareto'
Strategy for candidate selection
skip_perfect_score
bool, defaults to True
Skip candidates with perfect scores
perfect_score
float, defaults to 1.0
Score considered perfect
use_merge
bool, defaults to False
Enable merge operations
max_merge_invocations
int, defaults to 5
Maximum number of merge invocations
run_dir
str | None
Directory for run outputs
track_best_outputs
bool, defaults to False
Track best outputs during optimization
display_progress_bar
bool, defaults to False
Display a progress bar
seed
int, defaults to 42
Random seed for reproducibility
raise_on_exception
bool, defaults to True
Raise exceptions instead of continuing
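
A typical call, reusing the objects from the earlier sketch:

from opik_optimizer import GepaOptimizer

optimizer = GepaOptimizer(model="gpt-4o")

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    max_trials=10,
    reflection_minibatch_size=3,
    candidate_selection_strategy="pareto",  # Pareto-front candidate selection
)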

HierarchicalReflectiveOptimizer

HierarchicalReflectiveOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    max_parallel_batches: int = 5,
    batch_size: int = 25,
    convergence_threshold: float = 0.01,
    n_threads: int = 12,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name for the optimization algorithm (reasoning and analysis)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
max_parallel_batches
int, defaults to 5
Maximum number of batches to process concurrently during hierarchical root cause analysis
batch_size
int, defaults to 25
Number of test cases per batch for root cause analysis
convergence_threshold
float, defaults to 0.01
Stop if relative improvement is below this threshold
n_threads
int, defaults to 12
Number of parallel threads for evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_prompt

optimize_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    project_name: str = 'Optimization',
    max_trials: int = 5,
    max_retries: int = 2,
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
The prompt to optimize
dataset
Dataset
Opik Dataset (or dataset name) to optimize on
metric
Callable
A metric function; it should have two arguments: dataset_item and llm_output
experiment_config
dict | None
Optional configuration for the experiment
n_samples
int | None
auto_continue
bool, defaults to False
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
project_name
str, defaults to 'Optimization'
Opik project name for logging traces
max_trials
int, defaults to 5
max_retries
int, defaults to 2
*args
Any
**kwargs
Any
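
A typical call, reusing the objects from the earlier sketch:

from opik_optimizer import HierarchicalReflectiveOptimizer

optimizer = HierarchicalReflectiveOptimizer(
    model="gpt-4o",
    batch_size=25,               # test cases per root-cause-analysis batch
    convergence_threshold=0.01,  # stop on small relative improvement
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    max_trials=5,
    max_retries=2,
)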

ChatPrompt

ChatPrompt(
    name: str = 'chat-prompt',
    system: str | None = None,
    user: str | None = None,
    messages: list[dict[str, str]] | None = None,
    tools: list[dict[str, typing.Any]] | None = None,
    function_map: dict[str, collections.abc.Callable] | None = None,
    model: str = 'gpt-4o-mini',
    invoke: collections.abc.Callable | None = None,
    model_parameters: dict[str, typing.Any] | None = None
)

Parameters:

name
str, defaults to 'chat-prompt'
system
str | None
The system prompt
user
str | None
messages
list[dict[str, str]] | None
A list of role/content message dictionaries; content may include {input-dataset-field} placeholders that are filled from each dataset item
tools
list[dict[str, typing.Any]] | None
function_map
dict[str, collections.abc.Callable] | None
model
str, defaults to 'gpt-4o-mini'
invoke
collections.abc.Callable | None
model_parameters
dict[str, typing.Any] | None
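
A short example; the {question} placeholder is resolved from the matching field of each dataset item:

from opik_optimizer import ChatPrompt

prompt = ChatPrompt(
    name="qa-prompt",
    system="You are a concise assistant.",
    user="Question: {question}",
    model="gpt-4o-mini",
    model_parameters={"temperature": 0.2},
)

# Render the messages for a single (hypothetical) dataset item.
messages = prompt.get_messages({"question": "What is Opik?"})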

Methods

copy

copy()

get_messages

get_messages(
    dataset_item: dict[str, str] | None = None
)

Parameters:

dataset_item
dict[str, str] | None

set_messages

set_messages(
    messages: list
)

Parameters:

messages
list

to_dict

to_dict()

with_messages

with_messages(
    messages: list
)

Parameters:

messages
list

OptimizationResult

OptimizationResult(
    optimizer: str = 'Optimizer',
    prompt: list[dict[str, str]],
    score: float,
    metric_name: str,
    optimization_id: str | None = None,
    dataset_id: str | None = None,
    initial_prompt: list[dict[str, str]] | None = None,
    initial_score: float | None = None,
    details: dict[str, Any],
    history: list[dict[str, Any]] = [],
    llm_calls: int | None = None,
    tool_calls: int | None = None,
    demonstrations: list[dict[str, Any]] | None = None,
    mipro_prompt: str | None = None,
    tool_prompts: dict[str, str] | None = None
)

Parameters:

optimizer
str, defaults to 'Optimizer'
prompt
list[dict[str, str]], required
score
float, required
metric_name
str, required
optimization_id
str | None
dataset_id
str | None
initial_prompt
list[dict[str, str]] | None
initial_score
float | None
details
dict[str, Any], required
history
list[dict[str, Any]], defaults to []
llm_calls
int | None
tool_calls
int | None
demonstrations
list[dict[str, Any]] | None
mipro_prompt
str | None
tool_prompts
dict[str, str] | None
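
The fields are plain attributes on the returned object. A brief example of inspecting the result of any optimize_prompt() call, continuing the earlier sketch:

result = optimizer.optimize_prompt(prompt=prompt, dataset=dataset, metric=exact_match)

print(result.metric_name, result.score)     # final metric name and best score
print(result.initial_score)                 # baseline score, if recorded
print(result.llm_calls, result.tool_calls)  # built-in usage counters
best_messages = result.prompt               # optimized messages (role/content dicts)
for entry in result.history:                # one record per evaluated candidate
    ...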

OptimizableAgent

OptimizableAgent(
    prompt: Any,
    project_name: Any = None
)

Parameters:

prompt
Any
project_name
Any

Methods

init_agent

init_agent(
    prompt: Any
)

Parameters:

prompt
Any

init_llm

init_llm()

invoke

invoke(
    messages: list,
    seed: int | None = None
)

Parameters:

messages
list
seed
int | None

invoke_dataset_item

invoke_dataset_item(
    dataset_item: dict
)

Parameters:

dataset_item
dict

llm_invoke

llm_invoke(
    query: str | None = None,
    messages: list[dict[str, str]] | None = None,
    seed: int | None = None,
    allow_tool_use: bool | None = False
)

Parameters:

query
str | None
messages
list[dict[str, str]] | None
seed
int | None
allow_tool_use
bool | None, defaults to False
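
A hedged sketch of a custom agent follows. `call_my_backend` is a hypothetical stand-in for your own inference stack, and the assumption that invoke() returns the assistant reply as a string mirrors the default LiteLLM-based agent.

from opik_optimizer import OptimizableAgent

class MyAgent(OptimizableAgent):
    def invoke(self, messages: list, seed: int | None = None) -> str:
        # `call_my_backend` is a placeholder for your own inference call;
        # it should return the assistant reply for the rendered messages.
        return call_my_backend(messages, seed=seed)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    agent_class=MyAgent,  # the optimizer instantiates this class itself
)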