Opik Agent Optimizer API Reference

Technical SDK reference guide

The Opik Agent Optimizer SDK provides a comprehensive set of tools for optimizing LLM prompts and agents. This reference guide documents the standardized API that all optimizers follow, ensuring consistency and interoperability across different optimization algorithms.

Key Features

  • Standardized API: All optimizers follow the same interface for optimize_prompt() and optimize_mcp() methods
  • Multiple Algorithms: Support for various optimization strategies including evolutionary, few-shot, meta-prompt, MIPRO, and GEPA
  • MCP Support: Built-in support for Model Context Protocol tool calling
  • Consistent Results: All optimizers return standardized OptimizationResult objects
  • Counter Tracking: Built-in LLM and tool call counters for monitoring usage
  • Backward Compatibility: All original parameters preserved through kwargs extraction
  • Deprecation Warnings: Clear warnings for deprecated parameters with migration guidance

Core Classes

The SDK provides several optimizer classes that all inherit from BaseOptimizer and implement the same standardized interface:

  • ParameterOptimizer: Optimizes LLM call parameters (temperature, top_p, etc.) using Bayesian optimization
  • FewShotBayesianOptimizer: Uses few-shot learning with Bayesian optimization
  • MetaPromptOptimizer: Employs meta-prompting techniques for optimization
  • EvolutionaryOptimizer: Uses genetic algorithms for prompt evolution
  • GepaOptimizer: Leverages GEPA (Genetic-Pareto) optimization approach
  • HierarchicalReflectiveOptimizer: Uses hierarchical root cause analysis for targeted prompt refinement

Standardized Method Signatures

All optimizers implement these core methods with identical signatures:

optimize_prompt()

def optimize_prompt(
    self,
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[OptimizableAgent] | None = None,
    **kwargs: Any,
) -> OptimizationResult
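
For orientation, a minimal usage sketch follows. The dataset name, the metric, and the prompt are illustrative placeholders, not part of the SDK; any optimizer class can stand in for MetaPromptOptimizer, since they all share this signature.

from opik import Opik
from opik_optimizer import ChatPrompt, MetaPromptOptimizer

optimizer = MetaPromptOptimizer(model="gpt-4o")

# Hypothetical dataset name; replace with one of your own Opik datasets.
dataset = Opik().get_dataset("my-qa-dataset")

# A metric takes (dataset_item, llm_output) and returns a float score.
def exact_match(dataset_item: dict, llm_output: str) -> float:
    return 1.0 if llm_output.strip() == dataset_item["answer"] else 0.0

prompt = ChatPrompt(
    system="Answer the question concisely.",
    user="{question}",  # filled from the dataset item's "question" field
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    n_samples=50,  # evaluate on a subset to keep runs fast
)
print(result.score, result.prompt)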

optimize_mcp()

def optimize_mcp(
    self,
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    *,
    tool_name: str,
    second_pass: Any,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[OptimizableAgent] | None = None,
    fallback_invoker: Callable[[dict[str, Any]], str] | None = None,
    fallback_arguments: Callable[[Any], dict[str, Any]] | None = None,
    allow_tool_use_on_second_pass: bool = False,
    **kwargs: Any,
) -> OptimizationResult
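
Continuing the sketch above, the call below shows the shape of an optimize_mcp invocation. The "search" tool name and the second_pass coordinator are placeholders; constructing an MCPSecondPassCoordinator is not shown here.

# Structural sketch only; `second_pass` must be an MCPSecondPassCoordinator
# built for your MCP server, and "search" is a hypothetical tool name.
def fallback_arguments(dataset_item) -> dict:
    # Derive tool arguments from the dataset item if the model emits none.
    return {"query": dataset_item["question"]}

def fallback_invoker(arguments: dict) -> str:
    # Used when the MCP tool itself cannot be invoked.
    return "No tool result available for: " + arguments["query"]

result = optimizer.optimize_mcp(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    tool_name="search",
    second_pass=second_pass,
    fallback_arguments=fallback_arguments,
    fallback_invoker=fallback_invoker,
)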

Deprecation Warnings

The following parameters are deprecated and will be removed in future versions:

Constructor Parameters

  • project_name in optimizer constructors: Set project_name in the ChatPrompt instead
  • num_threads in optimizer constructors: Use n_threads instead

Example Migration

# ❌ Deprecated
optimizer = FewShotBayesianOptimizer(
    model="gpt-4o-mini",
    project_name="my-project",  # Deprecated
    num_threads=16,  # Deprecated
)

# ✅ Correct
optimizer = FewShotBayesianOptimizer(
    model="gpt-4o-mini",
    n_threads=16,  # Use n_threads instead
)

prompt = ChatPrompt(
    project_name="my-project",  # Set here instead
    messages=[...]
)

ParameterOptimizer

ParameterOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    default_n_trials: int = 20,
    local_search_ratio: float = 0.3,
    local_search_scale: float = 0.2,
    n_threads: int = 4,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name (used for metadata, not for optimization calls)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
default_n_trials
int, defaults to 20
Default number of optimization trials to run
local_search_ratio
float, defaults to 0.3
Ratio of trials to dedicate to local search refinement (0.0-1.0)
local_search_scale
float, defaults to 0.2
Scale factor for narrowing the search space during local search
n_threads
int, defaults to 4
Number of parallel threads for evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_parameter

optimize_parameter(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    parameter_space: opik_optimizer.parameter_optimizer.parameter_search_space.ParameterSearchSpace | collections.abc.Mapping[str, typing.Any],
    experiment_config: dict | None = None,
    max_trials: int | None = None,
    n_samples: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    sampler: optuna.samplers._base.BaseSampler | None = None,
    callbacks: list[collections.abc.Callable[[optuna.study.study.Study, optuna.trial._frozen.FrozenTrial], None]] | None = None,
    timeout: float | None = None,
    local_trials: int | None = None,
    local_search_scale: float | None = None
)

Parameters:

prompt
ChatPrompt
The prompt to evaluate with tuned parameters
dataset
Dataset
Dataset providing evaluation examples
metric
Callable
Objective function to maximize
parameter_space
opik_optimizer.parameter_optimizer.parameter_search_space.ParameterSearchSpace | collections.abc.Mapping[str, typing.Any]
Definition of the search space for tunable parameters
experiment_config
dict | None
Optional experiment metadata
max_trials
int | None
Total number of trials (if None, uses default_n_trials)
n_samples
int | None
Number of dataset samples to evaluate per trial (None for all)
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
Optional custom agent class to execute evaluations
sampler
optuna.samplers._base.BaseSampler | None
Optuna sampler to use (default: TPESampler with seed)
callbacks
list[collections.abc.Callable[[optuna.study.study.Study, optuna.trial._frozen.FrozenTrial], None]] | None
List of callback functions for Optuna study
timeout
float | None
Maximum time in seconds for optimization
local_trials
int | None
Number of trials for local search (overrides local_search_ratio)
local_search_scale
float | None
Scale factor for local search narrowing (0.0-1.0)
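
A hedged example of a parameter search follows, reusing the prompt, dataset, and metric from the earlier sketch. The mapping-based search-space schema shown is an assumption (consult ParameterSearchSpace for the exact format it accepts); the sampler argument uses Optuna's real TPESampler.

import optuna
from opik_optimizer import ParameterOptimizer

optimizer = ParameterOptimizer(model="gpt-4o-mini", default_n_trials=20)

# Illustrative search space; the exact mapping schema accepted by
# ParameterSearchSpace may differ from what is shown here.
search_space = {
    "temperature": {"type": "float", "low": 0.0, "high": 1.2},
    "top_p": {"type": "float", "low": 0.5, "high": 1.0},
}

result = optimizer.optimize_parameter(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    parameter_space=search_space,
    max_trials=30,
    sampler=optuna.samplers.TPESampler(seed=42),  # seeded TPE, as by default
)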

FewShotBayesianOptimizer

FewShotBayesianOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    min_examples: int = 2,
    max_examples: int = 8,
    n_threads: int = 8,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name for optimizer’s internal reasoning (generating few-shot templates)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
min_examples
int, defaults to 2
Minimum number of examples to include in the prompt
max_examples
int, defaults to 8
Maximum number of examples to include in the prompt
n_threads
int, defaults to 8
Number of threads for parallel evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_prompt

optimize_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    project_name: str = 'Optimization',
    max_trials: int = 10,
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
The prompt to optimize
dataset
Dataset
Opik Dataset to optimize on
metric
Callable
Metric function to evaluate on
experiment_config
dict | None
Optional configuration for the experiment, useful to log additional metadata
n_samples
int | None
Optional number of items to test in the dataset
auto_continue
bool, defaults to False
Whether to auto-continue optimization
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
Optional agent class to use
project_name
str, defaults to 'Optimization'
Opik project name for logging traces
max_trials
int, defaults to 10
Number of trials for Bayesian optimization
*args
Any
**kwargs
Any
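
A typical call, reusing the prompt, dataset, and metric from the earlier sketch:

from opik_optimizer import FewShotBayesianOptimizer

optimizer = FewShotBayesianOptimizer(
    model="gpt-4o-mini",
    min_examples=2,
    max_examples=6,  # cap the number of few-shot examples tried
    n_threads=8,
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    max_trials=20,   # Bayesian trials over few-shot example selections
    n_samples=100,
)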

MetaPromptOptimizer

MetaPromptOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    prompts_per_round: int = 4,
    enable_context: bool = True,
    n_threads: int = 12,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name for optimizer’s internal reasoning/generation calls
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
prompts_per_round
int, defaults to 4
Number of candidate prompts to generate per optimization round
enable_context
bool, defaults to True
Whether to include task-specific context when reasoning about improvements
n_threads
int, defaults to 12
Number of parallel threads for prompt evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_mcp

optimize_mcp(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    tool_name: str,
    second_pass: MCPSecondPassCoordinator,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    fallback_invoker: collections.abc.Callable[[dict[str, typing.Any]], str] | None = None,
    fallback_arguments: collections.abc.Callable[[typing.Any], dict[str, typing.Any]] | None = None,
    allow_tool_use_on_second_pass: bool = False,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
tool_name
str
second_pass
MCPSecondPassCoordinator
experiment_config
dict | None
n_samples
int | None
auto_continue
bool, defaults to False
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
fallback_invoker
collections.abc.Callable[[dict[str, typing.Any]], str] | None
fallback_arguments
collections.abc.Callable[[typing.Any], dict[str, typing.Any]] | None
allow_tool_use_on_second_pass
bool, defaults to False
**kwargs
Any

optimize_prompt

optimize_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    project_name: str = 'Optimization',
    max_trials: int = 10,
    mcp_config: opik_optimizer.mcp_utils.mcp_workflow.MCPExecutionConfig | None = None,
    candidate_generator: collections.abc.Callable[..., list[opik_optimizer.optimization_config.chat_prompt.ChatPrompt]] | None = None,
    candidate_generator_kwargs: dict[str, typing.Any] | None = None,
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
The ChatPrompt to optimize. Can include system/user/assistant messages, tools, and model configuration.
dataset
Dataset
Opik Dataset containing evaluation examples. Each item is passed to the prompt during evaluation.
metric
Callable
Evaluation function that takes (dataset_item, llm_output) and returns a score (float). Higher scores indicate better performance.
experiment_config
dict | None
Optional metadata dictionary to log with Opik experiments. Useful for tracking experiment parameters and context.
n_samples
int | None
Number of dataset items to use per evaluation. If None, uses full dataset. Lower values speed up optimization but may be less reliable.
auto_continue
bool, defaults to False
If True, the optimizer may continue beyond max_trials if improvements are still being found.
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
Custom agent class for prompt execution. If None, uses the default LiteLLM-based agent. Must inherit from OptimizableAgent.
project_name
str, defaults to 'Optimization'
Opik project name for logging traces and experiments.
max_trials
int, defaults to 10
Maximum total number of prompts to evaluate across all rounds. The optimizer stops when this limit is reached.
mcp_config
opik_optimizer.mcp_utils.mcp_workflow.MCPExecutionConfig | None
Optional MCP (Model Context Protocol) execution configuration for prompts that use external tools. Enables tool-calling workflows.
candidate_generator
collections.abc.Callable[..., list[opik_optimizer.optimization_config.chat_prompt.ChatPrompt]] | None
Optional custom function to generate candidate prompts. Overrides the default meta-reasoning generator. Should return list[ChatPrompt].
candidate_generator_kwargs
dict[str, typing.Any] | None
Optional kwargs to pass to candidate_generator.
*args
Any
**kwargs
Any
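
A typical call, reusing the objects from the earlier sketch; the experiment_config contents are free-form metadata:

from opik_optimizer import MetaPromptOptimizer

optimizer = MetaPromptOptimizer(model="gpt-4o", prompts_per_round=4)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    max_trials=12,  # total candidate prompts across all rounds
    n_samples=50,
    experiment_config={"stage": "meta-prompt-v1"},  # logged with experiments
)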

EvolutionaryOptimizer

EvolutionaryOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    population_size: int = 30,
    num_generations: int = 15,
    mutation_rate: float = 0.2,
    crossover_rate: float = 0.8,
    tournament_size: int = 4,
    elitism_size: int = 3,
    adaptive_mutation: bool = True,
    enable_moo: bool = True,
    enable_llm_crossover: bool = True,
    output_style_guidance: str | None = None,
    infer_output_style: bool = False,
    n_threads: int = 12,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name for optimizer’s internal operations (mutations, crossover, etc.)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
population_size
int, defaults to 30
Number of prompts in the population
num_generations
int, defaults to 15
Number of generations to run
mutation_rate
float, defaults to 0.2
Mutation rate for genetic operations
crossover_rate
float, defaults to 0.8
Crossover rate for genetic operations
tournament_size
int, defaults to 4
Tournament size for selection
elitism_size
int, defaults to 3
Number of elite prompts to preserve across generations
adaptive_mutation
bool, defaults to True
Whether to use adaptive mutation that adjusts based on population diversity
enable_moo
bool, defaults to True
Whether to enable multi-objective optimization (optimizes metric and prompt length)
enable_llm_crossover
bool, defaults to True
Whether to enable LLM-based crossover operations
output_style_guidance
str | None
Optional guidance for output style in generated prompts
infer_output_style
bool, defaults to False
Whether to automatically infer output style from the dataset
n_threads
int, defaults to 12
Number of threads for parallel evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility
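
An illustrative configuration, highlighting the multi-objective and LLM-crossover switches:

from opik_optimizer import EvolutionaryOptimizer

optimizer = EvolutionaryOptimizer(
    model="gpt-4o",
    population_size=20,
    num_generations=10,
    enable_moo=True,            # optimize metric and prompt length together
    enable_llm_crossover=True,  # LLM-guided recombination of parent prompts
    n_threads=12,
)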

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_mcp

optimize_mcp(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    tool_name: str,
    second_pass: MCPSecondPassCoordinator,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    fallback_invoker: collections.abc.Callable[[dict[str, typing.Any]], str] | None = None,
    fallback_arguments: collections.abc.Callable[[typing.Any], dict[str, typing.Any]] | None = None,
    allow_tool_use_on_second_pass: bool = False,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
tool_name
str
second_pass
MCPSecondPassCoordinator
experiment_config
dict | None
n_samples
int | None
auto_continue
bool, defaults to False
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
fallback_invoker
collections.abc.Callable[[dict[str, typing.Any]], str] | None
fallback_arguments
collections.abc.Callable[[typing.Any], dict[str, typing.Any]] | None
allow_tool_use_on_second_pass
bool, defaults to False
**kwargs
Any

optimize_prompt

optimize_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    project_name: str = 'Optimization',
    max_trials: int = 10,
    mcp_config: opik_optimizer.mcp_utils.mcp_workflow.MCPExecutionConfig | None = None,
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
The prompt to optimize
dataset
Dataset
The dataset to use for evaluation
metric
Callable
Metric function to optimize with, should have the arguments dataset_item and llm_output
experiment_config
dict | None
Optional experiment configuration
n_samples
int | None
Optional number of samples to use
auto_continue
bool, defaults to False
Whether to automatically continue optimization
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
Optional agent class to use
project_name
str, defaults to 'Optimization'
Opik project name for logging traces
max_trials
int, defaults to 10
mcp_config
opik_optimizer.mcp_utils.mcp_workflow.MCPExecutionConfig | None
MCP tool calling configuration
*args
Any
**kwargs
Any

GepaOptimizer

GepaOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    n_threads: int = 6,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name for the optimization algorithm
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
n_threads
int, defaults to 6
Number of parallel threads for evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_prompt

optimize_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    project_name: str = 'Optimization',
    max_trials: int = 10,
    reflection_minibatch_size: int = 3,
    candidate_selection_strategy: str = 'pareto',
    skip_perfect_score: bool = True,
    perfect_score: float = 1.0,
    use_merge: bool = False,
    max_merge_invocations: int = 5,
    run_dir: str | None = None,
    track_best_outputs: bool = False,
    display_progress_bar: bool = False,
    seed: int = 42,
    raise_on_exception: bool = True
)

Parameters:

prompt
ChatPrompt
The prompt to optimize
dataset
Dataset
Opik Dataset to optimize on
metric
Callable
Metric function to evaluate on
experiment_config
dict | None
Optional configuration for the experiment
n_samples
int | None
Optional number of items to test in the dataset
auto_continue
bool, defaults to False
Whether to auto-continue optimization
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
Optional agent class to use
project_name
str, defaults to 'Optimization'
Opik project name for logging traces
max_trials
int, defaults to 10
Maximum number of different prompts to test
reflection_minibatch_size
int, defaults to 3
Size of reflection minibatches
candidate_selection_strategy
str, defaults to 'pareto'
Strategy for candidate selection
skip_perfect_score
bool, defaults to True
Skip candidates with perfect scores
perfect_score
float, defaults to 1.0
Score considered perfect
use_merge
bool, defaults to False
Enable merge operations
max_merge_invocations
int, defaults to 5
Maximum number of merge invocations
run_dir
str | None
Directory for run outputs
track_best_outputs
bool, defaults to False
Track best outputs during optimization
display_progress_bar
bool, defaults to False
Display a progress bar
seed
int, defaults to 42
Random seed for reproducibility
raise_on_exception
bool, defaults to True
Raise exceptions instead of continuing
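
A typical call, reusing the objects from the earlier sketch:

from opik_optimizer import GepaOptimizer

optimizer = GepaOptimizer(model="gpt-4o")

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    max_trials=10,
    reflection_minibatch_size=3,
    candidate_selection_strategy="pareto",  # Pareto-front candidate selection
)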

HierarchicalReflectiveOptimizer

HierarchicalReflectiveOptimizer(
    model: str = 'gpt-4o',
    model_parameters: dict[str, typing.Any] | None = None,
    max_parallel_batches: int = 5,
    batch_size: int = 25,
    convergence_threshold: float = 0.01,
    n_threads: int = 12,
    verbose: int = 1,
    seed: int = 42
)

Parameters:

model
str, defaults to 'gpt-4o'
LiteLLM model name for the optimization algorithm (reasoning and analysis)
model_parameters
dict[str, typing.Any] | None
Optional dict of LiteLLM parameters for optimizer’s internal LLM calls. Common params: temperature, max_tokens, max_completion_tokens, top_p.
max_parallel_batches
int, defaults to 5
Maximum number of batches to process concurrently during hierarchical root cause analysis
batch_size
int, defaults to 25
Number of test cases per batch for root cause analysis
convergence_threshold
float, defaults to 0.01
Stop if relative improvement is below this threshold
n_threads
int, defaults to 12
Number of parallel threads for evaluation
verbose
int, defaults to 1
Controls internal logging/progress bars (0=off, 1=on)
seed
int, defaults to 42
Random seed for reproducibility

Methods

cleanup

cleanup()

evaluate_prompt

evaluate_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    n_threads: int,
    verbose: int = 1,
    dataset_item_ids: list[str] | None = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    seed: int | None = None,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None
)

Parameters:

prompt
ChatPrompt
dataset
Dataset
metric
Callable
n_threads
int
verbose
int, defaults to 1
dataset_item_ids
list[str] | None
experiment_config
dict | None
n_samples
int | None
seed
int | None
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None

get_history

get_history()

get_optimizer_metadata

get_optimizer_metadata()

optimize_prompt

optimize_prompt(
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    agent_class: type[opik_optimizer.optimizable_agent.OptimizableAgent] | None = None,
    project_name: str = 'Optimization',
    max_trials: int = 5,
    max_retries: int = 2,
    *args: Any,
    **kwargs: Any
)

Parameters:

prompt
ChatPrompt
The prompt to optimize
dataset
Dataset
Opik Dataset (or dataset name) to optimize on
metric
Callable
A metric function; it should have two arguments: dataset_item and llm_output
experiment_config
dict | None
Optional configuration for the experiment
n_samples
int | None
auto_continue
bool, defaults to False
agent_class
type[opik_optimizer.optimizable_agent.OptimizableAgent] | None
project_name
str, defaults to 'Optimization'
Opik project name for logging traces
max_trials
int, defaults to 5
max_retries
int, defaults to 2
*args
Any
**kwargs
Any
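
A typical call, reusing the objects from the earlier sketch:

from opik_optimizer import HierarchicalReflectiveOptimizer

optimizer = HierarchicalReflectiveOptimizer(
    model="gpt-4o",
    batch_size=25,               # test cases per root-cause-analysis batch
    convergence_threshold=0.01,  # stop on small relative improvement
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    max_trials=5,
    max_retries=2,
)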

ChatPrompt

ChatPrompt(
    name: str = 'chat-prompt',
    system: str | None = None,
    user: str | None = None,
    messages: list[dict[str, str]] | None = None,
    tools: list[dict[str, typing.Any]] | None = None,
    function_map: dict[str, collections.abc.Callable] | None = None,
    model: str = 'gpt-4o-mini',
    invoke: collections.abc.Callable | None = None,
    model_parameters: dict[str, typing.Any] | None = None
)

Parameters:

name
str, defaults to 'chat-prompt'
system
str | None
The system prompt
user
str | None
messages
list[dict[str, str]] | None
A list of role/content message dictionaries; content may include {input-dataset-field} placeholders that are filled from each dataset item
tools
list[dict[str, typing.Any]] | None
function_map
dict[str, collections.abc.Callable] | None
model
str, defaults to 'gpt-4o-mini'
invoke
collections.abc.Callable | None
model_parameters
dict[str, typing.Any] | None
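
A short example; the {question} placeholder is resolved from the matching field of each dataset item:

from opik_optimizer import ChatPrompt

prompt = ChatPrompt(
    name="qa-prompt",
    system="You are a concise assistant.",
    user="Question: {question}",
    model="gpt-4o-mini",
    model_parameters={"temperature": 0.2},
)

# Render the messages for a single (hypothetical) dataset item.
messages = prompt.get_messages({"question": "What is Opik?"})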

Methods

copy

copy()

get_messages

get_messages(
    dataset_item: dict[str, str] | None = None
)

Parameters:

dataset_item
dict[str, str] | None

set_messages

set_messages(
    messages: list
)

Parameters:

messages
list

to_dict

to_dict()

with_messages

with_messages(
    messages: list
)

Parameters:

messages
list

OptimizationResult

OptimizationResult(
    optimizer: str = 'Optimizer',
    prompt: list[dict[str, str]],
    score: float,
    metric_name: str,
    optimization_id: str | None = None,
    dataset_id: str | None = None,
    initial_prompt: list[dict[str, str]] | None = None,
    initial_score: float | None = None,
    details: dict[str, Any],
    history: list[dict[str, Any]] = [],
    llm_calls: int | None = None,
    tool_calls: int | None = None,
    demonstrations: list[dict[str, Any]] | None = None,
    mipro_prompt: str | None = None,
    tool_prompts: dict[str, str] | None = None
)

Parameters:

optimizer
str, defaults to 'Optimizer'
prompt
list[dict[str, str]], required
score
float, required
metric_name
str, required
optimization_id
str | None
dataset_id
str | None
initial_prompt
list[dict[str, str]] | None
initial_score
float | None
details
dict[str, Any], required
history
list[dict[str, Any]], defaults to []
llm_calls
int | None
tool_calls
int | None
demonstrations
list[dict[str, Any]] | None
mipro_prompt
str | None
tool_prompts
dict[str, str] | None
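
The fields are plain attributes on the returned object. A brief example of inspecting the result of any optimize_prompt() call, continuing the earlier sketch:

result = optimizer.optimize_prompt(prompt=prompt, dataset=dataset, metric=exact_match)

print(result.metric_name, result.score)     # final metric name and best score
print(result.initial_score)                 # baseline score, if recorded
print(result.llm_calls, result.tool_calls)  # built-in usage counters
best_messages = result.prompt               # optimized messages (role/content dicts)
for entry in result.history:                # one record per evaluated candidate
    ...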

OptimizableAgent

OptimizableAgent(
    prompt: Any,
    project_name: Any = None
)

Parameters:

prompt
Any
project_name
Any

Methods

init_agent

init_agent(
    prompt: Any
)

Parameters:

prompt
Any

init_llm

init_llm()

invoke

invoke(
    messages: list,
    seed: int | None = None
)

Parameters:

messages
list
seed
int | None

invoke_dataset_item

invoke_dataset_item(
    dataset_item: dict
)

Parameters:

dataset_item
dict

llm_invoke

llm_invoke(
    query: str | None = None,
    messages: list[dict[str, str]] | None = None,
    seed: int | None = None,
    allow_tool_use: bool | None = False
)

Parameters:

query
str | None
messages
list[dict[str, str]] | None
seed
int | None
allow_tool_use
bool | None, defaults to False
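
A hedged sketch of a custom agent follows. `call_my_backend` is a hypothetical stand-in for your own inference stack, and the assumption that invoke() returns the assistant reply as a string mirrors the default LiteLLM-based agent.

from opik_optimizer import OptimizableAgent

class MyAgent(OptimizableAgent):
    def invoke(self, messages: list, seed: int | None = None) -> str:
        # `call_my_backend` is a placeholder for your own inference call;
        # it should return the assistant reply for the rendered messages.
        return call_my_backend(messages, seed=seed)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    agent_class=MyAgent,  # the optimizer instantiates this class itself
)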