In Opik 2.0, datasets and experiments are project-scoped. Make sure to specify a project_name when creating datasets and running experiments so they are associated with the correct project.
The Opik Agent Optimizer SDK provides a comprehensive set of tools for optimizing LLM prompts and agents. This reference guide documents the standardized API that all optimizers follow, ensuring consistency and interoperability across different optimization algorithms.
Key Features
- Standardized API: All optimizers follow the same interface for optimize_prompt() methods
- Multiple Algorithms: Support for various optimization strategies including evolutionary, few-shot, meta-prompt, and GEPA
- MCP Support: Built-in support for Model Context Protocol tool calling and optimization
- Consistent Results: All optimizers return standardized OptimizationResult objects
- Counter Tracking: Built-in LLM and tool call counters for monitoring usage
- Backward Compatibility: All original parameters preserved through kwargs extraction
- Deprecation Warnings: Clear warnings for deprecated parameters with migration guidance
Core Classes
The SDK provides several optimizer classes that all inherit from BaseOptimizer and implement the same standardized interface:
- ParameterOptimizer: Optimizes LLM call parameters (temperature, top_p, etc.) using Bayesian optimization
- FewShotBayesianOptimizer: Uses few-shot learning with Bayesian optimization
- MetaPromptOptimizer: Employs meta-prompting techniques for optimization
- EvolutionaryOptimizer: Uses genetic algorithms for prompt evolution
- GepaOptimizer: Leverages GEPA (Genetic-Pareto) optimization approach
- HierarchicalReflectiveOptimizer (HRPO): Uses hierarchical root-cause analysis for targeted prompt refinement
Standardized Method Signatures
All optimizers implement these core methods with identical signatures:
optimize_prompt()
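The concrete signatures are documented per class below. As an illustrative sketch of the shared shape of the interface (not the SDK's actual source; `OptimizerSketch` and its members are hypothetical names standing in for `BaseOptimizer`):

```python
from abc import ABC, abstractmethod
from typing import Any, Callable

class OptimizerSketch(ABC):
    """Hypothetical sketch of the interface all optimizers share;
    the real BaseOptimizer in opik_optimizer has many more methods."""

    def __init__(self, model: str, n_threads: int = 4, **kwargs: Any) -> None:
        self.model = model
        self.n_threads = n_threads
        # Built-in counters for monitoring LLM and tool usage
        self.llm_call_counter = 0
        self.tool_call_counter = 0

    @abstractmethod
    def optimize_prompt(
        self,
        prompt: Any,                           # single ChatPrompt or dict of prompts
        dataset: Any,                          # Opik dataset used for feedback/context
        metric: Callable[[dict, str], float],  # (dataset_item, llm_output) -> float
        **kwargs: Any,
    ) -> Any:                                  # a standardized OptimizationResult
        ...
```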
Deprecation Warnings
The following parameters are deprecated and will be removed in future versions:
Constructor Parameters
- num_threads in optimizer constructors: Use n_threads instead
Example Migration
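No snippet survives here, so the following is an illustrative sketch: the commented lines show the before/after migration, and `extract_n_threads` is a hypothetical helper mimicking the kind of kwargs extraction the SDK's backward compatibility implies (it is not an SDK function):

```python
import warnings

# Before (deprecated; emits a DeprecationWarning):
#   optimizer = FewShotBayesianOptimizer(model="...", num_threads=8)
# After:
#   optimizer = FewShotBayesianOptimizer(model="...", n_threads=8)

def extract_n_threads(kwargs: dict, default: int = 4) -> int:
    """Accept the deprecated num_threads kwarg but prefer n_threads,
    warning callers who still pass the old name."""
    if "num_threads" in kwargs:
        warnings.warn(
            "num_threads is deprecated; use n_threads instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return kwargs.pop("num_threads")
    return kwargs.pop("n_threads", default)
```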
FewShotBayesianOptimizer
Parameters:
Methods
begin_round
Parameters:
cleanup
evaluate
Parameters:
Optimization context for this run.
Dict of named prompts to evaluate (e.g., {"main": ChatPrompt(...)}). Single-prompt optimizations use a dict with one entry.
Optional experiment configuration.
Optional sampling tag for deterministic subsampling per candidate.
evaluate_prompt
Parameters:
evaluate_with_result
Parameters:
finish_candidate
Parameters:
finish_round
Parameters:
get_config
Parameters:
get_default_prompt
Parameters:
The prompt key to retrieve
get_history_entries
get_history_rounds
Parameters:
get_prompt
Parameters:
The prompt key to retrieve
list_prompts
on_trial
Parameters:
optimize_mcp
Parameters:
optimize_prompt
Parameters:
The prompt to optimize (single ChatPrompt or dict of prompts)
Opik dataset (training set, used for feedback/context). Note: this parameter is expected to be deprecated in favor of dataset_training; for now it serves as the training dataset parameter.
A metric function with signature (dataset_item, llm_output) -> float
Optional agent for prompt execution (defaults to LiteLLMAgent)
Optional configuration for the experiment
Number of samples to use for evaluation
Optional number of samples for inner-loop minibatches
Sampling strategy name (default "random_sorted")
Whether to continue optimization automatically
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or "Optimization")
Optional ID to use when creating the Opik optimization run
Optional validation dataset for ranking candidates
Maximum number of optimization trials
Whether tools may be executed during evaluation (default True)
Which prompt roles to allow for optimization
Optional tool optimization selector. Only supported by optimizers that explicitly document tool optimization support.
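The metric above is any callable with signature (dataset_item, llm_output) -> float. A minimal example (`exact_match` and the `expected_output` field are illustrative; use whatever fields your dataset actually contains):

```python
def exact_match(dataset_item: dict, llm_output: str) -> float:
    """Score 1.0 when the model output matches the reference answer."""
    expected = str(dataset_item.get("expected_output", ""))
    return 1.0 if llm_output.strip() == expected.strip() else 0.0
```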
post_baseline
Parameters:
post_optimize
Parameters:
post_round
Parameters:
post_trial
Parameters:
pre_baseline
Parameters:
pre_optimize
Parameters:
pre_round
Parameters:
pre_trial
Parameters:
record_candidate_entry
Parameters:
run_optimization
Parameters:
The optimization context with prompts, dataset, metric, etc.
set_default_dataset_split
Parameters:
set_pareto_front
Parameters:
start_candidate
Parameters:
with_dataset_split
Parameters:
GepaOptimizer
Parameters:
Methods
begin_round
Parameters:
cleanup
evaluate
Parameters:
Optimization context for this run.
Dict of named prompts to evaluate (e.g., {"main": ChatPrompt(...)}). Single-prompt optimizations use a dict with one entry.
Optional experiment configuration.
Optional sampling tag for deterministic subsampling per candidate.
evaluate_prompt
Parameters:
evaluate_with_result
Parameters:
finish_candidate
Parameters:
finish_round
Parameters:
get_config
Parameters:
get_default_prompt
Parameters:
The prompt key to retrieve
get_history_entries
get_history_rounds
Parameters:
get_prompt
Parameters:
The prompt key to retrieve
list_prompts
on_trial
Parameters:
optimize_mcp
Parameters:
optimize_prompt
Parameters:
The prompt to optimize (single ChatPrompt or dict of prompts)
Opik dataset (training set, used for feedback/context). Note: this parameter is expected to be deprecated in favor of dataset_training; for now it serves as the training dataset parameter.
A metric function with signature (dataset_item, llm_output) -> float
Optional agent for prompt execution (defaults to LiteLLMAgent)
Optional configuration for the experiment
Number of samples to use for evaluation
Optional number of samples for inner-loop minibatches
Sampling strategy name (default "random_sorted")
Whether to continue optimization automatically
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or "Optimization")
Optional ID to use when creating the Opik optimization run
Optional validation dataset for ranking candidates
Maximum number of optimization trials
Whether tools may be executed during evaluation (default True)
Which prompt roles to allow for optimization
Optional tool optimization selector. Only supported by optimizers that explicitly document tool optimization support.
post_baseline
Parameters:
post_optimize
Parameters:
post_round
Parameters:
post_trial
Parameters:
pre_baseline
Parameters:
pre_optimize
Parameters:
pre_round
Parameters:
pre_trial
Parameters:
record_candidate_entry
Parameters:
run_optimization
Parameters:
The optimization context with prompts, dataset, metric, etc.
set_default_dataset_split
Parameters:
set_pareto_front
Parameters:
start_candidate
Parameters:
with_dataset_split
Parameters:
MetaPromptOptimizer
Parameters:
Methods
begin_round
Parameters:
cleanup
evaluate
Parameters:
Optimization context for this run.
Dict of named prompts to evaluate (e.g., {"main": ChatPrompt(...)}). Single-prompt optimizations use a dict with one entry.
Optional experiment configuration.
Optional sampling tag for deterministic subsampling per candidate.
evaluate_prompt
Parameters:
evaluate_with_result
Parameters:
finish_candidate
Parameters:
finish_round
Parameters:
get_config
Parameters:
get_default_prompt
Parameters:
The prompt key to retrieve
get_history_entries
get_history_rounds
Parameters:
get_prompt
Parameters:
The prompt key to retrieve
list_prompts
on_trial
Parameters:
optimize_mcp
Parameters:
optimize_prompt
Parameters:
The prompt to optimize (single ChatPrompt or dict of prompts)
Opik dataset (training set, used for feedback/context). Note: this parameter is expected to be deprecated in favor of dataset_training; for now it serves as the training dataset parameter.
A metric function with signature (dataset_item, llm_output) -> float
Optional agent for prompt execution (defaults to LiteLLMAgent)
Optional configuration for the experiment
Number of samples to use for evaluation
Optional number of samples for inner-loop minibatches
Sampling strategy name (default "random_sorted")
Whether to continue optimization automatically
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or "Optimization")
Optional ID to use when creating the Opik optimization run
Optional validation dataset for ranking candidates
Maximum number of optimization trials
Whether tools may be executed during evaluation (default True)
Which prompt roles to allow for optimization
Optional tool optimization selector. Only supported by optimizers that explicitly document tool optimization support.
post_baseline
Parameters:
post_optimize
Parameters:
post_round
Parameters:
post_trial
Parameters:
pre_baseline
Parameters:
pre_optimize
Parameters:
pre_round
Parameters:
pre_trial
Parameters:
record_candidate_entry
Parameters:
run_optimization
Parameters:
The optimization context with prompts, dataset, metric, etc.
set_default_dataset_split
Parameters:
set_pareto_front
Parameters:
start_candidate
Parameters:
with_dataset_split
Parameters:
EvolutionaryOptimizer
Parameters:
Methods
begin_round
Parameters:
cleanup
evaluate
Parameters:
Optimization context for this run.
Dict of named prompts to evaluate (e.g., {"main": ChatPrompt(...)}). Single-prompt optimizations use a dict with one entry.
Optional experiment configuration.
Optional sampling tag for deterministic subsampling per candidate.
evaluate_prompt
Parameters:
evaluate_with_result
Parameters:
finish_candidate
Parameters:
finish_round
Parameters:
get_config
Parameters:
get_default_prompt
Parameters:
The prompt key to retrieve
get_history_entries
get_history_rounds
Parameters:
get_prompt
Parameters:
The prompt key to retrieve
list_prompts
on_trial
Parameters:
optimize_mcp
Parameters:
optimize_prompt
Parameters:
The prompt to optimize (single ChatPrompt or dict of prompts)
Opik dataset (training set, used for feedback/context). Note: this parameter is expected to be deprecated in favor of dataset_training; for now it serves as the training dataset parameter.
A metric function with signature (dataset_item, llm_output) -> float
Optional agent for prompt execution (defaults to LiteLLMAgent)
Optional configuration for the experiment
Number of samples to use for evaluation
Optional number of samples for inner-loop minibatches
Sampling strategy name (default "random_sorted")
Whether to continue optimization automatically
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or "Optimization")
Optional ID to use when creating the Opik optimization run
Optional validation dataset for ranking candidates
Maximum number of optimization trials
Whether tools may be executed during evaluation (default True)
Which prompt roles to allow for optimization
Optional tool optimization selector. Only supported by optimizers that explicitly document tool optimization support.
post_baseline
Parameters:
post_optimize
Parameters:
post_round
Parameters:
post_trial
Parameters:
pre_baseline
Parameters:
pre_optimize
Parameters:
pre_round
Parameters:
pre_trial
Parameters:
record_candidate_entry
Parameters:
run_optimization
Parameters:
The optimization context with prompts, dataset, metric, etc.
set_default_dataset_split
Parameters:
set_pareto_front
Parameters:
start_candidate
Parameters:
with_dataset_split
Parameters:
HierarchicalReflectiveOptimizer
Parameters:
Methods
begin_round
Parameters:
cleanup
evaluate
Parameters:
Optimization context for this run.
Dict of named prompts to evaluate (e.g., {"main": ChatPrompt(...)}). Single-prompt optimizations use a dict with one entry.
Optional experiment configuration.
Optional sampling tag for deterministic subsampling per candidate.
evaluate_prompt
Parameters:
evaluate_with_result
Parameters:
finish_candidate
Parameters:
finish_round
Parameters:
get_config
Parameters:
get_default_prompt
Parameters:
The prompt key to retrieve
get_history_entries
get_history_rounds
Parameters:
get_prompt
Parameters:
The prompt key to retrieve
list_prompts
on_trial
Parameters:
optimize_mcp
Parameters:
optimize_prompt
Parameters:
The prompt to optimize (single ChatPrompt or dict of prompts)
Opik dataset (training set, used for feedback/context). Note: this parameter is expected to be deprecated in favor of dataset_training; for now it serves as the training dataset parameter.
A metric function with signature (dataset_item, llm_output) -> float
Optional agent for prompt execution (defaults to LiteLLMAgent)
Optional configuration for the experiment
Number of samples to use for evaluation
Optional number of samples for inner-loop minibatches
Sampling strategy name (default "random_sorted")
Whether to continue optimization automatically
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or "Optimization")
Optional ID to use when creating the Opik optimization run
Optional validation dataset for ranking candidates
Maximum number of optimization trials
Whether tools may be executed during evaluation (default True)
Which prompt roles to allow for optimization
Optional tool optimization selector. Only supported by optimizers that explicitly document tool optimization support.
post_baseline
Parameters:
post_optimize
Parameters:
post_round
Parameters:
post_trial
Parameters:
pre_baseline
Parameters:
pre_optimize
Parameters:
pre_round
Parameters:
pre_trial
Parameters:
record_candidate_entry
Parameters:
run_optimization
Parameters:
The optimization context with prompts, dataset, metric, etc.
set_default_dataset_split
Parameters:
set_pareto_front
Parameters:
start_candidate
Parameters:
with_dataset_split
Parameters:
ParameterOptimizer
Parameters:
Methods
begin_round
Parameters:
cleanup
evaluate
Parameters:
Optimization context for this run.
Dict of named prompts to evaluate (e.g., {"main": ChatPrompt(...)}). Single-prompt optimizations use a dict with one entry.
Optional experiment configuration.
Optional sampling tag for deterministic subsampling per candidate.
evaluate_prompt
Parameters:
evaluate_with_result
Parameters:
finish_candidate
Parameters:
finish_round
Parameters:
get_config
Parameters:
get_default_prompt
Parameters:
The prompt key to retrieve
get_history_entries
get_history_rounds
Parameters:
get_prompt
Parameters:
The prompt key to retrieve
list_prompts
on_trial
Parameters:
optimize_mcp
Parameters:
optimize_parameter
Parameters:
The prompt or dict of prompts to evaluate with tuned parameters. When a dict is provided, parameters are optimized independently for each prompt.
Dataset providing evaluation examples
Objective function to maximize
Definition of the search space for tunable parameters. For multi-prompt, params without a prefix are expanded per prompt. Params already prefixed (e.g., 'analyze.temperature') are kept as-is.
Optional validation dataset. Note: Due to the internal implementation of ParameterOptimizer, this parameter is currently not fully utilized and we recommend not using it for this optimizer.
Optional experiment metadata
Total number of trials (if None, uses default_n_trials)
Number of dataset samples to evaluate per trial (None for all)
Optional number of samples for inner-loop minibatches
Sampling strategy name (default "random_sorted")
Optional custom agent instance to execute evaluations
Opik project name for logging traces (default: "Optimization")
Optuna sampler to use (default: TPESampler with seed)
List of callback functions for Optuna study
Maximum time in seconds for optimization
Number of trials for local search (overrides local_search_ratio)
Scale factor for local search narrowing (0.0-1.0)
Optional ID to use when creating the Opik optimization run; when provided it must be a valid UUIDv7 string.
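To make the local-search knobs above concrete: after broad exploration, the search interval is narrowed around the best value found, scaled by the 0.0-1.0 factor. An illustrative sketch of that narrowing (`narrow_range` is a hypothetical helper, not the SDK's implementation):

```python
def narrow_range(
    low: float, high: float, best: float, scale: float
) -> tuple[float, float]:
    """Shrink [low, high] to a window of width scale * (high - low),
    centred on the best value so far and clamped to the original bounds."""
    width = (high - low) * scale
    return max(low, best - width / 2), min(high, best + width / 2)

# e.g. temperature searched over [0.0, 2.0], best trial at 0.7, scale 0.3
narrow_range(0.0, 2.0, 0.7, 0.3)
```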
post_baseline
Parameters:
post_optimize
Parameters:
post_round
Parameters:
post_trial
Parameters:
pre_baseline
Parameters:
pre_optimize
Parameters:
pre_round
Parameters:
pre_trial
Parameters:
record_candidate_entry
Parameters:
set_default_dataset_split
Parameters:
set_pareto_front
Parameters:
start_candidate
Parameters:
with_dataset_split
Parameters:
ParameterSearchSpace
Parameters:
ParameterSpec
Parameters:
ParameterType
Parameters:
BaseOptimizer
Parameters:
Methods
begin_round
Parameters:
cleanup
evaluate
Parameters:
Optimization context for this run.
Dict of named prompts to evaluate (e.g., {"main": ChatPrompt(...)}). Single-prompt optimizations use a dict with one entry.
Optional experiment configuration.
Optional sampling tag for deterministic subsampling per candidate.
evaluate_prompt
Parameters:
evaluate_with_result
Parameters:
finish_candidate
Parameters:
finish_round
Parameters:
get_config
Parameters:
get_default_prompt
Parameters:
The prompt key to retrieve
get_history_entries
get_history_rounds
Parameters:
get_prompt
Parameters:
The prompt key to retrieve
list_prompts
on_trial
Parameters:
optimize_mcp
Parameters:
optimize_prompt
Parameters:
The prompt to optimize (single ChatPrompt or dict of prompts)
Opik dataset (training set, used for feedback/context). Note: this parameter is expected to be deprecated in favor of dataset_training; for now it serves as the training dataset parameter.
A metric function with signature (dataset_item, llm_output) -> float
Optional agent for prompt execution (defaults to LiteLLMAgent)
Optional configuration for the experiment
Number of samples to use for evaluation
Optional number of samples for inner-loop minibatches
Sampling strategy name (default "random_sorted")
Whether to continue optimization automatically
Opik project name for logging traces (defaults to OPIK_PROJECT_NAME env or "Optimization")
Optional ID to use when creating the Opik optimization run
Optional validation dataset for ranking candidates
Maximum number of optimization trials
Whether tools may be executed during evaluation (default True)
Which prompt roles to allow for optimization
Optional tool optimization selector. Only supported by optimizers that explicitly document tool optimization support.
post_baseline
Parameters:
post_optimize
Parameters:
post_round
Parameters:
post_trial
Parameters:
pre_baseline
Parameters:
pre_optimize
Parameters:
pre_round
Parameters:
pre_trial
Parameters:
record_candidate_entry
Parameters:
run_optimization
Parameters:
The optimization context with prompts, dataset, metric, etc.
set_default_dataset_split
Parameters:
set_pareto_front
Parameters:
start_candidate
Parameters:
with_dataset_split
Parameters:
ChatPrompt
Parameters:
A list of role/content message dictionaries; content may contain {input-dataset-field} placeholders that are filled from dataset items.
Methods
copy
get_messages
Parameters:
replace_in_messages
Parameters:
set_messages
Parameters:
to_dict
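To make the placeholder convention concrete, here is an illustrative sketch of rendering such messages against a dataset item. This mirrors the behavior ChatPrompt's docs describe but is not the SDK's code, and the `question` field is an assumption:

```python
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Answer the question: {question}"},
]

def render(messages: list[dict], dataset_item: dict) -> list[dict]:
    """Fill {field} placeholders in each message from the dataset item,
    leaving the original message templates untouched."""
    return [
        {**m, "content": m["content"].format(**dataset_item)}
        for m in messages
    ]
```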
AlgorithmResult
Parameters:
OptimizationResult
Parameters:
OptimizationContext
Parameters:
OptimizationHistoryState
Parameters:
Methods
clear
end_round
Parameters:
finalize_stop
Parameters:
get_entries
get_rounds
record_trial
Parameters:
set_context
Parameters:
set_default_dataset_split
Parameters:
set_pareto_front
Parameters:
start_round
Parameters:
with_dataset_split
Parameters:
OptimizationRound
Parameters:
Methods
to_dict
OptimizationTrial
Parameters:
Methods
to_dict
OptimizableAgent
Parameters:
Methods
init_agent
Parameters:
init_llm
invoke
Parameters:
List of message dictionaries
Optional seed for reproducibility
invoke_agent
Parameters:
invoke_agent_candidates
Parameters:
Mapping of prompt name to ChatPrompt.
Dataset row used to render the prompt messages.
Whether tool execution is allowed in this invocation.
Optional seed for reproducibility.
invoke_dataset_item
Parameters:
invoke_prompt
Parameters:
llm_invoke
Parameters:
MultiMetricObjective
Parameters:
PromptLibrary
Parameters:
Dictionary of default prompt templates
Optional dict or callable to customize prompts
Methods
get
Parameters:
The prompt key to retrieve
get_default
Parameters:
The prompt key to retrieve
keys
set
Parameters:
update
Parameters:
Dictionary of key-value pairs to update
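An illustrative sketch of the "dict or callable" customization described above (`resolve_prompt` is a hypothetical helper showing the lookup pattern, not the SDK's implementation):

```python
from typing import Callable, Union

Overrides = Union[dict, Callable[[str, str], str], None]

def resolve_prompt(defaults: dict, overrides: Overrides, key: str) -> str:
    """Return the prompt for key, applying overrides when provided."""
    default = defaults[key]
    if overrides is None:
        return default
    if callable(overrides):
        return overrides(key, default)   # callable customization: (key, default) -> str
    return overrides.get(key, default)   # dict customization: replace by key
```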