Reference

The Opik Optimizer provides a set of tools for optimizing LLM prompts. This reference guide will help you understand the available APIs and how to use them effectively.

Installation

You can install the Opik Optimizer package using pip:

$ pip install opik-optimizer opik

To view optimization runs in the Opik platform, you will need to configure Opik by running:

$ opik configure

Optimization algorithms

MetaPromptOptimizer

class opik_optimizer.MetaPromptOptimizer(
    model: str,
    reasoning_model: str = None,
    max_rounds: int = 3,
    num_prompts_per_round: int = 4,
    improvement_threshold: float = 0.05,
    initial_trials_per_candidate: int = 3,
    max_trials_per_candidate: int = 6,
    adaptive_trial_threshold: Optional[float] = 0.8,
    num_threads: int = 12,
    project_name: Optional[str] = None,
    **model_kwargs
)

model
string (required)

The model to use for evaluation (e.g., “openai/gpt-4”, “azure/gpt-4”). Supports all models available through LiteLLM.

reasoning_model
string

The model to use for reasoning and prompt generation. Defaults to the evaluation model if not specified.

max_rounds
number (defaults to 3)

Maximum number of optimization rounds to perform.

num_prompts_per_round
number (defaults to 4)

Number of candidate prompts to generate per optimization round.

improvement_threshold
number (defaults to 0.05)

Minimum improvement required to continue optimization.

initial_trials_per_candidate
number (defaults to 3)

Number of initial evaluation trials for each candidate prompt.

max_trials_per_candidate
number (defaults to 6)

Maximum number of evaluation trials if adaptive trials are enabled and score is promising.

adaptive_trial_threshold
number (defaults to 0.8)

If not None, candidate prompts scoring below best_score * adaptive_trial_threshold after the initial trials do not receive the full max_trials_per_candidate evaluations.

num_threads
number (defaults to 12)

Number of threads to use for parallel evaluation.

project_name
string

Optional name for the optimization project. Used for tracking and organizing results.

model_kwargs
object

Additional keyword arguments passed to the model (e.g., temperature, max_tokens).
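
For example, a minimal construction might look like the sketch below (the model name, project name, and extra model kwargs are illustrative):

from opik_optimizer import MetaPromptOptimizer

optimizer = MetaPromptOptimizer(
    model="openai/gpt-4o",                     # evaluation model (any LiteLLM model name)
    reasoning_model="openai/gpt-4o",           # optional; defaults to the evaluation model
    max_rounds=3,
    num_prompts_per_round=4,
    num_threads=12,
    project_name="prompt-optimization-demo",   # illustrative project name
    temperature=0.1,                           # forwarded to the model via **model_kwargs
)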

Methods

optimize_prompt

Optimizes a prompt using meta-reasoning.

def opik_optimizer.MetaPromptOptimizer.optimize_prompt(
    self,
    dataset: Union[str, Dataset],
    metric_config: MetricConfig,
    task_config: TaskConfig,
    experiment_config: Optional[Dict] = None,
    n_samples: int = None,
    auto_continue: bool = False,
    **kwargs
) -> OptimizationResult

dataset
Union[str, Dataset] (required)

Dataset to use for optimization. Can be either a dataset name string or a Dataset object.

metric_config
MetricConfig (required)

Configuration for the evaluation metric.

task_config
TaskConfig (required)

Configuration for the prompt task.

experiment_config
Dict

Optional configuration for the experiment.

n_samples
number

Optional number of samples to use for evaluation.

auto_continue
boolean (defaults to false)

If true, the algorithm may continue optimization if the goal has not yet been met.

Returns: OptimizationResult
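
A usage sketch, assuming an Opik dataset named "my-dataset" already exists and that metric_config and task_config have been built as described under Objects below:

result = optimizer.optimize_prompt(
    dataset="my-dataset",        # dataset name string or a Dataset object
    metric_config=metric_config,
    task_config=task_config,
    n_samples=100,               # optionally evaluate on a subset for speed
)
print(result.score)
print(result.prompt)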

evaluate_prompt

Evaluates a specific prompt on a dataset.

def opik_optimizer.MetaPromptOptimizer.evaluate_prompt(
    self,
    dataset: Dataset,
    metric_config: MetricConfig,
    task_config: TaskConfig,
    prompt: str,
    use_full_dataset: bool = False,
    experiment_config: Optional[Dict] = None,
    n_samples: Optional[int] = None,
    optimization_id: Optional[str] = None
) -> float

dataset
Dataset (required)

Dataset to evaluate the prompt on.

metric_config
MetricConfig (required)

Configuration for the evaluation metric.

task_config
TaskConfig (required)

Configuration for the prompt task.

prompt
string (required)

The prompt to evaluate.

use_full_dataset
boolean (defaults to false)

Whether to use the full dataset or a subset for evaluation.

experiment_config
Dict

Optional configuration for the experiment.

n_samples
number

Optional number of samples to use for evaluation.

optimization_id
string

Optional ID for tracking the optimization run.

Returns: float - The evaluation score
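
For example, to score a single candidate prompt against a dataset (the prompt text and sample count are illustrative):

score = optimizer.evaluate_prompt(
    dataset=dataset,             # an Opik Dataset object
    metric_config=metric_config,
    task_config=task_config,
    prompt="Answer the question as concisely as possible.",
    n_samples=50,
)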

MiproOptimizer

class opik_optimizer.MiproOptimizer(
    model: str,
    project_name: Optional[str] = None,
    **model_kwargs
)

The MiproOptimizer uses DSPy’s MIPRO (Multi-prompt Instruction Proposal Optimizer) framework to optimize prompts. It can optimize both standard prompts and tool-using agents.

model
string (required)

The model to use for evaluation (e.g., “openai/gpt-4”, “azure/gpt-4”). Supports all models available through LiteLLM.

project_name
string

Optional name for the optimization project. Used for tracking and organizing results.

model_kwargs
object

Additional keyword arguments passed to the model (e.g., temperature, max_tokens). Can include num_threads (default: 6) for parallel evaluation.

Methods

optimize_prompt

Optimizes a prompt using DSPy’s MIPRO (Multi-prompt Instruction Proposal Optimizer).

def opik_optimizer.MiproOptimizer.optimize_prompt(
    self,
    dataset: Union[str, Dataset],
    metric_config: MetricConfig,
    task_config: TaskConfig,
    num_candidates: int = 10,
    experiment_config: Optional[Dict] = None,
    **kwargs
) -> OptimizationResult

dataset
Union[str, Dataset] (required)

Dataset to use for optimization. Can be either a dataset name string or a Dataset object.

metric_config
MetricConfig (required)

Configuration for the evaluation metric.

task_config
TaskConfig (required)

Configuration for the prompt task. If tools are specified in the task config, the optimizer will create a tool-using agent.

num_candidates
number (defaults to 10)

Number of candidate prompts to generate and evaluate.

experiment_config
Dict

Optional configuration for the experiment.

Returns: OptimizationResult
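
An end-to-end sketch (the model, project name, and dataset name are illustrative; metric_config and task_config are built as described under Objects below):

from opik_optimizer import MiproOptimizer

optimizer = MiproOptimizer(
    model="openai/gpt-4o",
    project_name="mipro-demo",   # illustrative project name
    temperature=0.1,             # forwarded via **model_kwargs
)

result = optimizer.optimize_prompt(
    dataset="my-dataset",
    metric_config=metric_config,
    task_config=task_config,     # include tools in TaskConfig to optimize a tool-using agent
    num_candidates=10,
)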

evaluate_prompt

Evaluates a specific prompt on a dataset.

def opik_optimizer.MiproOptimizer.evaluate_prompt(
    self,
    dataset: Dataset,
    metric_config: MetricConfig,
    task_config: TaskConfig,
    prompt: Union[str, dspy.Module, OptimizationResult] = None,
    n_samples: int = 10,
    dataset_item_ids: Optional[List[str]] = None,
    experiment_config: Optional[Dict] = None,
    **kwargs
) -> float

dataset
Dataset (required)

Dataset to evaluate the prompt on.

metric_config
MetricConfig (required)

Configuration for the evaluation metric.

task_config
TaskConfig (required)

Configuration for the prompt task.

prompt
Union[str, dspy.Module, OptimizationResult]

The prompt to evaluate. Can be a string prompt, DSPy module, or OptimizationResult.

n_samples
number (defaults to 10)

Number of samples to use for evaluation.

dataset_item_ids
List[str]

Optional list of specific dataset item IDs to evaluate on.

experiment_config
Dict

Optional configuration for the experiment.

Returns: float - The evaluation score

load_from_checkpoint

Load a previously optimized module from a checkpoint file.

def opik_optimizer.MiproOptimizer.load_from_checkpoint(
    self,
    filename: str
)

filename
string (required)

Path to the checkpoint file to load.

continue_optimize_prompt

Continue the optimization process after preparing with prepare_optimize_prompt. This method runs the actual MIPRO compilation and optimization.

def opik_optimizer.MiproOptimizer.continue_optimize_prompt(
    self
) -> OptimizationResult

Returns: OptimizationResult
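
For example, after an earlier call to prepare_optimize_prompt (not shown here), the compilation step itself is just:

result = optimizer.continue_optimize_prompt()
print(result.score)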

FewShotBayesianOptimizer

class opik_optimizer.FewShotBayesianOptimizer(
    model: str,
    project_name: Optional[str] = None,
    min_examples: int = 2,
    max_examples: int = 8,
    seed: int = 42,
    n_threads: int = 8,
    n_initial_prompts: int = 5,
    n_iterations: int = 10,
    **model_kwargs
)

model
string (required)

The name of the LLM model to use (e.g., “openai/gpt-4”, “azure/gpt-4”). Supports all models available through LiteLLM.

project_name
string

Optional name for the optimization project. Used for tracking and organizing results.

min_examples
number (defaults to 2)

Minimum number of few-shot examples to use in optimization.

max_examples
number (defaults to 8)

Maximum number of few-shot examples to use in optimization.

seed
number (defaults to 42)

Random seed for reproducibility.

n_threads
number (defaults to 8)

Number of threads to use for parallel optimization.

n_initial_prompts
number (defaults to 5)

Number of initial prompts to evaluate before starting Bayesian optimization.

n_iterations
number (defaults to 10)

Number of optimization iterations to perform.

model_kwargs
object

Additional keyword arguments passed to the model (e.g., temperature, max_tokens).
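
A minimal construction sketch (the model name and project name are illustrative):

from opik_optimizer import FewShotBayesianOptimizer

optimizer = FewShotBayesianOptimizer(
    model="openai/gpt-4o-mini",
    project_name="few-shot-demo",   # illustrative project name
    min_examples=2,
    max_examples=8,
    n_iterations=10,
    seed=42,
)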

Methods

optimize_prompt

Optimizes a prompt using few-shot examples and Bayesian optimization.

def opik_optimizer.FewShotBayesianOptimizer.optimize_prompt(
    self,
    dataset: Union[str, Dataset],
    metric_config: MetricConfig,
    task_config: TaskConfig,
    n_trials: int = 10,
    experiment_config: Optional[Dict] = None,
    n_samples: Optional[int] = None
) -> OptimizationResult

dataset
Union[str, Dataset] (required)

Dataset to use for optimization. Can be either a dataset name string or a Dataset object.

metric_config
MetricConfig (required)

Configuration for the evaluation metric.

task_config
TaskConfig (required)

Configuration for the prompt task.

n_trials
number (defaults to 10)

Number of optimization trials to run.

experiment_config
Dict

Optional configuration for the experiment.

n_samples
number

Optional number of samples to use for evaluation.

Returns: OptimizationResult
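
For example, assuming the optimizer constructed above and the metric_config and task_config objects described under Objects below:

result = optimizer.optimize_prompt(
    dataset="my-dataset",        # illustrative dataset name
    metric_config=metric_config,
    task_config=task_config,
    n_trials=10,
)
print(result.demonstrations)     # few-shot examples used in optimization, when populated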

evaluate_prompt

Evaluates a specific prompt on a dataset.

prompt
List[Dict[Literal['role', 'content'], str]] (required)

The prompt to evaluate, in chat format.

dataset
Dataset (required)

Dataset to evaluate the prompt on.

metric_config
MetricConfig (required)

Configuration for the evaluation metric.

task_config
TaskConfig

Optional configuration for the prompt task. Required if prompt is a string.

dataset_item_ids
List[str]

Optional list of specific dataset item IDs to evaluate on.

experiment_config
Dict

Optional configuration for the experiment.

n_samples
number

Optional number of samples to use for evaluation.

Returns: float - The evaluation score
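
A call sketch based on the parameters listed above (the chat messages and sample count are illustrative):

score = optimizer.evaluate_prompt(
    prompt=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Answer the question: {question}"},  # illustrative template
    ],
    dataset=dataset,             # an Opik Dataset object
    metric_config=metric_config,
    n_samples=50,
)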

Objects

TaskConfig

Configuration for a prompt task, specifying how to use the prompt with input data and tools.

class opik_optimizer.TaskConfig(
    instruction_prompt: Union[str, List[Dict[Literal["role", "content"], str]]],
    use_chat_prompt: bool = False,
    input_dataset_fields: List[str],
    output_dataset_field: str,
    tools: List[Any] = []
)

instruction_prompt
Union[str, List[Dict[Literal['role', 'content'], str]]] (required)

The base instruction prompt to optimize. Can be either a string prompt or a list of chat messages in the format [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}].

use_chat_prompt
bool (defaults to false)

Whether to use chat format (true) or completion format (false) for prompts.

input_dataset_fields
List[str] (required)

List of field names from the dataset to use as input. These fields will be available to the prompt.

output_dataset_field
string (required)

Name of the dataset field that contains the expected output for evaluation.

tools
List[Any] (defaults to [])

Optional list of tools that the agent can use. When tools are provided, the optimizer will create a tool-using agent.
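
For example, for a question-answering dataset (the field names and prompt text are illustrative):

from opik_optimizer import TaskConfig

task_config = TaskConfig(
    instruction_prompt="Answer the question using only the provided context.",
    input_dataset_fields=["question", "context"],  # illustrative dataset fields
    output_dataset_field="answer",
    use_chat_prompt=False,
)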

MetricConfig

Configuration for a metric used in optimization. This class specifies how to evaluate prompts during optimization.

class opik_optimizer.MetricConfig(
    metric: BaseMetric,
    inputs: Dict[str, Union[str, Callable[[Any], Any]]]
)

metric
BaseMetric (required)

The metric instance to use for evaluation. This should be a subclass of BaseMetric that implements the evaluation logic.

inputs
Dict[str, Union[str, Callable[[Any], Any]]] (required)

A mapping of metric input names to either dataset field names (as strings) or transformation functions. The functions can be used to preprocess dataset fields before they are passed to the metric.
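
A sketch, assuming the LevenshteinRatio metric from opik.evaluation.metrics and a dataset field named "answer" (both illustrative; use whichever metric and field names match your setup, and pass a callable instead of a string to preprocess a field first):

from opik_optimizer import MetricConfig
from opik.evaluation.metrics import LevenshteinRatio

metric_config = MetricConfig(
    metric=LevenshteinRatio(),
    inputs={
        "reference": "answer",   # map a metric input name to a dataset field name
    },
)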

OptimizationResult

class opik_optimizer.OptimizationResult(
    prompt: Union[str, List[Dict[Literal["role", "content"], str]]],
    score: float,
    metric_name: str,
    metadata: Dict[str, Any] = {},
    details: Dict[str, Any] = {},
    best_prompt: Optional[str] = None,
    best_score: Optional[float] = None,
    best_metric_name: Optional[str] = None,
    best_details: Optional[Dict[str, Any]] = None,
    all_results: Optional[List[Dict[str, Any]]] = None,
    history: List[Dict[str, Any]] = [],
    metric: Optional[BaseMetric] = None,
    demonstrations: Optional[List[Dict[str, Any]]] = None,
    optimizer: str = "Optimizer",
    tool_prompts: Optional[Dict[str, str]] = None
)

prompt
Union[str, List[Dict[Literal['role', 'content'], str]]] (required)

The optimized prompt text or chat messages.

score
float (required)

The final score achieved by the optimized prompt.

metric_name
string (required)

Name of the metric used for evaluation.

metadata
Dict[str, Any] (defaults to {})

Additional metadata about the optimization run.

details
Dict[str, Any] (defaults to {})

Detailed information about the optimization process.

best_prompt
Optional[str]

Best performing prompt if different from final prompt.

best_score
Optional[float]

Best score achieved during optimization.

best_metric_name
Optional[str]

Metric name associated with the best score.

best_details
Optional[Dict[str, Any]]

Details about the best performing iteration.

all_results
Optional[List[Dict[str, Any]]]

List of all optimization results.

history
List[Dict[str, Any]] (defaults to [])

History of optimization iterations.

metric
Optional[BaseMetric]

The metric object used for evaluation.

demonstrations
Optional[List[Dict[str, Any]]]

Few-shot examples used in optimization.

optimizer
string (defaults to "Optimizer")

Name of the optimizer used.

tool_prompts
Optional[Dict[str, str]]

Tool-specific prompts if used.

The OptimizationResult class provides rich string representations through __str__() for plain text output and __rich__() for terminals supporting Rich formatting. These methods display a comprehensive summary of the optimization results including scores, improvements, and the final optimized prompt.
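
For example, given a result returned by any of the optimizers above:

print(result)                            # formatted summary (plain text or Rich)
print(result.metric_name, result.score)
print(result.prompt)                     # optimized prompt text or chat messages
if result.demonstrations:                # few-shot examples, when applicable
    print(len(result.demonstrations), "demonstrations selected")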