Reference
The Opik Optimizer provides a set of tools for optimizing LLM prompts. This reference guide will help you understand the available APIs and how to use them effectively.
Installation
You can install the Opik Optimizer package using pip:
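```bash
pip install opik-optimizer
```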
To view the optimization runs in the platform, you will need to configure Opik using:
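```bash
opik configure
```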
Optimization algorithms
MetaPromptOptimizer
The model to use for evaluation (e.g., “openai/gpt-4”, “azure/gpt-4”). Supports all models available through LiteLLM.
The model to use for reasoning and prompt generation. Defaults to the evaluation model if not specified.
Maximum number of optimization rounds to perform.
Number of candidate prompts to generate per optimization round.
Minimum improvement required to continue optimization.
Number of initial evaluation trials for each candidate prompt.
Maximum number of evaluation trials if adaptive trials are enabled and the score is promising.
If not None, candidate prompts scoring below best_score * adaptive_trial_threshold after the initial trials are not evaluated for the maximum number of trials.
Number of threads to use for parallel evaluation.
Optional name for the optimization project. Used for tracking and organizing results.
Additional keyword arguments passed to the model (e.g., temperature, max_tokens).
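A minimal construction sketch is shown below. The keyword names are inferred from the parameter descriptions above and should be checked against your installed opik_optimizer version.

```python
from opik_optimizer import MetaPromptOptimizer

# Keyword names are inferred from the parameter list above; verify them
# against your installed opik_optimizer version.
optimizer = MetaPromptOptimizer(
    model="openai/gpt-4",            # evaluation model (any LiteLLM model id)
    reasoning_model="openai/gpt-4",  # model used to generate candidate prompts
    max_rounds=3,                    # optimization rounds
    num_prompts_per_round=4,         # candidates generated per round
    num_threads=8,                   # parallel evaluation threads
    project_name="my-optimization",  # optional, used for tracking
    temperature=0.1,                 # extra kwargs are forwarded to the model
)
```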
Methods
optimize_prompt
Optimizes a prompt using meta-reasoning.
Dataset to use for optimization. Can be either a dataset name string or a Dataset object.
Configuration for the evaluation metric.
Configuration for the prompt task.
Optional configuration for the experiment.
Optional number of samples to use for evaluation.
If true, the algorithm may continue optimization even if the goal is not met.
Returns: OptimizationResult
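A usage sketch, assuming a MetricConfig and TaskConfig have been built as described under Objects below; the keyword names mirror the parameters listed above but may differ slightly in your version.

```python
# `optimizer`, `metric_config` and `task_config` as constructed in the
# other examples in this reference.
result = optimizer.optimize_prompt(
    dataset="my-dataset",        # dataset name or a Dataset object
    metric_config=metric_config,
    task_config=task_config,
    n_samples=100,               # optional: evaluate on a subset
)
print(result)
```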
evaluate_prompt
Evaluates a specific prompt on a dataset.
Dataset to evaluate the prompt on.
Configuration for the evaluation metric.
Configuration for the prompt task.
The prompt to evaluate.
Whether to use the full dataset or a subset for evaluation.
Optional configuration for the experiment.
Optional number of samples to use for evaluation.
Optional ID for tracking the optimization run.
Returns: float
- The evaluation score
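For a single prompt, a hedged sketch of the evaluation call looks like this; keyword names again follow the descriptions above.

```python
# Evaluate one specific prompt; returns a float score.
score = optimizer.evaluate_prompt(
    dataset="my-dataset",
    metric_config=metric_config,
    task_config=task_config,
    prompt="Answer the user's question as concisely as possible.",
    n_samples=50,   # optional subset instead of the full dataset
)
print(f"Score: {score:.3f}")
```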
MiproOptimizer
The MiproOptimizer uses DSPy’s MIPRO (Multiprompt Instruction PRoposal Optimizer) framework to optimize prompts. It can optimize both standard prompts and tool-using agents.
The model to use for evaluation (e.g., “openai/gpt-4”, “azure/gpt-4”). Supports all models available through LiteLLM.
Optional name for the optimization project. Used for tracking and organizing results.
Additional keyword arguments passed to the model (e.g., temperature, max_tokens). Can include num_threads (default: 6) for parallel evaluation.
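A construction sketch; note that num_threads is passed through the model keyword arguments as described above.

```python
from opik_optimizer import MiproOptimizer

optimizer = MiproOptimizer(
    model="openai/gpt-4",       # evaluation model (any LiteLLM model id)
    project_name="mipro-demo",  # optional, used for tracking
    temperature=0.1,            # extra kwargs are forwarded to the model
    num_threads=6,              # parallel evaluation (default: 6)
)
```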
Methods
optimize_prompt
Optimizes a prompt using MIPRO (Multiprompt Instruction PRoposal Optimizer).
Dataset to use for optimization. Can be either a dataset name string or a Dataset object.
Configuration for the evaluation metric.
Configuration for the prompt task. If tools are specified in the task config, the optimizer will create a tool-using agent.
Number of candidate prompts to generate and evaluate.
Optional configuration for the experiment.
Returns: OptimizationResult
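A hedged usage sketch; supplying tools in the TaskConfig switches the optimizer to building a tool-using agent.

```python
# `optimizer`, `metric_config` and `task_config` as constructed above.
result = optimizer.optimize_prompt(
    dataset="my-dataset",
    metric_config=metric_config,
    task_config=task_config,   # include `tools` here to build a tool-using agent
    num_candidates=10,         # candidate prompts to generate and evaluate
)
```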
evaluate_prompt
Evaluates a specific prompt on a dataset.
Dataset to evaluate the prompt on.
Configuration for the evaluation metric.
Configuration for the prompt task.
The prompt to evaluate. Can be a string prompt, DSPy module, or OptimizationResult.
Number of samples to use for evaluation.
Optional list of specific dataset item IDs to evaluate on.
Optional configuration for the experiment.
Returns: float
- The evaluation score
load_from_checkpoint
Load a previously optimized module from a checkpoint file.
Path to the checkpoint file to load.
continue_optimize_prompt
Continue the optimization process after preparing with prepare_optimize_prompt. This method runs the actual MIPRO compilation and optimization.
Returns: OptimizationResult
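A rough sketch of the checkpoint and two-stage flow. It assumes prepare_optimize_prompt accepts the same arguments as optimize_prompt; the checkpoint path is illustrative.

```python
# Resume from a previously saved module (path is illustrative).
module = optimizer.load_from_checkpoint("checkpoints/best_program.json")

# Two-stage flow: prepare, then run the actual MIPRO compilation.
# Assumes prepare_optimize_prompt takes the same arguments as optimize_prompt.
optimizer.prepare_optimize_prompt(
    dataset="my-dataset",
    metric_config=metric_config,
    task_config=task_config,
)
result = optimizer.continue_optimize_prompt()
```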
FewShotBayesianOptimizer
The name of the LLM model to use (e.g., “openai/gpt-4”, “azure/gpt-4”). Supports all models available through LiteLLM.
Optional name for the optimization project. Used for tracking and organizing results.
Minimum number of few-shot examples to use in optimization.
Maximum number of few-shot examples to use in optimization.
Random seed for reproducibility.
Number of threads to use for parallel optimization.
Number of initial prompts to evaluate before starting Bayesian optimization.
Number of optimization iterations to perform.
Additional keyword arguments passed to the model (e.g., temperature, max_tokens).
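A construction sketch; the keyword names below are inferred from the parameter descriptions and may differ slightly in your installed version.

```python
from opik_optimizer import FewShotBayesianOptimizer

optimizer = FewShotBayesianOptimizer(
    model="openai/gpt-4",          # any LiteLLM model id
    project_name="few-shot-demo",  # optional, used for tracking
    min_examples=2,                # fewest few-shot examples to try
    max_examples=8,                # most few-shot examples to try
    seed=42,                       # for reproducibility
    n_threads=8,                   # parallel optimization threads
    n_initial_prompts=5,           # prompts evaluated before Bayesian search starts
    n_iterations=10,               # optimization iterations
    temperature=0.1,               # extra kwargs are forwarded to the model
)
```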
Methods
optimize_prompt
Optimizes a prompt using few-shot examples and Bayesian optimization.
Dataset to use for optimization. Can be either a dataset name string or a Dataset object.
Configuration for the evaluation metric.
Configuration for the prompt task.
Number of optimization trials to run.
Optional configuration for the experiment.
Optional number of samples to use for evaluation.
Returns: OptimizationResult
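A usage sketch mirroring the other optimizers, with the trial count added; the n_trials and n_samples names are assumptions to verify against your version.

```python
# `optimizer`, `metric_config` and `task_config` as constructed above.
result = optimizer.optimize_prompt(
    dataset="my-dataset",
    metric_config=metric_config,
    task_config=task_config,
    n_trials=20,     # optimization trials to run
    n_samples=100,   # optional evaluation subset
)
print(result)
```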
evaluate_prompt
Evaluates a specific prompt on a dataset.
The prompt to evaluate, in chat format.
Dataset to evaluate the prompt on.
Configuration for the evaluation metric.
Optional configuration for the prompt task. Required if prompt is a string.
Optional list of specific dataset item IDs to evaluate on.
Optional configuration for the experiment.
Optional number of samples to use for evaluation.
Returns: float
- The evaluation score
Objects
TaskConfig
Configuration for a prompt task, specifying how to use the prompt with input data and tools.
The base instruction prompt to optimize. Can be either a string prompt or a list of chat messages in the format [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}].
Whether to use chat format (true) or completion format (false) for prompts.
List of field names from the dataset to use as input. These fields will be available to the prompt.
Name of the dataset field that contains the expected output for evaluation.
Optional list of tools that the agent can use. When tools are provided, the optimizer will create a tool-using agent.
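An illustrative TaskConfig; the dataset field names are placeholders and the keyword names follow the descriptions above.

```python
from opik_optimizer import TaskConfig

# Field names ("question", "context", "expected_answer") are placeholders;
# keyword names follow the descriptions above.
task_config = TaskConfig(
    instruction_prompt=[
        {"role": "system", "content": "You are a concise QA assistant."},
        {"role": "user", "content": "Answer the question using the provided context."},
    ],
    use_chat_prompt=True,
    input_dataset_fields=["question", "context"],
    output_dataset_field="expected_answer",
    tools=[],  # supply tool definitions here to get a tool-using agent
)
```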
MetricConfig
Configuration for a metric used in optimization. This class specifies how to evaluate prompts during optimization.
The metric instance to use for evaluation. This should be a subclass of BaseMetric that implements the evaluation logic.
A mapping of metric input names to either dataset field names (as strings) or transformation functions. The functions can be used to preprocess dataset fields before they are passed to the metric.
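An illustrative MetricConfig using one of Opik's built-in metrics; the exact input names depend on the metric you choose.

```python
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer import MetricConfig

# The metric must subclass BaseMetric; LevenshteinRatio is one built-in option.
# Input names depend on the chosen metric; values are dataset field names or
# transformation functions applied to dataset items.
metric_config = MetricConfig(
    metric=LevenshteinRatio(),
    inputs={
        "reference": "expected_answer",
        # or preprocess a field before it reaches the metric:
        # "reference": lambda item: item["expected_answer"].strip(),
    },
)
```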
OptimizationResult
The optimized prompt text or chat messages.
The final score achieved by the optimized prompt.
Name of the metric used for evaluation.
Additional metadata about the optimization run.
Detailed information about the optimization process.
Best performing prompt if different from final prompt.
Best score achieved during optimization.
Metric name associated with the best score.
Details about the best performing iteration.
List of all optimization results.
History of optimization iterations.
The metric object used for evaluation.
Few-shot examples used in optimization.
Name of the optimizer used.
Tool-specific prompts if used.
The OptimizationResult class provides rich string representations through __str__() for plain-text output and __rich__() for terminals that support Rich formatting. These methods display a comprehensive summary of the optimization results, including scores, improvements, and the final optimized prompt.
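For example:

```python
from rich.console import Console

# `result` is the OptimizationResult returned by optimize_prompt.
print(result)            # plain-text summary via __str__()
Console().print(result)  # Rich-formatted summary via __rich__()
```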