Extending Optimizers

Extend Opik with custom optimization algorithms and contributions.

Opik Agent Optimizer is designed to be a flexible framework for prompt and agent optimization. While it provides a suite of powerful built-in algorithms, you might have unique optimization strategies or specialized needs. This guide shows how to build your own optimizer by extending the BaseOptimizer class that all built-in optimizers use.

Architecture Overview

All optimizers in the SDK extend BaseOptimizer, so a custom optimizer gets the same evaluation, history-tracking, and cost-accounting infrastructure as the built-in ones.
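
In practice, a custom optimizer is a subclass that supplies its internal prompts, its metadata, and an optimize_prompt() implementation, while inheriting everything else from the base class. A minimal sketch of that shape is shown below (method names mirror the steps in this guide; bodies are elided):

from typing import Any, Callable

from opik import Dataset
from opik_optimizer.api_objects.chat_prompt import ChatPrompt
from opik_optimizer.base_optimizer import BaseOptimizer
from opik_optimizer.optimization_result import OptimizationResult


class MyCustomOptimizer(BaseOptimizer):
    # Internal prompts the algorithm uses; customizable via prompt_overrides (Step 1)
    DEFAULT_PROMPTS: dict[str, str] = {}

    def get_optimizer_metadata(self) -> dict[str, Any]:
        # Optimizer-specific parameters to log with experiments (Step 1)
        ...

    def optimize_prompt(
        self, prompt: ChatPrompt, dataset: Dataset, metric: Callable, **kwargs: Any
    ) -> OptimizationResult:
        # Candidate generation, evaluation, and selection loop (Step 2)
        ...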

Core Concepts for a Custom Optimizer

To design a new optimization algorithm within Opik’s ecosystem, your optimizer needs to interact with several key components:

  1. Prompt (ChatPrompt): Your optimizer takes a ChatPrompt object as input. A chat prompt is a list of messages, where each message has a role, content, and optional additional fields. Messages may contain placeholder variables that are filled in with values from each dataset item at evaluation time.

  2. Evaluation Mechanism (Metric & Dataset): Your optimizer needs a way to score candidate prompts. You do this with a metric (a function that takes dataset_item and llm_output as arguments and returns a float score) and an evaluation dataset (see the sketch after this list).

  3. Optimization Loop: This is the heart of your custom optimizer. It involves:

    • Candidate Generation: Logic for creating new prompt variations. This could be rule-based, LLM-driven, or based on any other heuristic.
    • Candidate Evaluation: Using the metric and dataset to get a score for each candidate.
    • Selection/Progression: Logic to decide which candidates to keep, refine further, or how to adjust the generation strategy based on scores.
    • Termination Condition: Criteria for when to stop the optimization (e.g., number of rounds, score threshold, no improvement).
  4. Returning Results (OptimizationResult): Upon completion, your optimizer returns an OptimizationResult object that standardizes how results are reported, including the best prompt found, its score, history of the optimization process, and cost/usage metrics.
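
To make the first two concepts concrete, here is a minimal sketch of a prompt and a metric. It assumes ChatPrompt accepts a messages list as described above; the {question} placeholder and the expected_answer field are illustrative names, so use whatever your dataset items actually contain.

from opik_optimizer.api_objects.chat_prompt import ChatPrompt

# A prompt with a {question} placeholder that is filled from each dataset item
prompt = ChatPrompt(
    messages=[
        {"role": "system", "content": "You are a concise, factual assistant."},
        {"role": "user", "content": "{question}"},
    ]
)

# A metric: takes a dataset item and the model output, returns a float score
def exact_match(dataset_item: dict, llm_output: str) -> float:
    expected = dataset_item["expected_answer"]  # illustrative field name
    return 1.0 if llm_output.strip() == expected.strip() else 0.0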

Creating a Custom Optimizer

Step 1: Define Your Optimizer Class

Extend BaseOptimizer and define your DEFAULT_PROMPTS, the internal prompts your algorithm uses:

from opik_optimizer.base_optimizer import BaseOptimizer, OptimizationRound
from opik_optimizer.optimization_result import OptimizationResult
from opik_optimizer.api_objects.chat_prompt import ChatPrompt
from opik import Dataset
from typing import Any, Callable


class MyCustomOptimizer(BaseOptimizer):
    """
    A custom optimizer that implements [your algorithm description].
    """

    # Define internal prompts used by your algorithm.
    # Users can customize these via the prompt_overrides parameter.
    DEFAULT_PROMPTS = {
        "analysis_prompt": """Analyze the following prompt and identify improvement opportunities:

Current prompt:
{current_prompt}

Failure cases from evaluation:
{failures}

Identify specific issues and suggest concrete improvements.""",

        "generation_prompt": """Generate an improved version of this prompt:

Original prompt:
{current_prompt}

Focus areas for improvement:
{improvement_focus}

Return only the improved prompt text.""",
    }

    def __init__(
        self,
        model: str,
        max_iterations: int = 5,
        candidates_per_round: int = 3,
        improvement_threshold: float = 0.01,
        verbose: int = 1,
        seed: int = 42,
        **kwargs: Any,
    ) -> None:
        """
        Initialize the custom optimizer.

        Args:
            model: LiteLLM model name for the optimizer's internal LLM calls
            max_iterations: Maximum optimization rounds
            candidates_per_round: Number of candidate prompts to generate per round
            improvement_threshold: Minimum score improvement to continue
            verbose: Logging verbosity (0=off, 1=on)
            seed: Random seed for reproducibility
            **kwargs: Additional BaseOptimizer parameters (model_parameters, etc.)
        """
        super().__init__(model=model, verbose=verbose, seed=seed, **kwargs)
        self.max_iterations = max_iterations
        self.candidates_per_round = candidates_per_round
        self.improvement_threshold = improvement_threshold

    def get_optimizer_metadata(self) -> dict[str, Any]:
        """
        Expose optimizer-specific parameters for logging and tracking.
        This metadata appears in Opik experiment configurations.
        """
        return {
            "max_iterations": self.max_iterations,
            "candidates_per_round": self.candidates_per_round,
            "improvement_threshold": self.improvement_threshold,
        }

Step 2: Implement the optimize_prompt() Method

This is the core method implementing your optimization logic:

def optimize_prompt(
    self,
    prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
    agent: Any = None,
    experiment_config: dict | None = None,
    n_samples: int | None = None,
    auto_continue: bool = False,
    project_name: str = "Optimization",
    optimization_id: str | None = None,
    validation_dataset: Dataset | None = None,
    max_trials: int = 10,
    **kwargs: Any,
) -> OptimizationResult:
    """
    Optimize a prompt using the custom algorithm.

    Args:
        prompt: The ChatPrompt to optimize
        dataset: Training dataset for feedback and context
        metric: Scoring function (dataset_item, llm_output) -> float
        agent: Optional custom agent for evaluation
        experiment_config: Optional experiment metadata
        n_samples: Limit dataset samples per evaluation (None = all)
        project_name: Opik project name for tracing
        validation_dataset: Optional separate dataset for candidate ranking
        max_trials: Maximum evaluation trials
        **kwargs: Algorithm-specific parameters

    Returns:
        OptimizationResult with best prompt, scores, and history
    """
    # 1. Initialize: reset counters and set the project context
    self._reset_counters()
    self.project_name = project_name

    # 2. Evaluate the baseline prompt to establish a starting point
    baseline_score = self.evaluate_prompt(
        prompt=prompt,
        dataset=dataset,
        metric=metric,
        n_samples=n_samples,
        verbose=self.verbose,
    )

    # 3. Check if the baseline is already good enough (skip optimization)
    if self._should_skip_optimization(baseline_score):
        return self._build_early_result(
            optimizer_name=self.__class__.__name__,
            prompt=prompt,
            score=baseline_score,
            metric_name=metric.__name__,
            initial_prompt=prompt,
            details={"reason": "baseline_score_sufficient"},
        )

    # 4. Main optimization loop
    best_prompt = prompt
    best_score = baseline_score
    previous_best_score = baseline_score

    for iteration in range(self.max_iterations):
        # 4a. Generate candidate prompts based on the current best prompt
        candidates = self._generate_candidates(
            current_prompt=best_prompt,
            dataset=dataset,
            metric=metric,
        )

        # 4b. Evaluate each candidate
        # Use validation_dataset if provided, otherwise the training dataset
        eval_dataset = validation_dataset or dataset
        round_best_prompt = best_prompt
        round_best_score = best_score

        for candidate in candidates:
            score = self.evaluate_prompt(
                prompt=candidate,
                dataset=eval_dataset,
                metric=metric,
                n_samples=n_samples,
                verbose=0,  # Reduce noise during candidate evaluation
            )

            # 4c. Keep the best candidate from this round
            if score > round_best_score:
                round_best_score = score
                round_best_prompt = candidate

        # Update the global best if this round improved on it
        if round_best_score > best_score:
            best_score = round_best_score
            best_prompt = round_best_prompt

        # 4d. Record optimization history
        self._add_to_history(OptimizationRound(
            round_number=iteration,
            current_prompt=best_prompt,
            current_score=best_score,
            generated_prompts=candidates,
            best_prompt=best_prompt,
            best_score=best_score,
            improvement=best_score - baseline_score,
        ))

        # 4e. Check termination conditions
        improvement = best_score - previous_best_score
        if improvement < self.improvement_threshold:
            if self.verbose:
                print(f"Converged at iteration {iteration}")
            break

        previous_best_score = best_score

    # 5. Prepare and return the OptimizationResult
    return OptimizationResult(
        optimizer=self.__class__.__name__,
        prompt=best_prompt,
        score=best_score,
        metric_name=metric.__name__,
        initial_prompt=prompt,
        initial_score=baseline_score,
        details={
            "iterations_completed": iteration + 1,
            "total_candidates_evaluated": (iteration + 1) * self.candidates_per_round,
        },
        history=self.get_history(),
        llm_calls=self.llm_call_counter,
        llm_calls_tools=self.llm_calls_tools_counter,
        llm_cost_total=self.llm_cost_total,
        llm_token_usage_total=self.llm_token_usage_total,
    )

Step 3: Implement Candidate Generation

Implement your own logic for creating new prompt variations. Use get_prompt() to access your internal prompts; it respects any prompt_overrides the user supplied:

from opik_optimizer._llm_calls import call_model


def _generate_candidates(
    self,
    current_prompt: ChatPrompt,
    dataset: Dataset,
    metric: Callable,
) -> list[ChatPrompt]:
    """
    Generate candidate prompts using LLM-based improvement.

    Args:
        current_prompt: The prompt to improve
        dataset: Dataset for context (can analyze failures)
        metric: Metric for understanding what "good" means

    Returns:
        List of candidate ChatPrompt objects
    """
    candidates = []

    for i in range(self.candidates_per_round):
        # Get the generation prompt template (respects prompt_overrides)
        generation_request = self.get_prompt(
            "generation_prompt",
            current_prompt=current_prompt.get_messages(),
            improvement_focus=f"variation {i+1}: explore different approaches",
        )

        # Call the LLM to generate an improved prompt
        response = call_model(
            messages=[{"role": "user", "content": generation_request}],
            model=self.model,
            seed=self.seed + i,  # Vary the seed for diversity
            model_parameters=self.model_parameters,
            project_name=self.project_name,
        )

        # Parse the response and create a new ChatPrompt
        new_prompt = self._parse_prompt_from_response(response, current_prompt)
        if new_prompt is not None:
            candidates.append(new_prompt)

    return candidates


def _parse_prompt_from_response(
    self,
    response: str,
    template_prompt: ChatPrompt,
) -> ChatPrompt | None:
    """
    Parse the LLM response into a new ChatPrompt.
    """
    try:
        new_prompt = template_prompt.model_copy(deep=True)
        # Update the system message with the improved prompt
        for msg in new_prompt.messages:
            if msg.get("role") == "system":
                msg["content"] = response.strip()
                break
        return new_prompt
    except Exception:
        return None
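
With the class, optimize_prompt(), and candidate generation in place, running the custom optimizer looks like running any built-in one. A minimal usage sketch, reusing the prompt and exact_match metric from the Core Concepts sketch above; the dataset name is illustrative:

import opik

# `prompt` and `exact_match` as defined in the Core Concepts sketch above
dataset = opik.Opik().get_dataset("my-qa-dataset")  # illustrative dataset name

optimizer = MyCustomOptimizer(
    model="openai/gpt-4o-mini",
    max_iterations=3,
    candidates_per_round=3,
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=exact_match,
    n_samples=50,  # evaluate each candidate on a 50-item sample
)

print(result.score)   # best score found
print(result.prompt)  # best ChatPrompt found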

What BaseOptimizer Provides

The BaseOptimizer class provides the evaluation and bookkeeping utilities that all built-in optimizers rely on. Reusing them keeps your custom optimizer consistent with the rest of the Opik ecosystem.

| Component | Description |
| --- | --- |
| evaluate_prompt() | Evaluates a prompt against a dataset using the metric. Handles threading, sampling, and result aggregation. |
| get_prompt(key, **fmt) | Gets an internal prompt template with optional formatting. Respects prompt_overrides. |
| list_prompts() | Lists all available prompt keys for this optimizer. |
| _reset_counters() | Resets LLM call/cost counters. Call at the start of optimize_prompt(). |
| _add_to_history() | Tracks optimization rounds for result reporting. |
| _should_skip_optimization() | Checks if the baseline score exceeds the perfect_score threshold. |
| _build_early_result() | Creates an OptimizationResult when skipping optimization. |
| llm_call_counter | Tracks the number of LLM calls made. |
| llm_cost_total | Tracks total API cost (when available from the provider). |
| llm_token_usage_total | Tracks token usage across all calls. |
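
The prompt_overrides mechanism referenced above lets users replace any entry in DEFAULT_PROMPTS without subclassing. A short sketch, assuming prompt_overrides is accepted through the constructor's **kwargs and forwarded to BaseOptimizer:

# Override an internal prompt without subclassing (assumes prompt_overrides
# is forwarded to BaseOptimizer via **kwargs)
optimizer = MyCustomOptimizer(
    model="openai/gpt-4o-mini",
    prompt_overrides={
        "generation_prompt": (
            "Rewrite this prompt so it is shorter and more specific:\n"
            "{current_prompt}\n\nFocus: {improvement_focus}"
        ),
    },
)

# Discover which internal prompt keys exist
print(optimizer.list_prompts())  # ['analysis_prompt', 'generation_prompt']

# get_prompt() returns the override, formatted with the supplied values
print(optimizer.get_prompt(
    "generation_prompt",
    current_prompt="You are a helpful assistant.",
    improvement_focus="brevity",
))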

Using Structured Outputs

For complex generation, use Pydantic models for structured LLM responses:

from opik_optimizer._llm_calls import call_model
from pydantic import BaseModel


class PromptAnalysis(BaseModel):
    issues: list[str]
    suggestions: list[str]
    priority: str


# Returns a parsed Pydantic object, not raw text
analysis = call_model(
    messages=[{"role": "user", "content": "Analyze this prompt: ..."}],
    model=self.model,
    response_model=PromptAnalysis,
    project_name=self.project_name,
)

print(analysis.issues)       # ['Issue 1', 'Issue 2']
print(analysis.suggestions)  # ['Suggestion 1', ...]

How to Contribute

Opik is continuously evolving, and community contributions are valuable!

  • Feature Requests & Ideas: If you have ideas for new optimization algorithms, features, or improvements to existing ones, please share them through our community channels or by raising an issue on our GitHub repository.
  • Bug Reports: If you encounter issues or unexpected behavior, detailed bug reports are greatly appreciated.
  • Use Cases & Feedback: Sharing your use cases and how Opik Agent Optimizer is (or isn’t) meeting your needs helps us prioritize development.
  • Code Contributions: Pull requests for new optimizers are welcome! See the contribution guide for detailed instructions.

Key Takeaways

  • Extend BaseOptimizer to create custom optimization algorithms with full access to Opik’s infrastructure
  • Define DEFAULT_PROMPTS for your algorithm’s internal prompts; users can customize these via prompt_overrides
  • Implement optimize_prompt() with your optimization logic, using the inherited evaluate_prompt() to score candidates
  • Return standardized OptimizationResult objects for consistent reporting and dashboard integration
  • Use _llm_calls.call_model() for LLM interactions with automatic cost/usage tracking

We encourage you to explore the existing optimizer algorithms to see different approaches to these challenges.