Hierarchical Reflective Optimizer

Hierarchical root cause analysis for targeted prompt improvement

The HierarchicalReflectiveOptimizer uses hierarchical root cause analysis to identify and address specific failure modes in your prompts. It analyzes evaluation results, identifies patterns in failures, and generates targeted improvements to address each failure mode systematically.

When to Use This Optimizer: HierarchicalReflectiveOptimizer is ideal when you have a complex prompt that you want to refine based on understanding why it’s failing. Unlike optimizers that generate many random variations, this optimizer systematically analyzes failures, identifies root causes, and makes surgical improvements to address each specific issue.

Key Trade-offs:

  • Requires metrics that return reasons for their scores (a ScoreResult with the reason field populated). Simple numeric metrics won’t provide enough feedback for root cause analysis.
  • Best suited for refining existing prompts rather than discovering entirely new prompt structures.
  • The hierarchical analysis makes multiple LLM calls: one per batch of failures, a synthesis step, and at least one improvement attempt per failure mode.
  • Currently supports single-iteration optimization (one round of analysis and improvement), though the framework is designed for future multi-round support.

Have questions about HierarchicalReflectiveOptimizer? The Optimizer & SDK FAQ addresses common questions about choosing optimizers, understanding the role of the reasoning_model, and how parameters like max_parallel_batches affect performance and cost.

How It Works

The HierarchicalReflectiveOptimizer takes a systematic approach to prompt improvement through the following process (a conceptual code sketch follows the list):

  1. Baseline Evaluation:

    • Your initial prompt is evaluated against the dataset using your specified metric.
    • A baseline score is established to measure improvements against.
  2. Hierarchical Root Cause Analysis:

    • Evaluation results (especially failures or low-scoring cases) are split into batches.
    • Each batch is analyzed in parallel using the reasoning_model to identify patterns and failure modes.
    • The batch-level analyses are then synthesized into a unified set of failure modes that represent the core issues with the current prompt.
    • This hierarchical approach (batch → synthesis) is more scalable and robust than analyzing all failures at once.
  3. Failure Mode Identification:

    • Each identified failure mode includes:
      • A descriptive name (e.g., “Vague Instructions”, “Missing Context”)
      • A description of the failure pattern
      • A root cause analysis explaining why the prompt fails in these cases
  4. Targeted Improvement Generation:

    • For each failure mode, the optimizer generates an improved version of the prompt.
    • The improvement is guided by a meta-prompt that instructs the reasoning_model to:
      • Make surgical, targeted changes that address the specific root cause
      • Update existing instructions if they’re unclear or incomplete
      • Add new instructions only when necessary
      • Maintain the original prompt structure and intent
  5. Iterative Evaluation with Retries:

    • Each improved prompt is evaluated against the dataset.
    • If an improvement doesn’t increase the score, the optimizer can retry with a different seed (controlled by max_retries).
    • Only improvements that increase the score are kept; otherwise, the previous best prompt is retained.
  6. Result:

    • The highest-scoring prompt found across all improvements is returned as the optimized prompt.
    • Detailed metadata about the optimization process, including failure modes addressed and improvement attempts, is included in the result.
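
Conceptually, the whole loop can be summarized by the sketch below. Everything in it (evaluate, analyze_failure_modes, improve_prompt) is an illustrative placeholder rather than a function exposed by the SDK; the real logic lives inside optimize_prompt.

```python
# Conceptual sketch of the loop above -- NOT the actual opik_optimizer API.

def evaluate(prompt, dataset, metric):
    """Placeholder: returns (mean score, per-item results with reasons)."""
    ...

def analyze_failure_modes(prompt, results):
    """Placeholder: hierarchical root cause analysis (steps 2-3)."""
    ...

def improve_prompt(prompt, failure_mode, seed):
    """Placeholder: generate a targeted fix for one failure mode (step 4)."""
    ...

def reflective_optimize(prompt, dataset, metric, max_retries=2, seed=42):
    best_score, results = evaluate(prompt, dataset, metric)        # 1. baseline evaluation
    best_prompt = prompt

    for failure_mode in analyze_failure_modes(prompt, results):    # 2-4. analysis -> targeted fixes
        for attempt in range(1 + max_retries):                     # 5. retry with a varied seed
            candidate = improve_prompt(best_prompt, failure_mode, seed=seed + attempt)
            score, _ = evaluate(candidate, dataset, metric)
            if score > best_score:                                 # keep only score-raising changes
                best_prompt, best_score = candidate, score
                break

    return best_prompt, best_score                                 # 6. highest-scoring prompt wins
```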

Metric Requirements: The HierarchicalReflectiveOptimizer requires metrics that provide reasoning about their scores. When using Opik metrics, ensure they return ScoreResult objects with the reason field populated. This feedback is critical for identifying failure modes.

Example of a good metric for HierarchicalReflectiveOptimizer:

```python
from opik.evaluation.metrics import ScoreResult

def my_metric(dataset_item, llm_output):
    # Your scoring logic (replace calculate_score with your own implementation)
    score = calculate_score(dataset_item, llm_output)

    # IMPORTANT: Include a specific reason for the score.
    # Append any additional context that explains why this score was given.
    reason = (
        f"Output {'matches' if score > 0.5 else 'does not match'} the expected format "
        f"(score: {score:.2f})."
    )

    return ScoreResult(
        name="my_metric",
        value=score,
        reason=reason,  # This is required!
    )
```

The hierarchical root cause analysis (Step 2) is what makes this optimizer unique. It processes evaluation results in batches, analyzes patterns in each batch, and then synthesizes findings across all batches. This approach scales better to large datasets and produces more coherent, actionable failure modes than analyzing all results at once.

Understanding how Opik’s evaluation works will help you design better metrics:

  • Evaluation Overview
  • Metrics Overview
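
To make the batch → synthesis step concrete, the sketch below shows the overall pattern. analyze_batch, synthesize, and batch_size are hypothetical stand-ins (in the optimizer these steps are LLM calls to the reasoning_model); only the bounded concurrency, controlled by max_parallel_batches, mirrors the real parameter.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_batch(batch):
    """Hypothetical helper: find failure patterns in one batch of low-scoring results."""
    ...

def synthesize(batch_analyses):
    """Hypothetical helper: merge per-batch findings into a unified set of failure modes."""
    ...

def hierarchical_root_cause_analysis(failed_results, batch_size=20, max_parallel_batches=5):
    # Split evaluation failures into fixed-size batches.
    batches = [
        failed_results[i:i + batch_size]
        for i in range(0, len(failed_results), batch_size)
    ]
    # Analyze batches concurrently, never more than max_parallel_batches at once.
    with ThreadPoolExecutor(max_workers=max_parallel_batches) as pool:
        batch_analyses = list(pool.map(analyze_batch, batches))
    # Synthesize batch-level findings into the final list of failure modes.
    return synthesize(batch_analyses)
```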

Configuration Options

Basic Configuration

```python
from opik_optimizer import HierarchicalReflectiveOptimizer

optimizer = HierarchicalReflectiveOptimizer(
    reasoning_model="openai/gpt-4.1",  # Model for analysis and improvement generation

    # Technical parameters
    num_threads=12,           # Parallel threads for evaluation
    max_parallel_batches=5,   # Max batches analyzed concurrently
    verbose=1,                # 0=quiet, 1=show progress
    seed=42,                  # Random seed for reproducibility

    # LLM parameters (passed via **model_kwargs)
    temperature=0.7,
    max_tokens=4096,
)
```

Advanced Configuration

Key parameters include:

  • reasoning_model: The LLM used for root cause analysis, failure mode synthesis, and generating prompt improvements. This is typically a powerful model like GPT-4.
  • num_threads: Number of parallel threads used for evaluating prompts against the dataset. Higher values speed up evaluation but increase concurrent API calls.
  • max_parallel_batches: Maximum number of batches to analyze concurrently during hierarchical root cause analysis. Controls parallelism vs. memory/API usage.
  • seed: Random seed for reproducibility. Each retry attempt uses a varied seed to avoid cache hits and ensure different improvement suggestions.
  • verbose: Controls display of progress bars and detailed logging (0=off, 1=on).
  • **model_kwargs: Additional keyword arguments (e.g., temperature, max_tokens) passed to the underlying LLM calls.

The optimize_prompt method also accepts:

  • max_retries: Number of retry attempts if an improvement doesn’t increase the score (default: 2). Each retry uses a different seed.
  • n_samples: Optional limit on the number of dataset items used for evaluation. Useful for faster iterations during development.
  • auto_continue: Reserved for future multi-round optimization support.

Example Usage

```python
from opik_optimizer import HierarchicalReflectiveOptimizer, ChatPrompt, datasets
from opik.evaluation.metrics.score_result import ScoreResult

# 1. Define your evaluation dataset
dataset = datasets.hotpot_300()  # or use your own dataset

# 2. Configure the evaluation metric (MUST return reasons!)
def answer_quality_metric(dataset_item, llm_output):
    reference = dataset_item.get("answer", "")

    # Your scoring logic
    is_correct = reference.lower() in llm_output.lower()
    score = 1.0 if is_correct else 0.0

    # IMPORTANT: Provide detailed reasoning
    if is_correct:
        reason = f"Output contains the correct answer: '{reference}'"
    else:
        reason = f"Output does not contain expected answer '{reference}'. Output was too vague or incorrect."

    return ScoreResult(
        name="answer_quality",
        value=score,
        reason=reason,  # Critical for root cause analysis!
    )

# 3. Define your initial prompt
initial_prompt = ChatPrompt(
    project_name="reflective_optimization",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that answers questions accurately."},
        {"role": "user", "content": "Question: {question}\n\nProvide a concise answer."},
    ],
)

# 4. Initialize the HierarchicalReflectiveOptimizer
optimizer = HierarchicalReflectiveOptimizer(
    reasoning_model="openai/gpt-4.1",  # Strong model for analysis
    num_threads=8,
    max_parallel_batches=5,
    seed=42,
    temperature=0.7,
)

# 5. Run the optimization
optimization_result = optimizer.optimize_prompt(
    prompt=initial_prompt,
    dataset=dataset,
    metric=answer_quality_metric,
    n_samples=100,   # Evaluate on 100 samples
    max_retries=2,   # Retry up to 2 times if an improvement fails
)

# 6. View the results
optimization_result.display()

# Access the optimized prompt
print("\nOptimized Prompt:")
for msg in optimization_result.prompt:
    print(f"{msg['role']}: {msg['content']}")

# Check optimization details
print(f"\nInitial Score: {optimization_result.initial_score:.4f}")
print(f"Final Score: {optimization_result.score:.4f}")
print(f"Improvement: {(optimization_result.score - optimization_result.initial_score):.4f}")
print(f"LLM Calls Made: {optimization_result.llm_calls}")
```

Model Support

The HierarchicalReflectiveOptimizer uses LiteLLM for model interactions, providing broad compatibility with various LLM providers including OpenAI, Azure OpenAI, Anthropic, Google (Vertex AI / AI Studio), Mistral AI, Cohere, and locally hosted models (e.g., via Ollama).

The reasoning_model parameter accepts any LiteLLM-supported model string (e.g., "openai/gpt-4.1", "azure/gpt-4", "anthropic/claude-3-opus", "gemini/gemini-1.5-pro").

For detailed instructions on how to specify different models and configure providers, please refer to the main LiteLLM Support for Optimizers documentation page.

Configuration Example using LiteLLM model string

```python
optimizer = HierarchicalReflectiveOptimizer(
    reasoning_model="anthropic/claude-3-opus-20240229",
    temperature=0.7,
    max_tokens=4096,
)
```

Best Practices

  1. Metric Design

    • Always include detailed reasons in your metric’s ScoreResult. The quality of root cause analysis depends on this feedback.
    • Provide specific, actionable feedback about why a response succeeds or fails.
    • Consider multiple aspects: correctness, completeness, format, tone, etc. (a sketch combining several aspects follows this list).
  2. Starting Prompt

    • Begin with a reasonably structured prompt. The optimizer refines existing prompts rather than creating from scratch.
    • Include clear intent and structure; the optimizer will make it more precise.
  3. Batch Configuration

    • max_parallel_batches=5 is a good default for balancing speed and API rate limits.
    • Increase if you have high rate limits and want faster analysis.
    • Decrease if you encounter rate limiting issues.
  4. Retry Strategy

    • Use max_retries=2 or max_retries=3 to give the optimizer multiple chances to improve for each failure mode.
    • Each retry uses a different seed, producing different improvement suggestions.
    • Higher retries increase cost but may find better solutions.
  5. Sample Size

    • Start with n_samples=50-100 for faster iteration during development.
    • Use larger samples or full dataset for final optimization runs.
    • Ensure your sample is representative of the full dataset’s diversity.
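
For the metric design point above, here is one way to combine several aspects into a single score with a detailed reason. It is only a sketch: the heuristics, weights, and thresholds are arbitrary illustrations meant to show how a rich reason string feeds the root cause analysis.

```python
from opik.evaluation.metrics import ScoreResult

def multi_aspect_metric(dataset_item, llm_output):
    reference = dataset_item.get("answer", "")

    # Score several aspects separately (simple heuristics, for illustration only).
    correctness = 1.0 if reference.lower() in llm_output.lower() else 0.0
    completeness = min(len(llm_output.split()) / 30, 1.0)  # rough length-based proxy
    formatting = 1.0 if not llm_output.strip().endswith("...") else 0.5

    # Arbitrary illustrative weights.
    score = 0.6 * correctness + 0.3 * completeness + 0.1 * formatting

    # Spell out *why* each aspect scored as it did -- this is what the
    # hierarchical root cause analysis reads when grouping failures.
    reason = (
        f"Correctness: {'contains' if correctness else 'missing'} expected answer '{reference}'. "
        f"Completeness: {completeness:.2f} (length-based proxy). "
        f"Formatting: {'clean ending' if formatting == 1.0 else 'output appears truncated'}."
    )

    return ScoreResult(name="multi_aspect_quality", value=score, reason=reason)
```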

Comparison with Other Optimizers

| Aspect | HierarchicalReflectiveOptimizer | MetaPromptOptimizer | EvolutionaryOptimizer |
| --- | --- | --- | --- |
| Approach | Root cause analysis → targeted fixes | Generate variations → evaluate | Genetic algorithm with populations |
| Metric Requirements | Requires reasons (ScoreResult) | Scores only | Scores only |
| Best For | Refining complex prompts systematically | General prompt improvement | Exploring diverse prompt structures |
| Iterations | Single round (currently) | Multiple rounds | Multiple generations |
| LLM Calls | Moderate (analysis + improvements) | High (many candidate generations) | Very high (full populations) |
| Failure Understanding | Deep (identifies specific failure modes) | Limited | None (purely score-driven) |

Troubleshooting

Issue: Optimizer reports no improvements found

  • Solution: Check that your metric returns detailed reason fields. Ensure the dataset has sufficient examples of failures to analyze.

Issue: Root cause analysis seems generic

  • Solution: Use a stronger reasoning_model (e.g., GPT-4 instead of GPT-3.5). Ensure your metric’s reasons are specific and actionable.

Issue: Optimization is slow

  • Solution: Reduce n_samples to evaluate fewer items per prompt, or increase num_threads to parallelize evaluation. Raising max_parallel_batches can also speed up the analysis phase if your provider’s rate limits allow the extra concurrency.

Issue: Rate limiting errors

  • Solution: Decrease max_parallel_batches and num_threads to reduce concurrent API calls.
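
For example, a lower-concurrency configuration might look like the sketch below; the exact values are illustrative and should be tuned to your provider’s limits.

```python
# Lower-concurrency settings to stay under provider rate limits.
optimizer = HierarchicalReflectiveOptimizer(
    reasoning_model="openai/gpt-4.1",
    num_threads=4,            # fewer concurrent evaluation calls
    max_parallel_batches=2,   # fewer failure batches analyzed at once
    seed=42,
)
```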

Research and References

The HierarchicalReflectiveOptimizer is inspired by techniques in:

  • Hierarchical analysis for scalable root cause identification
  • Reflective prompting for self-improvement
  • Targeted refinement over broad exploration

Next Steps