Hierarchical Reflective Optimizer

Hierarchical root cause analysis for targeted prompt improvement

The HierarchicalReflectiveOptimizer uses hierarchical root cause analysis to identify and address specific failure modes in your prompts. It analyzes evaluation results, identifies patterns in failures, and generates targeted improvements to address each failure mode systematically.

HierarchicalReflectiveOptimizer is ideal when you have a complex prompt that you want to refine based on understanding why it’s failing. Unlike optimizers that generate many random variations, this optimizer systematically analyzes failures, identifies root causes, and makes surgical improvements to address each specific issue.

How It Works

The Hierarchical Reflective Optimizer was developed by the Opik team to improve prompts that have already gone through a few rounds of manual prompt engineering. It focuses on identifying why a prompt is failing and then updating the prompt to address those issues.

As datasets can be large, we split the analysis into batches and analyze them in parallel. We then synthesize the findings across all batches to identify the core issues with the prompt.
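
The snippet below is a conceptual sketch of this batching pattern, not the optimizer's actual implementation: failing results are split into fixed-size batches, analyzed concurrently, and the per-batch findings are merged. The batch_size and max_parallel_batches names mirror the optimizer parameters described later, the default values are illustrative, and simple list/set operations stand in for the real LLM-based analysis and synthesis calls.

```python
# Conceptual sketch of batched root cause analysis (NOT the actual Opik code).
from concurrent.futures import ThreadPoolExecutor

def analyze_batch(batch: list[dict]) -> list[str]:
    # Stand-in for an LLM call that inspects one batch of scored items
    # and extracts failure patterns from their metric "reason" strings.
    return [item["reason"] for item in batch if item["score"] < 1.0]

def root_cause_analysis(results: list[dict], batch_size: int = 25, max_parallel_batches: int = 5) -> list[str]:
    batches = [results[i:i + batch_size] for i in range(0, len(results), batch_size)]
    with ThreadPoolExecutor(max_workers=max_parallel_batches) as pool:
        per_batch_findings = list(pool.map(analyze_batch, batches))
    # Stand-in for LLM-based synthesis across batches: de-duplicate the findings.
    return sorted({reason for findings in per_batch_findings for reason in findings})
```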


The optimizer is open source; you can check out the root cause analysis code and prompts in the Opik repository.

Quickstart

You can use the HierarchicalReflectiveOptimizer to optimize a prompt:

```python
from opik_optimizer import HierarchicalReflectiveOptimizer, ChatPrompt, datasets
from opik.evaluation.metrics.score_result import ScoreResult

# 1. Define your evaluation dataset
dataset = datasets.hotpot(count=300)  # or use your own dataset

# 2. Configure the evaluation metric (MUST return reasons!)
def answer_quality_metric(dataset_item, llm_output):
    reference = dataset_item.get("answer", "")

    # Your scoring logic
    is_correct = reference.lower() in llm_output.lower()
    score = 1.0 if is_correct else 0.0

    # IMPORTANT: Provide detailed reasoning
    if is_correct:
        reason = f"Output contains the correct answer: '{reference}'"
    else:
        reason = f"Output does not contain expected answer '{reference}'. Output was too vague or incorrect."

    return ScoreResult(
        name="answer_quality",
        value=score,
        reason=reason  # Critical for root cause analysis!
    )

# 3. Define your initial prompt
initial_prompt = ChatPrompt(
    project_name="reflective_optimization",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that answers questions accurately."
        },
        {
            "role": "user",
            "content": "Question: {question}\n\nProvide a concise answer."
        }
    ]
)

# 4. Initialize the HierarchicalReflectiveOptimizer
optimizer = HierarchicalReflectiveOptimizer(
    model="gpt-4o",
    n_threads=8,
    max_parallel_batches=5,
    seed=42,
    model_parameters={"temperature": 0.7}
)

# 5. Run the optimization
optimization_result = optimizer.optimize_prompt(
    prompt=initial_prompt,
    dataset=dataset,
    metric=answer_quality_metric,
    n_samples=100,
    max_trials=5,
    max_retries=2
)

# 6. View the results
optimization_result.display()
```
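
The quickstart uses one of the demo datasets that ship with opik_optimizer. To optimize against your own data instead, you can build an Opik dataset whose items contain the fields your prompt and metric reference (here, question and answer). A minimal sketch using the Opik SDK; the dataset name and items are illustrative, and it reuses the optimizer, prompt, and metric defined above.

```python
import opik

# Create (or fetch) a dataset and insert items whose keys match the
# {question} placeholder in the prompt and the "answer" field read by the metric.
client = opik.Opik()
my_dataset = client.get_or_create_dataset(name="my-qa-dataset")  # illustrative name
my_dataset.insert([
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Hamlet?", "answer": "William Shakespeare"},
])

# Pass it to the optimizer exactly like the demo dataset above.
result = optimizer.optimize_prompt(
    prompt=initial_prompt,
    dataset=my_dataset,
    metric=answer_quality_metric,
)
```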

Configuration Options

Optimizer parameters

The optimizer has the following parameters:

  • model (str): LiteLLM model name for the optimizer's internal reasoning/generation calls.
  • n_threads (int): Number of parallel threads used when evaluating the prompt against the dataset.
  • verbose (int): Controls internal logging/progress bars (0=off, 1=on).
  • seed (int): Random seed for reproducibility (default: 42).
  • max_parallel_batches (int): Maximum number of batches analyzed concurrently during root cause analysis.
  • batch_size (int): Number of evaluation results per batch during root cause analysis.
  • convergence_threshold (float): Improvement threshold used to decide when the optimization has converged.
  • model_parameters (dict[str, typing.Any] | None): Optional dict of LiteLLM parameters for the optimizer's internal LLM calls.
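
For example, a constructor call that sets the batching-related parameters explicitly might look like the sketch below; the values are illustrative rather than recommendations, and the convergence_threshold comment reflects an assumed reading of the parameter name.

```python
optimizer = HierarchicalReflectiveOptimizer(
    model="gpt-4o",
    n_threads=4,                 # parallel threads for prompt evaluation
    max_parallel_batches=3,      # batches analyzed concurrently during root cause analysis
    batch_size=20,               # evaluation results per analysis batch
    convergence_threshold=0.01,  # assumed: stop refining once improvement falls below this
    verbose=1,
    seed=42,
)
```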

optimize_prompt parameters

The optimize_prompt method has the following parameters:

  • prompt (ChatPrompt): The prompt to optimize.
  • dataset (Dataset): Opik dataset name, or Opik dataset.
  • metric (Callable): A metric function; it should accept two arguments, dataset_item and llm_output, and return a ScoreResult (see the quickstart above).
  • experiment_config (dict | None): Optional metadata to attach to the evaluation experiments.
  • n_samples (int | None): Optional number of dataset items to sample for evaluation.
  • auto_continue (bool): If True, the algorithm may continue optimizing if the goal is not yet met.
  • agent_class (type[opik_optimizer.optimizable_agent.OptimizableAgent] | None): Optional custom agent class used to execute the prompt.
  • project_name (str): Opik project name used to log optimization traces.
  • max_trials (int): Maximum number of improvement rounds to run.
  • max_retries (int): Maximum number of retries when an attempted improvement does not increase the score.
  • kwargs (Any): Additional keyword arguments.
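
Putting a few of the optional parameters together, a call might look like the sketch below; it reuses the objects defined in the quickstart, and the project name and experiment_config contents are illustrative.

```python
optimization_result = optimizer.optimize_prompt(
    prompt=initial_prompt,
    dataset=dataset,
    metric=answer_quality_metric,
    n_samples=50,                                 # evaluate on a subset of the dataset
    max_trials=3,                                 # rounds of root cause analysis and improvement
    max_retries=1,                                # retries when an attempted fix does not improve the score
    project_name="reflective_optimization",
    experiment_config={"owner": "docs-example"},  # illustrative metadata
)
optimization_result.display()
```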

Model Support

There are two models to consider when using the HierarchicalReflectiveOptimizer:

  • HierarchicalReflectiveOptimizer.model: The model used for the root cause analysis and failure mode synthesis.
  • ChatPrompt.model: The model used to evaluate the prompt.

The model parameter accepts any LiteLLM-supported model string (e.g., "gpt-4o", "azure/gpt-4", "anthropic/claude-3-opus", "gemini/gemini-1.5-pro"). You can also pass in extra model parameters using the model_parameters parameter:

```python
optimizer = HierarchicalReflectiveOptimizer(
    model="anthropic/claude-3-opus-20240229",
    model_parameters={
        "temperature": 0.7,
        "max_tokens": 4096
    }
)
```
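
Because the two models are independent, you can evaluate the prompt with one model while running the root cause analysis with another. The sketch below reuses the quickstart's imports and messages; the specific model choices are illustrative.

```python
prompt = ChatPrompt(
    model="gpt-4o-mini",  # ChatPrompt.model: executes the prompt during evaluation
    messages=[
        {"role": "system", "content": "You are a helpful assistant that answers questions accurately."},
        {"role": "user", "content": "Question: {question}\n\nProvide a concise answer."},
    ],
)

optimizer = HierarchicalReflectiveOptimizer(
    model="gpt-4o",  # HierarchicalReflectiveOptimizer.model: root cause analysis and synthesis
)
```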

Next Steps

  1. Explore specific Optimizers for algorithm details.
  2. Refer to the FAQ for common questions and troubleshooting.
  3. Refer to the API Reference for detailed configuration options.