Optimize prompts

Use this playbook whenever you need to improve a prompt (single-turn or agentic) and want a repeatable process rather than manual tweaks.

1. Establish baselines

  • Record the current prompt and its score on your production metric.
  • Log at least 10 representative dataset rows so the optimizer can generalize.
  • Capture latency and token costs; optimizations should not regress them unexpectedly. A baseline-capture sketch follows this list.
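
A minimal baseline sketch is below. Here call_model is a hypothetical stand-in for your own model client, my_dataset_rows holds the representative rows you logged, and answer_quality is your production metric.

import json
import time

# Score the current prompt row by row and record latency alongside quality.
baseline = []
for row in my_dataset_rows:
    start = time.time()
    output = call_model(current_prompt, row)  # hypothetical model helper
    baseline.append({
        "input": row,
        "output": output,
        "score": answer_quality(row, output),  # your production metric
        "latency_s": round(time.time() - start, 3),
    })

# Persist the baseline so optimized prompts have something to beat.
with open("baseline.json", "w") as f:
    json.dump(baseline, f, indent=2)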

2. Choose an optimizer

  Scenario                     Recommended optimizer
  General prompt copy edits    MetaPrompt
  Complex failure analysis     HRPO
  Need diverse candidates      Evolutionary
  Few-shot heavy prompts       Few-Shot Bayesian
  Tune sampling params         Parameter optimizer

3. Configure the run

from opik_optimizer import HRPO

optimizer = HRPO(
    model="openai/gpt-4o",
    max_parallel_batches=4,
    seed=42,
)
result = optimizer.optimize_prompt(
    prompt=my_prompt,
    dataset=my_dataset,
    metric=answer_quality,
    max_trials=5,
    n_samples=50,
)
  • Set project_name on the optimizer to group runs by team or initiative.
  • Start with max_trials = 3–5. Increase once you confirm the metric is reliable.
  • Use n_samples to limit cost during early exploration; rerun on the full dataset before promoting a prompt.
  • For optimizers with inner-loop evaluations (HRPO, GEPA), set n_samples_minibatch to keep those steps lightweight.
  • Use n_samples_strategy to keep subsampling deterministic (default: "random_sorted"). The sketch below combines these options in an exploratory run.
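
A configuration sketch combining the options above, assuming the keyword arguments named in these bullets match your installed opik_optimizer version; the project name is a placeholder.

optimizer = HRPO(
    model="openai/gpt-4o",
    project_name="prompt-optimization-playbook",  # placeholder; groups runs by team
    seed=42,
)
exploratory = optimizer.optimize_prompt(
    prompt=my_prompt,
    dataset=my_dataset,
    metric=answer_quality,
    max_trials=3,                        # start small; raise once the metric is trusted
    n_samples=25,                        # cap cost during early exploration
    n_samples_minibatch=8,               # keep HRPO's inner-loop evaluations light
    n_samples_strategy="random_sorted",  # deterministic subsampling (the default)
)
# Rerun on the full dataset before promoting the winning prompt.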

Optimize multiple prompts together

You can pass a dict of ChatPrompt objects to optimize a coordinated prompt bundle (for example, a multi-agent setup or a system/user prompt pair that must stay in sync). Each key names a prompt and is preserved through optimization.

from opik_optimizer import MetaPromptOptimizer, ChatPrompt

prompts = {
    "researcher": ChatPrompt(
        name="researcher",
        messages=[
            {"role": "system", "content": "Gather facts and cite sources."},
            {"role": "user", "content": "{question}"},
        ],
    ),
    "synthesizer": ChatPrompt(
        name="synthesizer",
        messages=[
            {"role": "system", "content": "Summarize findings clearly."},
            {"role": "user", "content": "{question}"},
        ],
    ),
}

optimizer = MetaPromptOptimizer(model="openai/gpt-4o-mini", prompts_per_round=2)
result = optimizer.optimize_prompt(
    prompt=prompts,
    dataset=my_dataset,
    metric=answer_quality,
    max_trials=3,
)

result.prompt returns a dict keyed by the same names so you can update each agent prompt together.
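
A short usage sketch, assuming the two prompt names from the example above:

optimized = result.prompt  # dict keyed by "researcher" and "synthesizer"
researcher_prompt = optimized["researcher"]
synthesizer_prompt = optimized["synthesizer"]
# Deploy both together so the agents stay in sync.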

4. Evaluate outcomes

  • Compare result.score against result.initial_score to confirm a material improvement; a gating sketch follows this list.
  • Review the history attribute to understand where individual trials regressed.
  • Use the Dashboard results view to visualize per-trial performance.
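
A gating sketch built on the attributes above; the 0.02 threshold is an illustrative assumption, not a library default.

improvement = result.score - result.initial_score
print(f"Score moved from {result.initial_score:.3f} to {result.score:.3f} ({improvement:+.3f})")

if improvement < 0.02:  # illustrative threshold; tune to your metric's scale
    raise SystemExit("No material improvement; keep the current prompt.")

# Inspect per-trial records to see where candidates regressed.
for trial in result.history:
    print(trial)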

5. Ship safely

  1. Export the prompt. result.prompt returns the best-performing ChatPrompt. Serialize it as JSON and check it into your repo; a sketch follows.
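
A serialization sketch; it assumes the optimized ChatPrompt exposes the messages it was built from as a messages attribute (check your opik_optimizer version for the exact accessor), and the file path is a placeholder.

import json

best = result.prompt  # best-performing ChatPrompt
with open("prompts/answer_prompt.json", "w") as f:
    json.dump(best.messages, f, indent=2)  # assumption: messages round-trips to plain dicts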

  2. Automate regression tests. Wire the optimizer run into CI with a smaller dataset so future prompt edits have guardrails, as in the sketch below.
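
A CI guardrail sketch; load_checked_in_prompt, ci_smoke_dataset, and the 0.75 floor are placeholders to adapt.

def test_prompt_quality_guardrail():
    prompt = load_checked_in_prompt("prompts/answer_prompt.json")  # hypothetical loader
    result = optimizer.optimize_prompt(
        prompt=prompt,
        dataset=ci_smoke_dataset,  # small, fast subset reserved for CI
        metric=answer_quality,
        max_trials=1,
        n_samples=10,
    )
    assert result.score >= 0.75  # example floor; tune to your baseline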

  3. Monitor in production. Trace the new prompt with Opik tracing to confirm real-world performance matches experiment results; a tracing sketch follows.
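
A tracing sketch using Opik's track decorator; the project name and call_model helper are the same placeholders used earlier.

from opik import track

@track(project_name="prompt-optimization-playbook")  # placeholder project name
def answer(question: str) -> str:
    return call_model(best, question)  # hypothetical model helper from step 1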