Optimize prompts
Use this playbook whenever you need to improve a prompt (single-turn or agentic) and want a repeatable process rather than manual tweaks.
1. Establish baselines
- Record the current prompt and score using your production metric.
- Log at least 10 representative dataset rows so the optimizer can generalize.
- Capture latency and token costs; optimizations should not regress them unexpectedly. One way to record this baseline is sketched below.
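A minimal sketch of logging those rows, assuming the `opik` Python SDK's `Opik` client and `get_or_create_dataset`; the dataset name and row fields here are placeholders for your own data:

```python
import opik

client = opik.Opik()
dataset = client.get_or_create_dataset(name="prompt-optimization-baseline")

# Aim for at least 10 representative rows so the optimizer can generalize.
dataset.insert([
    {"question": "How do I reset my password?", "expected_answer": "Use the 'Forgot password' link on the sign-in page."},
    {"question": "Where can I download my invoices?", "expected_answer": "Invoices are under Billing > History."},
    # ...add the rest of your representative rows here
])
```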
2. Choose an optimizer
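Whichever optimizer you select (step 3 below mentions HRPO and GEPA among the options), instantiation follows the same pattern. A minimal sketch, assuming `opik_optimizer` exposes a `MetaPromptOptimizer` class; substitute the optimizer and model you actually chose:

```python
from opik_optimizer import MetaPromptOptimizer

optimizer = MetaPromptOptimizer(
    model="openai/gpt-4o-mini",  # placeholder model id used to propose and evaluate candidate prompts
)
```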
3. Configure the run
- Set `project_name` on the optimizer to group runs by team or initiative.
- Start with `max_trials` = 3–5. Increase it once you confirm the metric is reliable.
- Use `n_samples` to limit cost during early exploration; rerun on the full dataset before promoting a prompt.
- For optimizers with inner-loop evaluations (HRPO, GEPA), set `n_samples_minibatch` to keep those steps lightweight.
- Use `n_samples_strategy` to keep subsampling deterministic (default: `"random_sorted"`).
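Putting these knobs together, a configured run might look like the following sketch. It assumes the `opik_optimizer` API; exactly which keywords belong on the constructor versus `optimize_prompt` can vary by optimizer and SDK version, so treat the placements as illustrative:

```python
import opik
from opik_optimizer import ChatPrompt, MetaPromptOptimizer

def metric(dataset_item, llm_output):
    # Stand-in for your production metric; return a float score.
    return float(dataset_item["expected_answer"].lower() in llm_output.lower())

dataset = opik.Opik().get_or_create_dataset(name="prompt-optimization-baseline")

optimizer = MetaPromptOptimizer(
    model="openai/gpt-4o-mini",         # placeholder model id
    project_name="growth-prompts",      # groups runs by team or initiative
)

prompt = ChatPrompt(
    system="You are a concise support assistant.",
    user="{question}",
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=metric,
    max_trials=5,                       # start with 3-5 until the metric is trusted
    n_samples=50,                       # subsample to limit cost during exploration
    n_samples_strategy="random_sorted", # deterministic subsampling (the default)
    # n_samples_minibatch=8,            # for HRPO/GEPA inner-loop evaluations
)
```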
Optimize multiple prompts together
You can pass a dict of `ChatPrompt` objects to optimize a coordinated prompt bundle (for example, a multi-agent setup or a system/user prompt pair that must stay in sync). Each key names a prompt and is preserved through optimization. `result.prompt` then returns a dict keyed by the same names, so you can update each agent prompt together.
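A sketch of a coordinated bundle, reusing the `optimizer`, `dataset`, and `metric` from the previous step (the agent names and prompt text are placeholders):

```python
from opik_optimizer import ChatPrompt

prompts = {
    "router": ChatPrompt(
        system="Route each request to the right specialist agent.",
        user="{question}",
    ),
    "responder": ChatPrompt(
        system="Answer the routed request concisely.",
        user="{question}",
    ),
}

# Pass the dict where a single ChatPrompt would otherwise go.
result = optimizer.optimize_prompt(prompt=prompts, dataset=dataset, metric=metric)

# result.prompt will be a dict with the same keys ("router", "responder").
```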
4. Evaluate outcomes
- Compare `result.score` with `result.initial_score` to confirm a material improvement.
- Review the `history` attribute to understand why individual trials regressed.
- Use the Dashboard results to visualize per-trial performance.
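A quick check along these lines, assuming the `result` object from the sketches above (the exact shape of `history` entries can vary by optimizer version):

```python
# Confirm the optimized prompt materially beats the baseline.
improvement = result.score - result.initial_score
print(f"baseline={result.initial_score:.3f} "
      f"optimized={result.score:.3f} delta={improvement:+.3f}")

# Scan per-trial records for regressions before promoting the prompt.
for trial in result.history:
    print(trial)
```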
5. Ship safely
Export the prompt
`result.prompt` returns the best-performing `ChatPrompt`. Serialize it as JSON and check it into your repo.
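A serialization sketch; it assumes `ChatPrompt` exposes its messages through a `get_messages()` method returning role/content dicts, so verify the accessor against your SDK version:

```python
import json

# Assumption: get_messages() returns [{"role": ..., "content": ...}, ...].
messages = result.prompt.get_messages()

with open("prompts/optimized_prompt.json", "w", encoding="utf-8") as f:
    json.dump(messages, f, indent=2)
```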
Related guides
- Optimization Studio
- Define datasets
- Define metrics
- Chaining optimizers
- Avoiding overfitting – Prevent your prompt from memorizing the training data by using separate validation datasets