Use this playbook whenever you need to improve a prompt (single-turn or agentic) and want a repeatable process rather than manual tweaks.
project_name on the optimizer to group runs by team or initiative.max_trials = 3–5. Increase once you confirm the metric is reliable.n_samples to limit cost during early exploration; rerun on the full dataset before promoting a prompt.n_samples_minibatch to keep those steps lightweight.n_samples_strategy to keep subsampling deterministic (default: "random_sorted").Tool optimization is now documented separately. Use it when you want to improve MCP tool descriptions without changing prompt text.
If you need finer control than roles (for example, only optimize a specific assistant message),
use prompt_segments to extract and update parts by segment ID.
Intent/Trigger: use segment-level updates when you need to constrain changes to exact message segments.
prompt, dataset, metricupdates passed to prompt_segments.apply_segment_updates)optimizer.optimize_prompt(prompt=updated_prompt, dataset=my_dataset, metric=answer_quality)You can pass a dict of ChatPrompt objects to optimize a coordinated prompt bundle (for example, a multi-agent setup or system/user prompt pair that must stay in sync). Each key names a prompt and is preserved through optimization.
result.prompt returns a dict keyed by the same names so you can update each agent prompt together.
result.score vs. result.initial_score to ensure material improvement.history attribute for regression reasons.result.prompt returns the best-performing ChatPrompt. Serialize it as JSON and check it into your repo.