Optimize prompts
Use this playbook whenever you need to improve a prompt (single-turn or agentic) and want a repeatable process rather than manual tweaks.
1. Establish baselines
- Record the current prompt and score using your production metric.
- Log at least 10 representative dataset rows so the optimizer can generalize.
- Capture latency and token costs; optimizations should not regress them unexpectedly.
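The baseline step above can be sketched as a small snapshot helper. This is an illustrative sketch, not a library API: `record_baseline` and all of its field names are assumptions chosen to mirror the checklist (prompt, rows, score, latency, tokens).

```python
import json
import time

# Hypothetical helper; the function and field names are illustrative,
# not part of any specific optimizer library.
def record_baseline(prompt_text, rows, score, latency_ms, tokens):
    """Snapshot the current prompt, its score, and cost metrics before optimizing."""
    assert len(rows) >= 10, "log at least 10 representative rows"
    return {
        "prompt": prompt_text,
        "dataset_rows": rows,
        "score": score,
        "latency_ms": latency_ms,
        "tokens": tokens,
        "recorded_at": time.time(),
    }

baseline = record_baseline(
    "Answer the user's question concisely.",
    rows=[{"input": f"q{i}", "expected": f"a{i}"} for i in range(10)],
    score=0.62,
    latency_ms=480,
    tokens=350,
)
print(json.dumps({k: baseline[k] for k in ("score", "latency_ms", "tokens")}))
```

Checking the snapshot into version control alongside the prompt makes later comparisons trivial.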
2. Choose an optimizer
3. Configure the run
- Set `project_name` on the `ChatPrompt` to group runs by team or initiative.
- Start with `max_trials` = 3–5. Increase once you confirm the metric is reliable.
- Use `n_samples` to limit cost during early exploration; rerun on the full dataset before promoting a prompt.
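The configuration advice above can be captured as two configs: one for cheap exploration, one for the full-dataset run before promotion. The dict keys follow the parameter names in the text; the project name and sample count are made-up examples, not defaults of any library.

```python
# Illustrative run configuration mirroring the parameters above.
exploration_config = {
    "project_name": "support-bot-prompts",  # hypothetical project name
    "max_trials": 3,    # start small; raise once the metric proves reliable
    "n_samples": 25,    # subsample the dataset to limit cost while exploring
}

# Before promoting a prompt, rerun on the full dataset (None = no subsampling
# in this sketch) while keeping everything else identical.
promotion_config = {**exploration_config, "n_samples": None}
```

Keeping the two configs side by side makes it obvious that only the sampling changes between exploration and the final validation run.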
4. Evaluate outcomes
- Compare `result.score` vs. `result.initial_score` to ensure a material improvement.
- Review the `history` attribute to understand why individual trials regressed.
- Use the Dashboard to visualize per-trial performance.
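The evaluation checks above can be sketched with a stand-in result object. The attribute names (`score`, `initial_score`, `history`) follow the text; the `OptimizationResult` class itself, the `min_gain` threshold, and the history record shape are assumptions for illustration.

```python
from dataclasses import dataclass, field

# Stand-in for an optimizer result; only the attribute names come from the
# text above -- the class and history record shape are illustrative.
@dataclass
class OptimizationResult:
    initial_score: float
    score: float
    history: list = field(default_factory=list)

def material_improvement(result, min_gain=0.05):
    """Promote only if the best trial beats the baseline by a meaningful margin."""
    return (result.score - result.initial_score) >= min_gain

result = OptimizationResult(
    initial_score=0.62,
    score=0.74,
    history=[{"trial": 1, "score": 0.58}, {"trial": 2, "score": 0.74}],
)
print(material_improvement(result))  # → True

# Inspect regressions: trials that scored below the baseline.
regressions = [t for t in result.history if t["score"] < result.initial_score]
```

A fixed `min_gain` guards against promoting prompts whose gains are within metric noise.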
5. Ship safely
- Export the prompt: `result.prompt` returns the best-performing `ChatPrompt`. Serialize it as JSON and check it into your repo.
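The export step can be sketched as a JSON round-trip. Here a plain dict of messages stands in for the `ChatPrompt` returned by `result.prompt`, so the snippet is self-contained; the file path is a hypothetical example, not a required location.

```python
import json
from pathlib import Path

# Stand-in for result.prompt: a plain messages dict so the serialization
# step is runnable without the optimizer library.
best_prompt = {
    "messages": [
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "{question}"},
    ]
}

# Hypothetical repo path; check the file into version control.
path = Path("prompts/support_bot.json")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(best_prompt, indent=2))

# Loading it back in production should reproduce the prompt exactly.
loaded = json.loads(path.read_text())
assert loaded == best_prompt
```

Committing the serialized prompt gives you code review, diffs, and rollback for prompt changes, the same as for any other artifact.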