After each optimization run, visit the Opik dashboard to understand what changed and decide whether to ship the new prompt.
A chart plots every trial score in chronological order and highlights the current best prompt. Hover over a point to read exact values and see percentage improvements.
A table lists each trial, the optimizer used, the prompt JSON, and per-trial scores. Click a trial row to expand dataset items and attached traces.
When you expand a trial, you can inspect every dataset item that ran during that trial plus the corresponding trace tree (tool calls, attachments, etc.).
HRPO runs add a panel that clusters similar failures. Expand a cluster to read metric reasons and sample traces.
The dashboard also shows how many dataset rows were sampled per trial, so you can judge how much confidence the scores deserve.
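To make the sample-size judgment concrete, here is a minimal sketch of a rough check on whether a score gap between two trials is larger than sampling noise. It is not part of Opik: the function names are hypothetical, the scores are plain Python lists you would copy or export yourself, and the check treats the two sets of per-item scores as independent samples, which is a simplification.

```python
import math


def standard_error(scores):
    """Standard error of the mean for a list of per-item scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate spread")
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return math.sqrt(variance / n)


def improvement_is_clear(baseline_scores, candidate_scores, z=1.96):
    """Return True if the gap in means exceeds z combined standard errors.

    Hypothetical helper: a rough rule of thumb, not a formal significance test.
    """
    baseline_mean = sum(baseline_scores) / len(baseline_scores)
    candidate_mean = sum(candidate_scores) / len(candidate_scores)
    combined = math.sqrt(
        standard_error(baseline_scores) ** 2
        + standard_error(candidate_scores) ** 2
    )
    return (candidate_mean - baseline_mean) > z * combined


# Illustrative, made-up per-item scores for two trials.
baseline = [0.5, 0.6, 0.4, 0.5]
large_gain = [0.9, 0.8, 1.0, 0.9]
small_gain = [0.55, 0.6, 0.45, 0.5]

print(improvement_is_clear(baseline, large_gain))
print(improvement_is_clear(baseline, small_gain))
```

With only a handful of rows per trial, small gains tend to fall inside the noise band, which is a reason to rerun with more samples before shipping a prompt.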
While the UI currently focuses on analysis, you can always pull prompts and history directly from the SDK after the run finishes.
Use optimized_prompt to update your application, and use history to build custom reports or attach evidence to pull requests.
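As one illustration of a custom report, the sketch below renders trial history as a Markdown table that could be pasted into a pull request. The shape of the data is an assumption: it treats history as a list of dictionaries with `trial` and `score` keys, which may not match what the SDK returns, so adapt the field access to the real structure. `history_to_markdown` is a hypothetical helper, not an Opik function.

```python
def history_to_markdown(history):
    """Render trial history as a Markdown table.

    Assumes each entry is a dict with "trial" and "score" keys
    (an illustrative shape, not the documented SDK format).
    """
    if not history:
        return "No trials recorded."

    first_score = history[0]["score"]
    lines = [
        "| Trial | Score | Change vs. first trial |",
        "| --- | --- | --- |",
    ]
    for entry in history:
        score = entry["score"]
        if first_score:
            change = f"{(score - first_score) / first_score * 100:+.1f}%"
        else:
            change = "n/a"  # avoid dividing by a zero baseline
        lines.append(f"| {entry['trial']} | {score:.3f} | {change} |")

    best = max(history, key=lambda e: e["score"])
    lines.append("")
    lines.append(f"Best trial: {best['trial']} (score {best['score']:.3f})")
    return "\n".join(lines)


# Made-up example data standing in for a real optimization history.
example_history = [
    {"trial": 1, "score": 0.50},
    {"trial": 2, "score": 0.60},
    {"trial": 3, "score": 0.55},
]
report = history_to_markdown(example_history)
print(report)
```

Keeping the report as plain Markdown means it can go straight into a pull request description next to the prompt change it justifies.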