Multimodal Agent Optimization Tutorial
Tutorial example inspired by a self-driving car vision agent
Tutorial example inspired by a self-driving car vision agent
This tutorial outlines how to optimize a multimodal agent (vision + text) and links to the full walkthrough for a self-driving car scenario. The SDK already includes a working example script and dataset you can run locally.
Codebase entry point: sdks/opik_optimizer/scripts/multimodal_example.py using the driving hazard dataset in sdks/opik_optimizer/src/opik_optimizer/datasets/driving_hazard.py.
The SDK includes a complete example that optimizes a vision agent on a driving hazard dataset. It demonstrates how to pass image content parts through ChatPrompt, score outputs, and compare trials in the Optimization Studio.
Multimodal prompts are sensitive to phrasing and output structure. Running HRPO or MetaPrompt helps you converge on safer, more consistent outputs without rewriting prompts manually.
multimodal_example.py loads the driving hazard dataset (images + hazard labels).ChatPrompt inserts an image URL content part next to the textual instruction.Screenshot placeholder: multimodal trial comparisons and failure analysis.
n parameter) to reduce stochastic failures.