For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Copy to LLMGithubGo to App
DocumentationIntegrationsBuilding Self-Improving AgentsSelf-hosting OpikSDK & API reference
DocumentationIntegrationsBuilding Self-Improving AgentsSelf-hosting OpikSDK & API reference
  • Getting Started
    • Home
    • Quickstart
    • Upgrading to Opik 2.0
    • Ollie Agent
    • FAQ
    • Changelog
  • Observability
    • Overview
    • Getting started
    • Concepts
    • Debugging agents with Ollie and Opik Connect
  • Development
    • Overview
    • Agent playground
    • Prompt playground
      • Opik Agent Optimizer
      • Optimization Studio
      • Quickstart
      • Quickstart notebook
      • FAQ
      • Changelog
      • Known Issues
        • Optimizer introduction
        • Synthetic data optimizer
        • ARC-AGI tutorial
        • Multimodal agent tutorial
  • Evaluation
    • Overview
    • Getting started
    • Concepts
  • Production
  • Administration
    • Overview
    • Roles and Permissions
  • Contributing
    • Contribution Overview
LogoLogo
Copy to LLMGithubGo to App
On this page
  • What is the multimodal optimizer example?
  • Why use optimizers here?
  • How the SDK example works
  • Next steps
DevelopmentOptimization runsCookbooks & Tutorials

Multimodal Agent Optimization Tutorial

Tutorial example inspired by a self-driving car vision agent

Was this page helpful?
Previous

Extending Optimizers

Extend Opik with custom optimization algorithms and contributions.
Next
Built with

This tutorial outlines how to optimize a multimodal agent (vision + text) and links to the full walkthrough for a self-driving car scenario. The SDK already includes a working example script and dataset you can run locally.

Full guide: Automatic prompt optimization for multimodal vision agents (self-driving car example).

Codebase entry point: sdks/opik_optimizer/scripts/multimodal_example.py using the driving hazard dataset in sdks/opik_optimizer/src/opik_optimizer/datasets/driving_hazard.py.

What is the multimodal optimizer example?

The SDK includes a complete example that optimizes a vision agent on a driving hazard dataset. It demonstrates how to pass image content parts through ChatPrompt, score outputs, and compare trials in the Optimization Studio.

Why use optimizers here?

Multimodal prompts are sensitive to phrasing and output structure. Running HRPO or MetaPrompt helps you converge on safer, more consistent outputs without rewriting prompts manually.

How the SDK example works

  1. multimodal_example.py loads the driving hazard dataset (images + hazard labels).
  2. A multimodal ChatPrompt inserts an image URL content part next to the textual instruction.
  3. The metric (Levenshtein ratio) scores predicted hazard text against the expected label.
  4. HRPO optimizes the prompt using the training split, with a small validation split for ranking.
  5. Results display in the Opik UI (Optimization runs and trial details).

Screenshot placeholder: multimodal trial comparisons and failure analysis.

Next steps

  • Explore the full SDK script and adapt the dataset to your own vision tasks.
  • Use pass@k evaluation (n parameter) to reduce stochastic failures.
  • Read the full external guide for the complete workflow and visuals.