Optimize multimodal prompts
Multimodal agents often juggle text instructions, image references, and structured outputs. Opik’s optimizers can work with any model that LiteLLM supports for images or videos (GPT-4o, Gemini, Claude 3.5 Sonnet vision, etc.). Make sure that both the optimizer’s model and your ChatPrompt.model accept the modality you plan to optimize. Otherwise, the run will fail or silently ignore the media.
Dataset design
- Store image or audio references as signed URLs in your dataset items (
metadata["image_url"]). - Include textual descriptions alongside assets so metrics can run without downloading large files when possible.
- Tag rows with modality info (
metadata["modality"] = "image+text") to filter during analysis.
Prompt structure
- Describe the expected output schema (JSON, markdown table, etc.) to reduce ambiguity.
Metrics
- Reuse existing text metrics when possible by comparing textual descriptions.
- For vision-specific scoring, call external models from your metric function, but cache results to control cost.
- Record reasons that mention the modality: “Image not described” or “Chart incorrectly transcribed”.
- When possible, augment automated metrics with lightweight human review or deterministic checks—LLM-as-a-judge signals can be noisy for multimodal tasks.
Running optimizations
- Start with MetaPrompt for wording improvements. For cold-start exploration, pair Evolutionary → Few-Shot Bayesian to uncover new structures and example choices.
- Use Hierarchical Reflective to catch recurring multimodal failures (e.g., missing chart descriptions) and highlight which dataset rows are problematic.
- Monitor token usage because multimodal prompts send larger payloads; pick models like
gpt-4o-miniwhen budgets are tight. - If an optimizer does not support the modality (e.g., text-only GEPA with image inputs), it will still mutate the prompt but cannot execute candidate evaluations—stick to optimizers whose evaluation models accept the modality.
Validation
- Spot-check generated outputs with the associated media in the dashboard.
- Confirm that dataset asset URLs remain valid for the duration of the optimization.
- When sharing results, include thumbnails or sample outputs so reviewers understand the changes.