
ARC-AGI Optimization Tutorial

Tutorial example using ARC-AGI style code tasks


This guide introduces ARC-AGI, explains why it is a strong fit for optimizer-driven prompt iteration, and points to the full, runnable implementation in the SDK.

Codebase entry point: sdks/opik_optimizer/scripts/arc_agi/tasks_optimizer.py and the ARC-AGI utilities in sdks/opik_optimizer/scripts/arc_agi/.

What is ARC-AGI?

ARC-AGI tasks are grid-based reasoning puzzles that test an agent’s ability to infer a transformation rule from a few input/output examples and apply it to a new input. They are a natural fit for optimization because small prompt changes can noticeably change how well a solver generalizes across tasks.
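To make the task shape concrete, here is a minimal sketch of one puzzle in the publicly documented ARC JSON layout, where "train" pairs demonstrate the rule and "test" inputs must be solved. The grids, the mirror_left_right rule, and the explains_examples helper are invented for illustration and are not taken from the SDK.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell is an integer color code

# A toy task: "train" pairs demonstrate the rule, "test" inputs must be solved.
# These grids are made up for this example.
task = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [{"input": [[0, 4], [5, 0]]}],
}


def mirror_left_right(grid: Grid) -> Grid:
    """Hypothetical candidate rule: reverse every row."""
    return [list(reversed(row)) for row in grid]


def explains_examples(rule: Callable[[Grid], Grid], examples: List[Dict[str, Grid]]) -> bool:
    """Return True only if the rule reproduces every demonstrated output."""
    return all(rule(example["input"]) == example["output"] for example in examples)


# A rule is only worth applying to the test input if it fits all train pairs.
rule_fits = explains_examples(mirror_left_right, task["train"])
prediction = mirror_left_right(task["test"][0]["input"])
print(rule_fits, prediction)
```

In the real workflow the candidate rule is produced by a model rather than written by hand, but the check against the demonstration pairs has the same shape.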

Why use optimizers here?

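ARC-AGI scoring is deterministic and repeatable: a predicted grid is compared against a known expected grid, so the same prediction always receives the same score. That makes it well suited to iterative optimization, because a score change between iterations reflects a change in the candidate rather than noise in the metric. HRPO is especially useful because it captures failure modes and proposes targeted fixes.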
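As a rough illustration of why deterministic scoring helps a failure-driven optimizer, the sketch below compares a predicted grid with the expected one and turns the difference into a short, reproducible failure note. The function names and the note format are assumptions made for this example; they are not the SDK's metrics or HRPO's internal representation.

```python
from typing import Any, Dict, List

Grid = List[List[int]]


def compare_grids(predicted: Grid, expected: Grid) -> Dict[str, Any]:
    """Deterministically compare two grids (hypothetical helper, not the SDK metric)."""
    same_shape = len(predicted) == len(expected) and all(
        len(p_row) == len(e_row) for p_row, e_row in zip(predicted, expected)
    )
    if not same_shape:
        return {"exact": False, "same_shape": False, "wrong_cells": None}
    wrong_cells = [
        (r, c)
        for r, row in enumerate(expected)
        for c, value in enumerate(row)
        if predicted[r][c] != value
    ]
    return {"exact": not wrong_cells, "same_shape": True, "wrong_cells": wrong_cells}


def describe_failure(result: Dict[str, Any]) -> str:
    """Turn a comparison result into a short note an optimizer could reason about."""
    if result["exact"]:
        return "exact match"
    if not result["same_shape"]:
        return "output grid has the wrong shape"
    return f"{len(result['wrong_cells'])} cell(s) differ, first at {result['wrong_cells'][0]}"


expected = [[1, 0], [0, 1]]
print(describe_failure(compare_grids([[1, 0], [0, 1]], expected)))
print(describe_failure(compare_grids([[1, 0], [1, 1]], expected)))
print(describe_failure(compare_grids([[1, 0, 0]], expected)))
```

Because the comparison has no randomness, running it twice on the same prediction yields the same note, which is what lets failure modes be grouped and compared across iterations.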

How the SDK implementation works

The SDK ships a full ARC-AGI workflow you can run locally:

  1. Dataset loader: sdks/opik_optimizer/src/opik_optimizer/datasets/arc_agi2.py loads ARC-AGI-2 tasks and embeds optional grid images.
  2. Prompt templates: sdks/opik_optimizer/scripts/arc_agi/prompts/ contains system and HRPO prompt templates.
  3. Evaluator + metrics: sdks/opik_optimizer/scripts/arc_agi/utils/code_evaluator.py executes candidate solvers and scores ARC-AGI metrics via utils/metrics.py.
  4. Optimizer wiring: tasks_optimizer.py connects dataset, HRPO, metrics, and logging into a repeatable run.
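The evaluator step in the list above can be pictured with a small sketch: take candidate solver source code, execute it, run it on the demonstration pairs, and report a score plus any errors. This is not the code in code_evaluator.py. The transform entry point, the function names, and the result format are all assumptions for illustration, and a real evaluator would need sandboxing and timeouts that are omitted here.

```python
from typing import Any, Callable, Dict, List

Grid = List[List[int]]


def load_solver(source: str, entry_point: str = "transform") -> Callable[[Grid], Grid]:
    """Execute candidate source and return its entry-point function.

    WARNING: exec runs arbitrary code with no sandboxing; this is a sketch only.
    The entry-point name "transform" is an assumption made for this example.
    """
    namespace: Dict[str, Any] = {}
    exec(source, namespace)
    solver = namespace.get(entry_point)
    if not callable(solver):
        raise ValueError(f"candidate does not define a callable {entry_point!r}")
    return solver


def evaluate_solver(source: str, examples: List[Dict[str, Grid]]) -> Dict[str, Any]:
    """Score a candidate as the fraction of examples it reproduces exactly."""
    try:
        solver = load_solver(source)
    except Exception as exc:  # syntax errors, missing entry point, etc.
        return {"score": 0.0, "errors": [f"load failed: {exc!r}"]}

    passed = 0
    errors: List[str] = []
    for index, example in enumerate(examples):
        try:
            if solver(example["input"]) == example["output"]:
                passed += 1
            else:
                errors.append(f"example {index}: wrong output")
        except Exception as exc:  # a crashing solver counts as a failed example
            errors.append(f"example {index}: raised {exc!r}")
    score = passed / len(examples) if examples else 0.0
    return {"score": score, "errors": errors}


# Toy demonstration pairs whose hidden rule is "transpose the grid".
examples = [
    {"input": [[1, 2], [3, 4]], "output": [[1, 3], [2, 4]]},
    {"input": [[5, 6]], "output": [[5], [6]]},
]
good_candidate = "def transform(grid):\n    return [list(col) for col in zip(*grid)]\n"
bad_candidate = "def transform(grid):\n    return grid\n"

print(evaluate_solver(good_candidate, examples))
print(evaluate_solver(bad_candidate, examples))
```

Returning the error list alongside the score is a deliberate choice in this sketch: an optimizer that reasons about failures needs to know which examples failed and how, not only the aggregate number.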

To run the code as-is, start from the tasks_optimizer.py entry point and use the CLI flags listed at the top of that file.

Next steps

  • Explore the ARC-AGI scripts in the repo and swap in your own datasets or prompt templates.
  • Review the run summaries under scripts/arc_agi/ to compare optimizer iterations.