Agent Tool Correctness Judge
AgentToolCorrectnessJudge checks whether an agent called the right tools with valid arguments and interpreted their outputs accurately. It is most useful for diagnosing production agents that orchestrate APIs, databases, or internal services.
Inspect tool usage
Inputs
Configuration
The judge emits an integer from 0 to 10, which Opik scales to the 0–1 range. Read score.reason to pinpoint incorrect calls, missing validations, or misinterpreted outputs.
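As a rough illustration of how the scaled score and its reason might be consumed downstream, here is a minimal, self-contained sketch. It does not call Opik. JudgeResult, normalize_raw_score, needs_review, and the 0.7 threshold are hypothetical names and values introduced only for this example, and the linear divide-by-ten scaling is an assumption about how the 0–10 integer maps onto 0–1.

```python
from dataclasses import dataclass


@dataclass
class JudgeResult:
    """Hypothetical stand-in for the judge's result; not Opik's own class."""
    value: float  # score on the 0-1 scale
    reason: str   # the judge's explanation of the score


def normalize_raw_score(raw: int) -> float:
    """Map a 0-10 integer onto 0-1, assuming a plain linear scaling."""
    if not 0 <= raw <= 10:
        raise ValueError(f"raw score must be between 0 and 10, got {raw}")
    return raw / 10


def needs_review(result: JudgeResult, threshold: float = 0.7) -> bool:
    """Flag results scoring below an (arbitrary, illustrative) threshold."""
    return result.value < threshold


# Illustrative data only: a made-up raw score and reason.
result = JudgeResult(
    value=normalize_raw_score(4),
    reason="Called the search tool with a malformed date argument.",
)

if needs_review(result):
    # The reason text is where incorrect calls or misread outputs show up.
    print(f"Low tool-correctness score {result.value}: {result.reason}")
```

In practice the threshold would be tuned per application. The point of the sketch is only that the numeric value is useful for filtering, while the reason text is what explains a failure.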