Trajectory Accuracy

TrajectoryAccuracy checks how closely a ReAct-style agent followed a sensible sequence of thoughts, actions, and observations to achieve the stated goal. It is useful for auditing complex workflow agents and reinforcement-learning traces.

Auditing an agent run
```python
from opik.evaluation.metrics import TrajectoryAccuracy

metric = TrajectoryAccuracy()

score = metric.score(
    goal="Book travel to Paris",
    trajectory=[
        {
            "thought": "Check available flights",
            "action": "search_flights(destination='Paris')",
            "observation": "Found flights for next week",
        },
        {
            "thought": "Summarise the best option",
            "action": "summarise(options)",
            "observation": "Shared top three flights",
        },
    ],
    final_result="Here are the best flights to Paris next week.",
)

print(score.value)   # Already normalised between 0.0 and 1.0
print(score.reason)  # Explanation of the verdict
```

Inputs

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `goal` | `str` | Yes | The agent's objective or task description. |
| `trajectory` | `list[dict]` | Yes | Sequence of steps with `thought`, `action`, and `observation` keys. |
| `final_result` | `str` | Yes | Outcome that the agent reported after completing the trajectory. |
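A lightweight pre-flight check can catch malformed steps before they reach the LLM judge. The helper below is a hypothetical sketch and not part of the Opik API; it only enforces the three required keys listed in the table above.

```python
REQUIRED_KEYS = {"thought", "action", "observation"}


def validate_trajectory(trajectory: list[dict]) -> None:
    """Raise ValueError if any step is missing a required key.

    Hypothetical pre-flight check run before calling metric.score();
    the metric itself performs its own parsing.
    """
    for i, step in enumerate(trajectory):
        missing = REQUIRED_KEYS - step.keys()
        if missing:
            raise ValueError(f"step {i} missing keys: {sorted(missing)}")
```

Running this before `metric.score()` turns a vague judge verdict about incomplete steps into an immediate, debuggable error.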

Configuration

| Parameter | Default | Notes |
| --- | --- | --- |
| `model` | `gpt-5-nano` | Judge used to score the trajectory. |
| `temperature` | `None` | Forwarded to the underlying model when provided. |
| `track` | `True` | Disable to skip logging to Opik. When `False`, disables tracing for both the metric and the underlying LLM judge calls. |
| `project_name` | `None` | Override the tracking project name. |
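Assuming the constructor accepts the parameters above as keyword arguments (an assumption based on the table, not verified against the source), a customised setup might look like this; the project name is illustrative:

```python
from opik.evaluation.metrics import TrajectoryAccuracy

# Hypothetical configuration using the parameters from the table above.
metric = TrajectoryAccuracy(
    model="gpt-5-nano",        # judge model
    temperature=0.0,           # forwarded to the judge for determinism
    track=False,               # skip logging to Opik entirely
    project_name="travel-agent-eval",  # ignored when track=False
)
```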

The metric returns a value in the 0.0 to 1.0 range together with a detailed explanation highlighting missing steps, misaligned actions, or other issues.
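In automated evaluation suites, that normalised value is often gated against a pass threshold. A minimal sketch, assuming the caller already has `score.value` in hand; the 0.7 cut-off is illustrative, not a library default:

```python
def trajectory_passes(value: float, threshold: float = 0.7) -> bool:
    """Return True when a normalised trajectory score clears the threshold.

    Hypothetical CI gate; `value` is expected to be the metric's
    already-normalised score in [0.0, 1.0].
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"score expected in [0.0, 1.0], got {value}")
    return value >= threshold
```

Pairing the boolean gate with `score.reason` in the failure message makes a regression immediately actionable.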