Dialogue Helpfulness Judge

DialogueHelpfulnessJudge inspects the latest assistant reply in the context of preceding turns. It rewards responses that acknowledge the user’s request, use the available context, and offer actionable guidance.

Scoring a support reply

1 from opik.evaluation.metrics import DialogueHelpfulnessJudge
2 
3 turns = """USER: My VPN disconnects every 5 minutes.\nASSISTANT: Try reinstalling the client.\nUSER: I already did.\n"""
4 
5 metric = DialogueHelpfulnessJudge()
6 score = metric.score(
7     input=turns,
8     output="Can you send logs? I'll escalate to network engineering.",
9 )
10 
11 print(score.value)
12 print(score.reason)

Inputs

Argument	Type	Required	Description
`input`	`str`	Optional	Conversation history (alternating USER / ASSISTANT blocks).
`conversation`	`list[dict]`	Optional	Structured turns (`{"role": "user", "content": "..."}`
`output`	`str`	Yes	Latest assistant reply to score.

Configuration

Parameter	Default	Notes
`model`	`gpt-5-nano`	Switch to a larger evaluator for complex enterprise workflows.
`temperature`	`0.0`	Use low temperature for reproducible benchmarks.
`track`	`True`	Record the evaluation in Opik.
`project_name`	`None`	Set when routing results to a different project.

Integrate this judge into regression suites to catch regressions after prompt changes or upgrades to your assistant model.