Dialogue Helpfulness Judge

DialogueHelpfulnessJudge inspects the latest assistant reply in the context of preceding turns. It rewards responses that acknowledge the user’s request, use the available context, and offer actionable guidance.

Scoring a support reply
1from opik.evaluation.metrics import DialogueHelpfulnessJudge
2
3turns = """USER: My VPN disconnects every 5 minutes.\nASSISTANT: Try reinstalling the client.\nUSER: I already did.\n"""
4
5metric = DialogueHelpfulnessJudge()
6score = metric.score(
7 input=turns,
8 output="Can you send logs? I'll escalate to network engineering.",
9)
10
11print(score.value)
12print(score.reason)

Inputs

ArgumentTypeRequiredDescription
inputstrOptionalConversation history (alternating USER / ASSISTANT blocks).
conversationlist[dict]OptionalStructured turns ({"role": "user", "content": "..."}
outputstrYesLatest assistant reply to score.

Configuration

ParameterDefaultNotes
modelgpt-5-nanoSwitch to a larger evaluator for complex enterprise workflows.
temperature0.0Use low temperature for reproducible benchmarks.
trackTrueRecord the evaluation in Opik.
project_nameNoneSet when routing results to a different project.

Integrate this judge into regression suites to catch regressions after prompt changes or upgrades to your assistant model.