Dialogue helpfulness

Dialogue Helpfulness Judge

DialogueHelpfulnessJudge inspects the latest assistant reply in the context of preceding turns. It rewards responses that acknowledge the user’s request, use the available context, and offer actionable guidance.

Scoring a support reply
from opik.evaluation.metrics import DialogueHelpfulnessJudge

# Conversation so far, as alternating USER / ASSISTANT lines.
turns = """USER: My VPN disconnects every 5 minutes.\nASSISTANT: Try reinstalling the client.\nUSER: I already did.\n"""

metric = DialogueHelpfulnessJudge()
# Score the latest assistant reply against the preceding turns.
score = metric.score(
    input=turns,
    output="Can you send logs? I'll escalate to network engineering.",
)

print(score.value)   # numeric score
print(score.reason)  # the judge's explanation for that score

Inputs

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| input | str | Optional | Conversation history (alternating USER / ASSISTANT blocks). |
| conversation | list[dict] | Optional | Structured turns ({"role": "user", "content": "..."}). |
| output | str | Yes | Latest assistant reply to score. |
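
The two history formats carry the same information, so one can be derived from the other. The sketch below shows one way to flatten structured turns into the alternating USER / ASSISTANT text used in the example above. The helper name `format_turns` is hypothetical and not part of Opik; it assumes every turn has `role` and `content` keys.

```python
from typing import Dict, List


def format_turns(conversation: List[Dict[str, str]]) -> str:
    """Render structured turns as 'ROLE: content' lines, one line per turn.

    Hypothetical helper (not an Opik API); assumes each turn has
    'role' and 'content' keys.
    """
    lines = []
    for turn in conversation:
        role = turn["role"].upper()
        content = turn["content"].strip()
        lines.append(f"{role}: {content}")
    # Terminate every turn with a newline, matching the example transcript.
    return "".join(f"{line}\n" for line in lines)


conversation = [
    {"role": "user", "content": "My VPN disconnects every 5 minutes."},
    {"role": "assistant", "content": "Try reinstalling the client."},
    {"role": "user", "content": "I already did."},
]

history = format_turns(conversation)
print(history)
```

The resulting string could then be passed as `input`, alongside the reply to score as `output`.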

Configuration

| Parameter | Default | Notes |
| --- | --- | --- |
| model | gpt-5-nano | Switch to a larger evaluator for complex enterprise workflows. |
| temperature | 0.0 | Use low temperature for reproducible benchmarks. |
| track | True | Record the evaluation in Opik. |
| project_name | None | Set when routing results to a different project. |
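
As a rough sketch of how these settings might be applied, the snippet below keeps the listed defaults in one place and merges overrides on top. It assumes the judge's constructor accepts the parameters as keyword arguments under the names shown here; `judge_config`, `build_judge`, and the model name in the usage line are hypothetical and only for illustration.

```python
# Defaults copied from the Configuration parameters listed above.
DEFAULT_JUDGE_CONFIG = {
    "model": "gpt-5-nano",
    "temperature": 0.0,
    "track": True,
    "project_name": None,
}


def judge_config(**overrides):
    """Return the defaults merged with overrides, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULT_JUDGE_CONFIG)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    return {**DEFAULT_JUDGE_CONFIG, **overrides}


def build_judge(**overrides):
    """Construct the judge from merged settings.

    Assumption: the constructor takes these names as keyword arguments.
    """
    # Imported lazily so judge_config() works without opik installed.
    from opik.evaluation.metrics import DialogueHelpfulnessJudge

    return DialogueHelpfulnessJudge(**judge_config(**overrides))


# Example: a larger evaluator with tracking turned off for a local dry run.
# "your-larger-model" is a placeholder, not a real model name.
print(judge_config(model="your-larger-model", track=False))
```

Rejecting unknown keys up front means a misspelled parameter fails immediately instead of being passed on to the constructor.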

Integrate this judge into regression suites to catch drops in helpfulness after prompt changes or upgrades to your assistant model.
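
A regression check along these lines could look like the sketch below. It takes any scoring callable that accepts `input` and `output` and returns an object with `value` and `reason`, as in the example above, and it reports the replies that fall under a threshold. The names `find_regressions`, `StubScore`, and `stub_score`, the 0.5 threshold, and the assumption that higher scores are better are all illustrative. The stub stands in for the real judge so the sketch runs without a model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StubScore:
    """Stand-in for the judge's result; only mirrors .value and .reason."""
    value: float
    reason: str


def find_regressions(
    score_fn: Callable[..., StubScore],
    cases: List[Tuple[str, str]],
    threshold: float = 0.5,  # hypothetical cut-off; tune for your suite
) -> List[Tuple[str, float, str]]:
    """Return (reply, value, reason) for each case scoring below threshold."""
    failures = []
    for history, reply in cases:
        result = score_fn(input=history, output=reply)
        if result.value < threshold:
            failures.append((reply, result.value, result.reason))
    return failures


def stub_score(input: str, output: str) -> StubScore:
    # Toy rule for the demo only: replies that ask a follow-up question pass.
    helpful = "?" in output
    return StubScore(value=1.0 if helpful else 0.0, reason="stub rule")


history = "USER: My VPN disconnects every 5 minutes.\n"
cases = [
    (history, "Can you send logs? I'll escalate to network engineering."),
    (history, "Try again later."),
]

failures = find_regressions(stub_score, cases)
print(failures)
```

In a real suite, `stub_score` would be replaced by the `score` method of a `DialogueHelpfulnessJudge` instance, and a non-empty `failures` list would fail the test run.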