Dialogue helpfulness

Dialogue Helpfulness Judge

DialogueHelpfulnessJudge inspects the latest assistant reply in the context of preceding turns. It rewards responses that acknowledge the user’s request, use the available context, and offer actionable guidance.

Scoring a support reply
from opik.evaluation.metrics import DialogueHelpfulnessJudge

# Conversation so far, as alternating USER / ASSISTANT lines.
turns = """USER: My VPN disconnects every 5 minutes.\nASSISTANT: Try reinstalling the client.\nUSER: I already did.\n"""

metric = DialogueHelpfulnessJudge()
score = metric.score(
    input=turns,
    output="Can you send logs? I'll escalate to network engineering.",
)

print(score.value)   # helpfulness score
print(score.reason)  # the judge's explanation for that score

Inputs

| Argument | Type | Required | Description |
|---|---|---|---|
| input | str | Optional | Conversation history (alternating USER / ASSISTANT blocks). |
| conversation | list[dict] | Optional | Structured turns ({"role": "user", "content": "..."}). |
| output | str | Yes | Latest assistant reply to score. |
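
The two history forms in the table carry the same information. As a minimal sketch, the helper below renders structured turns as the `USER:` / `ASSISTANT:` transcript used for `input` in the example above. `turns_to_transcript` is a hypothetical helper, not part of Opik, and the transcript layout is an assumption inferred from that example.

```python
from typing import Dict, List


def turns_to_transcript(conversation: List[Dict[str, str]]) -> str:
    """Render structured turns as 'ROLE: content' lines, one per turn.

    Hypothetical helper (not an Opik API); the layout mirrors the
    USER / ASSISTANT transcript shown in the scoring example.
    """
    lines = []
    for turn in conversation:
        role = turn["role"].upper()        # "user" -> "USER"
        content = turn["content"].strip()  # drop stray whitespace
        lines.append(f"{role}: {content}")
    return "\n".join(lines) + "\n"


conversation = [
    {"role": "user", "content": "My VPN disconnects every 5 minutes."},
    {"role": "assistant", "content": "Try reinstalling the client."},
    {"role": "user", "content": "I already did."},
]

transcript = turns_to_transcript(conversation)
print(transcript)
```

The resulting string can be passed as `input`, with the reply under evaluation passed separately as `output`.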

Configuration

| Parameter | Default | Notes |
|---|---|---|
| model | gpt-5-nano | Switch to a larger evaluator for complex enterprise workflows. |
| temperature | 0.0 | Use low temperature for reproducible benchmarks. |
| track | True | Record the evaluation in Opik. |
| project_name | None | Set when routing results to a different project. |
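
One way to keep these settings in a single place is sketched below. It assumes the parameters in the table are accepted as keyword arguments by the `DialogueHelpfulnessJudge` constructor. The project name and the `make_judge` helper are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict

# Settings mirroring the Configuration table; values are illustrative.
JUDGE_KWARGS: Dict[str, Any] = {
    "model": "gpt-5-nano",                # default evaluator from the table
    "temperature": 0.0,                   # low temperature for reproducibility
    "track": True,                        # record evaluations in Opik
    "project_name": "support-bot-evals",  # hypothetical project name
}


def make_judge(**overrides: Any):
    """Build a judge from JUDGE_KWARGS, with optional per-call overrides.

    Assumes the constructor takes the table's parameters as keyword
    arguments. The import is deferred so this module loads without Opik.
    """
    from opik.evaluation.metrics import DialogueHelpfulnessJudge

    return DialogueHelpfulnessJudge(**{**JUDGE_KWARGS, **overrides})
```

Calling `make_judge(track=False)` would then build a judge with the shared settings but without recording the evaluation, which can be convenient for local experiments.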

Add this judge to your regression suite to catch drops in helpfulness after prompt changes or upgrades to your assistant model.
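
A regression check of this kind can be sketched as a threshold gate over a fixed set of cases. The sketch below is an assumption-laden illustration: `helpfulness_gate` and `stub_score` are hypothetical names, the 0.7 threshold is arbitrary, and scores are assumed to lie between 0 and 1. The stub scorer stands in for the real judge so the example runs offline.

```python
from statistics import mean
from typing import Callable, Dict, List


def helpfulness_gate(
    cases: List[Dict[str, str]],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.7,  # arbitrary cut-off; tune for your suite
) -> float:
    """Score every case and raise if the mean falls below the threshold."""
    scores = [score_fn(case["input"], case["output"]) for case in cases]
    average = mean(scores)
    if average < threshold:
        raise AssertionError(
            f"Mean helpfulness {average:.2f} is below threshold {threshold:.2f}"
        )
    return average


def stub_score(history: str, reply: str) -> float:
    # Stand-in scorer so the sketch runs without Opik or an API key.
    # In a real suite this could wrap something like
    # DialogueHelpfulnessJudge().score(input=history, output=reply).value
    return 1.0 if "escalate" in reply else 0.5


cases = [
    {
        "input": "USER: My VPN disconnects every 5 minutes.\n",
        "output": "Can you send logs? I'll escalate to network engineering.",
    },
    {
        "input": "USER: I already reinstalled the client.\n",
        "output": "Try reinstalling the client.",
    },
]

average = helpfulness_gate(cases, stub_score, threshold=0.7)
print(f"mean helpfulness: {average:.2f}")
```

Gating on the mean tolerates a few weak replies. A stricter suite might also fail on any single case below a floor.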