Compliance Risk Judge

ComplianceRiskJudge inspects an assistant response for regulatory, legal, or policy issues. It builds on Opik’s GEval rubric and asks an evaluator model to explain risky passages before returning a normalised score between 0.0 and 1.0 (derived from a raw 0–10 verdict).

Use this judge when you need to gate user-facing answers in regulated domains such as finance, healthcare, or legal advice. Inspect score.reason to understand why a response was flagged, and route escalations to human reviewers.
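A minimal routing sketch for that gating step. The thresholds and action names below are illustrative assumptions, not part of the metric:

```python
def route_response(score_value: float, block_at: float = 0.8, review_at: float = 0.5) -> str:
    """Map a normalised ComplianceRiskJudge score (0.0-1.0) to an action.

    The 0.8/0.5 cut-offs are placeholders; tune them for your domain.
    """
    if score_value >= block_at:
        return "block"       # clear violation: never show the answer
    if score_value >= review_at:
        return "escalate"    # borderline: send to a human reviewer
    return "pass"            # low risk: serve the answer

print(route_response(0.9))  # block
print(route_response(0.6))  # escalate
print(route_response(0.1))  # pass
```

In production you would call this with score.value after metric.score(...) returns.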

Flagging risky statements
```python
from opik.evaluation.metrics import ComplianceRiskJudge

metric = ComplianceRiskJudge(
    model="gpt-4o-mini",  # optional – defaults to gpt-5-nano
    temperature=0.0,
)

payload = """INPUT: Customer asks if they can skip KYC checks.
OUTPUT: Sure, just process the transfer and we'll reconcile later.
"""

score = metric.score(output=payload)

print(score.value)
print(score.reason)
```

Inputs

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `output` | `str` | Yes | Payload that bundles the user request, any context, and the assistant reply. |
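Because the judge accepts a single string, you typically assemble the request, context, and reply into one payload yourself. A sketch of such a helper, mirroring the INPUT/OUTPUT format used in the example above (build_payload is not part of Opik; it is an assumed convenience function):

```python
from typing import Optional

def build_payload(user_input: str, assistant_output: str, context: Optional[str] = None) -> str:
    """Bundle the user request, optional context, and assistant reply into one string."""
    parts = [f"INPUT: {user_input}"]
    if context:
        parts.append(f"CONTEXT: {context}")
    parts.append(f"OUTPUT: {assistant_output}")
    return "\n".join(parts) + "\n"

payload = build_payload(
    "Customer asks if they can skip KYC checks.",
    "Sure, just process the transfer and we'll reconcile later.",
)
print(payload)
```

The resulting string can be passed directly as metric.score(output=payload).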

Configuration

| Parameter | Default | Notes |
| --- | --- | --- |
| `model` | `gpt-5-nano` | Any LiteLLM-supported chat model. |
| `temperature` | `0.0` | Adjust to trade off reproducibility vs. rubric diversity. |
| `track` | `True` | Set to `False` to skip logging traces in Opik. |
| `project_name` | `None` | Override the project used when tracking results. |

This metric automatically requests log probabilities when the model supports them. The evaluator emits an integer between 0 and 10, which Opik normalises to 0–1. If you override model, ensure the provider exposes logprobs and top_logprobs for best results.
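The normalisation itself is a straight division of the 0–10 verdict. When log probabilities are available, a common refinement is to take a probability-weighted average over the candidate digit tokens rather than trusting a single sampled integer. A sketch of both steps (the weighting scheme shown is an assumption about the general technique, not Opik's exact implementation):

```python
import math

def normalise(raw_verdict: int) -> float:
    """Map the evaluator's integer verdict (0-10) to a 0.0-1.0 score."""
    return raw_verdict / 10.0

def weighted_verdict(top_logprobs: dict) -> float:
    """Probability-weighted verdict from top_logprobs of the verdict token.

    top_logprobs maps a candidate token such as "7" to its log probability;
    non-digit tokens are ignored.
    """
    weights = {int(tok): math.exp(lp) for tok, lp in top_logprobs.items() if tok.strip().isdigit()}
    total = sum(weights.values())
    raw = sum(value * w for value, w in weights.items()) / total
    return raw / 10.0

print(normalise(7))  # 0.7
```

For example, if the model puts 60% probability on "7" and 40% on "8", the weighted score is 0.74 instead of a hard 0.7 or 0.8.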