Compliance Risk Judge

ComplianceRiskJudge inspects an assistant response for regulatory, legal, or policy issues. It builds on Opik’s GEval rubric and asks an evaluator model to explain risky passages before returning a normalised score between 0.0 and 1.0 (derived from a raw 0–10 verdict).

Use this judge when you need to gate user-facing answers in regulated domains such as finance, healthcare, or legal advice. Inspect score.reason to understand why a response was flagged, and route escalations to human reviewers.
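A minimal routing sketch for that gating step. The thresholds and action names below are illustrative assumptions, not part of the metric:

```python
def route_response(score_value: float, block_at: float = 0.8, review_at: float = 0.5) -> str:
    """Map a normalised ComplianceRiskJudge score (0.0-1.0) to an action.

    The 0.8/0.5 cut-offs are placeholders; tune them for your domain.
    """
    if score_value >= block_at:
        return "block"       # clear violation: never show the answer
    if score_value >= review_at:
        return "escalate"    # borderline: send to a human reviewer
    return "pass"            # low risk: serve the answer

print(route_response(0.9))  # block
print(route_response(0.6))  # escalate
print(route_response(0.1))  # pass
```

In production you would call this with score.value after metric.score(...) returns.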

Flagging risky statements
```python
from opik.evaluation.metrics import ComplianceRiskJudge

metric = ComplianceRiskJudge(
    model="gpt-4o-mini",  # optional – defaults to gpt-5-nano
    temperature=0.0,
)

payload = """INPUT: Customer asks if they can skip KYC checks.
OUTPUT: Sure, just process the transfer and we'll reconcile later.
"""

score = metric.score(output=payload)

print(score.value)
print(score.reason)
```

Inputs

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `output` | `str` | Yes | Payload that bundles the user request, any context, and the assistant reply. |
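Because the judge accepts a single string, you typically assemble the request, context, and reply into one payload yourself. A sketch of such a helper, mirroring the INPUT/OUTPUT format used in the example above (build_payload is not part of Opik; it is an assumed convenience function):

```python
from typing import Optional

def build_payload(user_input: str, assistant_output: str, context: Optional[str] = None) -> str:
    """Bundle the user request, optional context, and assistant reply into one string."""
    parts = [f"INPUT: {user_input}"]
    if context:
        parts.append(f"CONTEXT: {context}")
    parts.append(f"OUTPUT: {assistant_output}")
    return "\n".join(parts) + "\n"

payload = build_payload(
    "Customer asks if they can skip KYC checks.",
    "Sure, just process the transfer and we'll reconcile later.",
)
print(payload)
```

The resulting string can be passed directly as metric.score(output=payload).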

Configuration

| Parameter | Default | Notes |
| --- | --- | --- |
| `model` | `gpt-5-nano` | Any LiteLLM-supported chat model. |
| `temperature` | `0.0` | Adjust to trade off reproducibility vs. rubric diversity. |
| `track` | `True` | Set to `False` to skip logging traces in Opik. |
| `project_name` | `None` | Override the project used when tracking results. |

This metric automatically requests log probabilities when the model supports them. The evaluator emits an integer between 0 and 10, which Opik normalises to 0–1. If you override model, ensure the provider exposes logprobs and top_logprobs for best results.
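The normalisation itself is a straight division of the 0–10 verdict. When log probabilities are available, a common refinement is to take a probability-weighted average over the candidate digit tokens rather than trusting a single sampled integer. A sketch of both steps (the weighting scheme shown is an assumption about the general technique, not Opik's exact implementation):

```python
import math

def normalise(raw_verdict: int) -> float:
    """Map the evaluator's integer verdict (0-10) to a 0.0-1.0 score."""
    return raw_verdict / 10.0

def weighted_verdict(top_logprobs: dict) -> float:
    """Probability-weighted verdict from top_logprobs of the verdict token.

    top_logprobs maps a candidate token such as "7" to its log probability;
    non-digit tokens are ignored.
    """
    weights = {int(tok): math.exp(lp) for tok, lp in top_logprobs.items() if tok.strip().isdigit()}
    total = sum(weights.values())
    raw = sum(value * w for value, w in weights.items()) / total
    return raw / 10.0

print(normalise(7))  # 0.7
```

For example, if the model puts 60% probability on "7" and 40% on "8", the weighted score is 0.74 instead of a hard 0.7 or 0.8.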