Prompt Uncertainty

Prompt uncertainty scoring helps you triage risky or underspecified user requests before they reach your production model. PromptUncertaintyJudge highlights missing context or conflicting instructions that could confuse an assistant.

Run the judge on raw prompts to decide whether to request clarification, route to a human, or fan out to more capable models.

Triaging tricky prompts
```python
from opik.evaluation.metrics import PromptUncertaintyJudge

prompt = (
    "Summarise the attached 200-page legal agreement into a single bullet, "
    "guaranteeing there are no omissions."
)

uncertainty = PromptUncertaintyJudge().score(input=prompt)

print(uncertainty.value, uncertainty.reason)
```
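
If you want to act on the result straight away, the sketch below shows one way to turn the score into the triage decisions described above. It assumes uncertainty.value is the normalised 0–1 score (see the output notes at the end of this page); the thresholds and routing labels are illustrative, not part of Opik's API.

```python
from opik.evaluation.metrics import PromptUncertaintyJudge

# Illustrative thresholds; tune them against your own traffic.
CLARIFY_THRESHOLD = 0.7
ESCALATE_THRESHOLD = 0.4


def triage(prompt: str) -> str:
    """Return a routing decision for a raw prompt based on its uncertainty score."""
    result = PromptUncertaintyJudge().score(input=prompt)

    if result.value >= CLARIFY_THRESHOLD:
        # Highly underspecified: ask the user for more detail before answering.
        return "request_clarification"
    if result.value >= ESCALATE_THRESHOLD:
        # Borderline: hand off to a human or fan out to a more capable model.
        return "escalate"
    # Clear enough for the production assistant to answer directly.
    return "answer"


print(triage("Summarise the agreement."))
```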

Inputs

The judge accepts a single string via the input keyword argument. You can optionally pass additional metadata (dataset row contents, prompt IDs) as extra keyword arguments; these are forwarded to the underlying base metric for tracking.
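
For example, a minimal sketch of forwarding metadata alongside the prompt; the keyword names prompt_id and dataset_row are hypothetical, chosen only to illustrate the pattern.

```python
from opik.evaluation.metrics import PromptUncertaintyJudge

judge = PromptUncertaintyJudge()

# `input` is the prompt under evaluation; the remaining keyword arguments are
# arbitrary metadata that gets forwarded to the base metric for tracking.
result = judge.score(
    input="Draft a refund email for order 4821.",
    prompt_id="triage-demo-001",         # hypothetical identifier
    dataset_row={"channel": "support"},  # hypothetical dataset row contents
)

print(result.value, result.reason)
```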

Configuration

| Parameter | Default | Notes |
| --- | --- | --- |
| model | gpt-5-nano | Swap to any LiteLLM chat model if you need a larger evaluator. |
| temperature | 0.0 | Lower values improve reproducibility; higher values explore more interpretations. |
| track | True | Disable to skip logging evaluations. |
| project_name | None | Override the project when logging results. |
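
The constructor sketch below wires these parameters together; the model string and project name are example values, not defaults, and any LiteLLM-compatible chat model name can be used in their place.

```python
from opik.evaluation.metrics import PromptUncertaintyJudge

# A larger evaluator, deterministic scoring, logged to a dedicated project.
judge = PromptUncertaintyJudge(
    model="gpt-4o-mini",           # example LiteLLM chat model name
    temperature=0.0,               # keep scoring reproducible
    track=True,                    # log evaluations to Opik
    project_name="prompt-triage",  # hypothetical project name
)

score = judge.score(input="Plan the launch.")
print(score.value, score.reason)
```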

The evaluator emits an integer between 0 and 10 (normalised to 0–1 by Opik). Inspect the reason text for rationale and per-criterion feedback, and trigger follow-up automations when scores cross a threshold.
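
As a rough illustration of that normalisation and a threshold-gated follow-up (the 0.6 cut-off and the alerting behaviour are assumptions, not Opik defaults):

```python
from opik.evaluation.metrics import PromptUncertaintyJudge

ALERT_THRESHOLD = 0.6  # illustrative cut-off

result = PromptUncertaintyJudge().score(
    input="Fix everything that is wrong with our onboarding flow."
)

# A raw judge score of 7 out of 10 is assumed to surface here as 0.7.
if result.value >= ALERT_THRESHOLD:
    # Plug in whatever follow-up fits your pipeline: open a ticket, page a
    # reviewer, or hold the request until it is clarified.
    print(f"High prompt uncertainty ({result.value:.2f}): {result.reason}")
```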