
Summarization consistency


Summarization Consistency Judge

SummarizationConsistencyJudge compares a generated summary with the original document (or transcript) and scores how faithfully the key facts were preserved. It follows the GEval method: the judge expands your instructions into a chain-of-thought rubric, then grades the summary on a 0.0–1.0 scale (normalised from a raw 0–10 judgement) and returns a detailed explanation.

Use it when you automatically summarise support tickets, research reports, or call transcripts and want to catch hallucinations before they reach end users.

Checking summary faithfulness
from opik.evaluation.metrics import SummarizationConsistencyJudge

metric = SummarizationConsistencyJudge(model="gpt-4o")

payload = """CONTEXT: Acme's Q2 revenue grew 12% thanks to the launch of Product Vega.
CONTEXT: Operating margin declined to 14% because of R&D hiring.
SUMMARY: Acme's revenue was flat but margins improved due to new hires.
"""

score = metric.score(output=payload)

print(score.value)  # 0.0–1.0 after normalisation
print(score.reason)
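
To catch unfaithful summaries before they reach end users, the normalised score can be compared against a cut-off. The sketch below is illustrative only: `ScoreLike`, `is_faithful`, and the 0.7 threshold are hypothetical names and values chosen for this example, not part of Opik. `ScoreLike` merely stands in for any object exposing the `value` and `reason` fields used above, so the snippet runs without calling a model.

```python
from dataclasses import dataclass


@dataclass
class ScoreLike:
    """Hypothetical stand-in for a score object with `value` and `reason`."""
    value: float
    reason: str


def is_faithful(score, threshold: float = 0.7) -> bool:
    """Return True when the normalised score meets the threshold.

    The 0.7 default is an arbitrary assumption; tune it on your own data.
    """
    if not 0.0 <= score.value <= 1.0:
        raise ValueError(f"expected a score in [0.0, 1.0], got {score.value}")
    return score.value >= threshold


if __name__ == "__main__":
    flagged = ScoreLike(value=0.2, reason="Summary contradicts the revenue figure.")
    if not is_faithful(flagged):
        # Hold the summary back and surface the judge's explanation.
        print(f"Blocked summary: {flagged.reason}")
```

In practice the object returned by `metric.score(...)` would be passed in place of `ScoreLike`, and the threshold would be calibrated against summaries you have already reviewed by hand.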

Inputs

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| input | str | Optional | Source document or context. |
| output | str | Yes | Payload combining the source material and the candidate summary. |
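
Because `output` carries both the source material and the candidate summary in one string, it can help to assemble that string programmatically. The helper below is a minimal sketch: `build_payload` is a hypothetical function, and the `CONTEXT:` / `SUMMARY:` labels simply mirror the example on this page. Whether the judge requires this exact layout is not stated here, so treat the format as an assumption.

```python
from typing import Iterable


def build_payload(contexts: Iterable[str], summary: str) -> str:
    """Join source passages and a candidate summary into one payload string.

    Assumption: one "CONTEXT:" line per source passage followed by a single
    "SUMMARY:" line, matching the example payload on this page.
    """
    lines = [f"CONTEXT: {passage.strip()}" for passage in contexts]
    lines.append(f"SUMMARY: {summary.strip()}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    payload = build_payload(
        contexts=[
            "Acme's Q2 revenue grew 12% thanks to the launch of Product Vega.",
            "Operating margin declined to 14% because of R&D hiring.",
        ],
        summary="Acme's revenue was flat but margins improved due to new hires.",
    )
    print(payload)
```

The resulting string can then be passed as the `output` argument of the metric's `score` call.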

Configuration

| Parameter | Default | Notes |
| --- | --- | --- |
| model | gpt-5-nano | Swap to a larger evaluator for longer or more technical content. |
| temperature | 0.0 | Keep low for deterministic scoring; raise slightly to sample different critiques. |
| track | True | Disable to skip sending traces to Opik. |
| project_name | None | Override when logging scores. |

The evaluator emits an integer between 0 and 10 that Opik normalises to 0–1; the reason field captures the rubric notes explaining the judgement.
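
The normalisation step can be pictured as a simple rescaling. The sketch below assumes a plain linear mapping (raw score divided by 10); `normalise_raw_score` is a hypothetical helper written for illustration, and Opik's internal implementation may differ in detail.

```python
def normalise_raw_score(raw: int, scale: int = 10) -> float:
    """Map a raw 0..scale judgement onto the 0.0-1.0 range.

    Assumption: linear rescaling, which is the simplest mapping consistent
    with "an integer between 0 and 10 normalised to 0-1".
    """
    if not 0 <= raw <= scale:
        raise ValueError(f"raw score must be between 0 and {scale}, got {raw}")
    return raw / scale


if __name__ == "__main__":
    for raw in (0, 3, 10):
        print(raw, "->", normalise_raw_score(raw))
```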