Online Evaluation Rules

Continuous Production Monitoring with Real-Time Evaluation

This video introduces production monitoring through Opik’s online evaluation rules, which enable continuous quality assessment of your LLM applications once they’re deployed. Unlike the batch evaluations used during development, online evaluation rules automatically score production traces in near real time as they’re logged to Opik. You’ll learn to create custom LLM-as-a-judge metrics for helpfulness and moderation that provide ongoing insight into production performance.

Key Highlights

  • Real-Time Production Scoring: Automatic evaluation of production traces as they occur, moving beyond development-time batch evaluations to continuous monitoring
  • Dual Metric Types: Choose between LLM-as-a-judge metrics for nuanced evaluation and code metrics for programmatic scoring, depending on your needs
  • Custom Evaluation Prompts: Write tailored prompts that use Mustache-style placeholders (for example {{input}} and {{output}}) to inject input and output variables for context-aware scoring
  • Flexible Scoring Scales: Configure different scoring ranges (1-10 for helpfulness, 0-1 for moderation) with customizable output types (integer, float, boolean)
  • Variable Mapping System: Map evaluation variables to specific fields in your data structure, ensuring accurate context injection for scoring
  • Comprehensive Score Tracking: View evaluation results directly in trace views with associated reasoning explanations for each score decision
  • Multiple Rule Support: Run multiple evaluation rules simultaneously (helpfulness, moderation, etc.) for multi-dimensional quality assessment
  • Production Benefits: Continuous monitoring, performance degradation detection, development-production feedback loops, and problematic pattern identification
  • Actionable Insights: Each score includes the judge model’s reasoning (e.g., “correctly identifies Paris as capital”), so you can see why a given score was assigned
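
The prompt templating, variable mapping, and score ranges described above can be illustrated with a small standalone sketch. This is not Opik's implementation or API: the prompt text, the `VARIABLE_MAPPING` dictionary, the dotted-path convention, and the helper names `resolve_path`, `render_prompt`, and `validate_score` are all hypothetical, chosen only to show how a Mustache-style placeholder might be filled from a trace and how a judge score might be checked against a configured scale.

```python
import re
from typing import Any

# Hypothetical judge prompt; {{input}} and {{output}} are Mustache-style placeholders.
HELPFULNESS_PROMPT = (
    "Rate how helpful the answer is on a scale of 1 to 10.\n"
    "Question: {{input}}\n"
    "Answer: {{output}}"
)

# Hypothetical variable mapping: prompt variable -> dotted path inside the trace.
VARIABLE_MAPPING = {"input": "input.question", "output": "output.answer"}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve_path(trace: dict[str, Any], path: str) -> Any:
    """Walk a dotted path such as 'input.question' through nested dicts."""
    value: Any = trace
    for key in path.split("."):
        value = value[key]
    return value


def render_prompt(template: str, mapping: dict[str, str], trace: dict[str, Any]) -> str:
    """Replace each {{variable}} with the trace field it is mapped to."""

    def substitute(match: re.Match) -> str:
        return str(resolve_path(trace, mapping[match.group(1)]))

    return PLACEHOLDER.sub(substitute, template)


def validate_score(value: float, low: float, high: float) -> float:
    """Reject a judge score that falls outside the configured scale."""
    if not low <= value <= high:
        raise ValueError(f"score {value} is outside the range {low} to {high}")
    return value


# Assumed trace shape, for illustration only.
trace = {
    "input": {"question": "What is the capital of France?"},
    "output": {"answer": "Paris"},
}
prompt = render_prompt(HELPFULNESS_PROMPT, VARIABLE_MAPPING, trace)
```

In this sketch, a helpfulness rule would call `validate_score(score, 1, 10)` and a moderation rule `validate_score(score, 0, 1)`, mirroring the two scales mentioned above. Keeping the mapping separate from the template means the same prompt can be reused when the trace stores its fields under different keys.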