Online Evaluation Rules

Continuous Production Monitoring with Real-Time Evaluation

This video introduces production monitoring through Opik’s online evaluation rules, enabling continuous quality assessment of your LLM applications once they’re deployed. Unlike batch evaluations used during development, online evaluation rules automatically score production traces in near real-time as they’re logged to Opik. You’ll learn to create custom LLM-as-a-judge metrics for helpfulness and moderation that provide ongoing insights into production performance.

Key Highlights

  • Real-Time Production Scoring: Automatic evaluation of production traces as they occur, moving beyond development-time batch evaluations to continuous monitoring
  • Dual Metric Types: Choose between LLM-as-a-judge metrics for sophisticated evaluation and code metrics for programmatic scoring based on your needs
  • Custom Evaluation Prompts: Write tailored judge prompts that use Mustache/Handlebars notation (double curly braces, e.g. {{input}}) to inject input and output variables for context-aware scoring
  • Flexible Scoring Scales: Configure different scoring ranges (1-10 for helpfulness, 0-1 for moderation) with customizable output types (integer, float, boolean)
  • Variable Mapping System: Map evaluation variables to specific fields in your data structure, ensuring accurate context injection for scoring
  • Comprehensive Score Tracking: View evaluation results directly in the trace view, with the reasoning behind each score shown alongside it
  • Multiple Rule Support: Run multiple evaluation rules simultaneously (helpfulness, moderation, etc.) for multi-dimensional quality assessment
  • Production Benefits: Monitor quality continuously, detect performance degradation, feed production findings back into development, and identify problematic patterns
  • Actionable Insights: Each score comes with the judge model's reasoning (e.g., “correctly identifies Paris as capital”), so you can see why a trace received its score
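
To make the prompt templating, variable mapping, and scoring scales above more concrete, here is a minimal Python sketch. It is not Opik's implementation: the prompt text, the variable names (`question`, `answer`), the dotted-path mapping format, and the helper functions are all hypothetical, chosen only to illustrate how double-curly-brace placeholders might be filled from fields of a logged trace and how a score could be checked against its configured range.

```python
import re
from typing import Any, Mapping

# Hypothetical judge prompt; {{question}} and {{answer}} are illustrative names.
HELPFULNESS_PROMPT = (
    "Rate the helpfulness of the answer from 1 to 10.\n"
    "Question: {{question}}\n"
    "Answer: {{answer}}"
)

# Hypothetical mapping from prompt variables to dotted paths inside a trace.
VARIABLE_MAPPING = {"question": "input.question", "answer": "output.answer"}


def resolve_path(data: Mapping[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'input.question' through nested dicts."""
    value: Any = data
    for key in path.split("."):
        value = value[key]
    return value


def render_prompt(
    template: str, trace: Mapping[str, Any], mapping: Mapping[str, str]
) -> str:
    """Replace each {{name}} with the trace field that `mapping` points to."""

    def substitute(match: re.Match) -> str:
        return str(resolve_path(trace, mapping[match.group(1)]))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def check_score(value: float, low: float, high: float) -> float:
    """Reject judge scores that fall outside the configured scale."""
    if not low <= value <= high:
        raise ValueError(f"score {value} outside [{low}, {high}]")
    return value


# A made-up trace shaped like the input/output of one production call.
trace = {
    "input": {"question": "What is the capital of France?"},
    "output": {"answer": "Paris"},
}

prompt = render_prompt(HELPFULNESS_PROMPT, trace, VARIABLE_MAPPING)
print(prompt)

helpfulness = check_score(9, low=1, high=10)   # 1-10 helpfulness scale
moderation = check_score(0.0, low=0, high=1)   # 0-1 moderation scale
```

In this sketch the rendered prompt is what would be sent to the judge model, and the range check stands in for whatever validation the configured output type (integer, float, boolean) implies. A wrong mapping path raises a `KeyError` here, which illustrates why the variable mapping has to match your actual trace structure.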