Define Evaluation Metrics

From Subjective Assessment to Quantifiable Metrics

This video introduces Opik’s metrics system, which turns subjective assessment of LLM outputs into quantifiable scores. You’ll see the different types of automated scoring methods available, work through practical examples using the Answer Relevance and Levenshtein ratio metrics, and learn how to create custom metrics when the built-in ones don’t fit. The video also covers cost considerations and best practices for combining multiple metrics to capture different dimensions of quality.
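The Levenshtein ratio mentioned above scores how close an output is to a reference string by counting the single-character edits needed to turn one into the other. The self-contained sketch below is not Opik's implementation. It normalizes the edit distance as 1 - distance / max(len), so identical strings score 1.0. Opik's Levenshtein ratio metric may use a different normalization, so treat these numbers as illustrative.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_ratio(output: str, reference: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical strings, lower as more edits are needed."""
    longest = max(len(output), len(reference))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein_distance(output, reference) / longest


print(round(levenshtein_ratio("kitten", "sitting"), 3))
```

Here "kitten" needs three edits to become "sitting", so the ratio is 1 - 3/7, roughly 0.571. A score like this rewards near-misses that an exact-match check would mark as plain failures.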

Key Highlights

  • Comprehensive Metric Types: Choose from heuristic metrics (exact match, contains, regex, JSON validation), hallucination detection, and LLM-as-a-judge approaches such as G-Eval
  • Easy Implementation: Import metrics directly from opik.evaluation.metrics and instantiate their classes, as demonstrated with Answer Relevance and Levenshtein ratio
  • Custom Metric Development: When the built-in options don’t meet your needs, create your own metrics by extending the base metric class provided by Opik
  • UI Integration: View metric scores in the trace overview by scrolling right or by opening the feedback scores section, where you can also add or remove scores manually
  • Manual Feedback Definition: Create custom feedback definitions in the Configuration section for human-applied metrics such as pass/fail classifications
  • Cost-Aware Evaluation: Weigh the trade-offs between evaluation speed, depth, and cost, especially when using expensive thinking models for LLM-as-a-judge approaches
  • Multi-Dimensional Assessment: Combine multiple metrics (e.g., factual accuracy and helpfulness) to get a more complete picture of quality than any single metric provides
  • Filtering Capabilities: Use feedback scores to filter traces and identify patterns in model performance across different quality dimensions
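The heuristic metrics in the first highlight are deterministic checks that need no model call, which makes them fast and free to run on every trace. The sketch below shows what each one computes, written as plain Python functions that return a 0/1 score. The function names are hypothetical illustrations, not Opik's API. Opik ships its own classes for these checks, so confirm their names and signatures in the SDK reference.

```python
import json
import re


def exact_match(output: str, reference: str) -> float:
    """1.0 if the output equals the reference exactly, else 0.0."""
    return 1.0 if output == reference else 0.0


def contains(output: str, substring: str, case_sensitive: bool = False) -> float:
    """1.0 if the substring appears anywhere in the output."""
    if not case_sensitive:
        output, substring = output.lower(), substring.lower()
    return 1.0 if substring in output else 0.0


def regex_match(output: str, pattern: str) -> float:
    """1.0 if the regular expression matches somewhere in the output."""
    return 1.0 if re.search(pattern, output) else 0.0


def is_json(output: str) -> float:
    """1.0 if the output parses as JSON, else 0.0."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0


answer = '{"city": "Paris"}'
print(exact_match(answer, '{"city": "Paris"}'))
print(contains(answer, "paris"))
print(regex_match(answer, r'"city":\s*"\w+"'))
print(is_json("city: Paris"))
```

All four checks are cheap enough to run on every trace. Save LLM-as-a-judge metrics for the qualities that string checks cannot capture.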
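A custom metric, as described in the highlights above, typically comes down to a class with a name and a `score` method that returns a named value, optionally with a reason. In Opik you would subclass the SDK's base metric class and return its score-result object (check the custom-metric documentation for the exact import paths). The sketch below defines small stand-ins for both, `BaseMetric` and `ScoreResult`, so it runs without the SDK. These stand-ins and the `ConcisenessMetric` example are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoreResult:
    """Stand-in for a metric result: a named numeric value plus an optional explanation."""
    name: str
    value: float
    reason: Optional[str] = None


class BaseMetric(ABC):
    """Stand-in base class: subclasses receive a name and implement score()."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def score(self, **kwargs) -> ScoreResult:
        ...


class ConcisenessMetric(BaseMetric):
    """Hypothetical custom metric: 1.0 if the output stays within a word budget."""

    def __init__(self, max_words: int = 50, name: str = "conciseness"):
        super().__init__(name)
        self.max_words = max_words

    def score(self, output: str, **ignored_kwargs) -> ScoreResult:
        n_words = len(output.split())
        return ScoreResult(
            name=self.name,
            value=1.0 if n_words <= self.max_words else 0.0,
            reason=f"{n_words} words (limit {self.max_words})",
        )


metric = ConcisenessMetric(max_words=5)
# Six words exceed the limit of five, so the value is 0.0.
print(metric.score(output="Paris is the capital of France."))
```

Returning a reason alongside the value makes scores easier to audit when they show up next to a trace.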
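The multi-dimensional assessment and filtering highlights can be sketched together. Score each trace on several metrics independently, then filter on one dimension to find weak spots. In the sketch below, two simple string metrics stand in for judged dimensions such as factual accuracy and helpfulness. The trace records, metric functions, and helper names are all hypothetical, and this runs locally rather than through Opik's UI.

```python
from typing import Callable, Dict, List

# Hypothetical metric type: maps (output, reference) to a score in [0, 1].
Metric = Callable[[str, str], float]


def exact_match(output: str, reference: str) -> float:
    return 1.0 if output.strip() == reference.strip() else 0.0


def keyword_recall(output: str, reference: str) -> float:
    """Fraction of reference words that also appear in the output."""
    ref_words = set(reference.lower().split())
    if not ref_words:
        return 1.0
    out_words = set(output.lower().split())
    return len(ref_words & out_words) / len(ref_words)


def score_all(records: List[dict], metrics: Dict[str, Metric]) -> List[dict]:
    """Attach one score per metric to every record, keeping the dimensions separate."""
    scored = []
    for rec in records:
        scores = {name: fn(rec["output"], rec["reference"]) for name, fn in metrics.items()}
        scored.append({**rec, "scores": scores})
    return scored


def filter_by_score(records: List[dict], metric: str, below: float) -> List[dict]:
    """Return records whose score on one metric falls under a threshold."""
    return [r for r in records if r["scores"][metric] < below]


traces = [
    {"id": "t1", "output": "Paris", "reference": "Paris"},
    {"id": "t2", "output": "The capital is Paris", "reference": "Paris"},
    {"id": "t3", "output": "Lyon", "reference": "Paris"},
]
scored = score_all(traces, {"exact_match": exact_match, "keyword_recall": keyword_recall})
for r in scored:
    print(r["id"], r["scores"])
print([r["id"] for r in filter_by_score(scored, "exact_match", below=1.0)])
```

Trace t2 fails the strict exact-match check but has full keyword recall. A single metric would hide that distinction, while filtering on each dimension separates t2 from the genuinely wrong answer in t3.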
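The cost-aware evaluation highlight comes down to simple arithmetic. Each LLM-as-a-judge call costs roughly prompt tokens times the input price plus completion tokens times the output price, multiplied by the number of items evaluated. The sketch below assumes per-million-token pricing and uses made-up prices and token counts. It also assumes that a thinking model's extra reasoning is billed as additional output tokens, which is common but varies by provider.

```python
def judge_cost_usd(
    n_items: int,
    prompt_tokens: int,
    completion_tokens: int,
    usd_per_1m_input: float,
    usd_per_1m_output: float,
) -> float:
    """Estimated cost of running one LLM-as-a-judge call per evaluated item."""
    per_item = (
        prompt_tokens * usd_per_1m_input + completion_tokens * usd_per_1m_output
    ) / 1_000_000
    return n_items * per_item


# Hypothetical prices and token counts, for illustration only.
baseline = judge_cost_usd(1_000, prompt_tokens=800, completion_tokens=200,
                          usd_per_1m_input=1.0, usd_per_1m_output=4.0)
thinking = judge_cost_usd(1_000, prompt_tokens=800, completion_tokens=2_000,
                          usd_per_1m_input=1.0, usd_per_1m_output=4.0)
print(f"standard judge: ${baseline:.2f}, verbose thinking judge: ${thinking:.2f}")
```

With these made-up numbers, a judge that writes ten times as many output tokens raises the bill from about $1.60 to $8.80 per thousand items. Multiply again by the number of judge metrics you run.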