
LEONARDO GONZALEZ

VP of AI Center of Excellence at Trilogy

Leonardo Gonzalez has over two decades of leadership experience in the IT industry. His specialties are AI engineering and data science, software engineering and architecture, and technical leadership, and he enjoys working on knowledge management and predictive modeling. Leonardo has held several consulting CTO positions with startups as well as leadership roles at Fortune 500 companies. He is currently VP of the AI Center of Excellence at Trilogy.

May 14th, 3:35 – 4:45 PM ET

Navigating Uncharted Metrics: Reference-Free LLM Evaluation with G-Eval

Reference-based benchmarks struggle to capture the open-ended behavior of frontier language models. This one-hour, video-based workshop introduces G-Eval, a reference-free evaluation paradigm in which a powerful LLM generates task-specific rubrics, chain-of-thought rationales, and quantitative grades to assess the outputs of other models. This extends current evaluation practice while sidestepping ground-truth dependencies.

A concise theoretical overview motivates the approach and surfaces its chief challenges: reliability, objectivity, bias propagation, and reproducibility. The core of the session adapts a publicly available repository to build an end-to-end G-Eval pipeline, illustrating rubric synthesis, reasoning-trace parsing, and numeric scoring.
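For a concrete picture of those three steps ahead of the session, the sketch below shows them in miniature: a judge model drafts a rubric, reasons through it, and emits a grade that is parsed from its response. It assumes an OpenAI-compatible Python client, a placeholder judge model name, and a simple "Score: N" convention; the repository and prompts used in the workshop may differ.

```python
# Minimal G-Eval-style sketch (illustrative only): a judge LLM drafts a rubric,
# reasons step by step, and returns a numeric grade for a candidate output.
# Assumptions: OpenAI-compatible client, OPENAI_API_KEY set, judge model name is a placeholder.
import re
from openai import OpenAI

client = OpenAI()
JUDGE_MODEL = "gpt-4o"  # assumption: any capable judge model can fill this role


def synthesize_rubric(task_description: str) -> str:
    """Ask the judge model to draft task-specific grading criteria."""
    resp = client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=[{
            "role": "user",
            "content": (
                "Write 3-5 concise grading criteria (a rubric) for this task:\n"
                f"{task_description}"
            ),
        }],
    )
    return resp.choices[0].message.content


def grade(task_description: str, rubric: str, candidate_output: str) -> tuple[str, float]:
    """Grade one output: return the chain-of-thought rationale and a 1-5 score."""
    resp = client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=[{
            "role": "user",
            "content": (
                f"Task: {task_description}\n\nRubric:\n{rubric}\n\n"
                f"Candidate output:\n{candidate_output}\n\n"
                "Reason step by step against the rubric, then end with a line "
                "'Score: <1-5>'."
            ),
        }],
    )
    rationale = resp.choices[0].message.content
    # Parse the numeric grade from the reasoning trace; NaN signals a parse failure.
    match = re.search(r"Score:\s*([1-5](?:\.\d+)?)", rationale)
    score = float(match.group(1)) if match else float("nan")
    return rationale, score


if __name__ == "__main__":
    task = "Summarize a news article in two sentences."
    rubric = synthesize_rubric(task)
    rationale, score = grade(task, rubric, "The article says many things happened.")
    print(rubric, rationale, f"score={score}", sep="\n\n")
```

In practice, a pipeline like this would sample the judge several times and average the parsed scores, which is one common way to address the reliability and reproducibility concerns raised above.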

The workshop delivers a reproducible template for large-scale, rubric-driven benchmarking and highlights open research questions, mapping a clearer path toward robust, reference-free evaluation of next-generation language models.
