
LEONARDO GONZALEZ
VP of AI Center of Excellence at Trilogy
Leonardo Gonzalez has over two decades of leadership experience in the IT industry. His specialties are AI engineering and data science, software engineering and architecture, and technical leadership. He enjoys working with knowledge management and predictive modeling. Leonardo has held several consulting CTO positions with startups and leadership roles at Fortune 500 companies. He is currently VP of the AI Center of Excellence at Trilogy.
May 14th, 3:35 – 4:45PM ET
Navigating Uncharted Metrics: Reference-Free LLM Evaluation with G-Eval
Reference-based benchmarks struggle to capture the open-ended behaviour of frontier language models. This one-hour, video-based workshop introduces G-Eval, a reference-free evaluation paradigm in which a powerful LLM generates task-specific rubrics, chain-of-thought rationales, and quantitative grades to assess the outputs of other models. The approach extends current evaluation practice while sidestepping the dependence on ground-truth references.
A concise theoretical overview motivates the approach and surfaces its chief challenges: reliability, objectivity, bias propagation, and reproducibility. The core of the session adapts a publicly available repository to build an end-to-end G-Eval pipeline, illustrating rubric synthesis, reasoning-trace parsing, and numeric scoring.
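As a rough illustration of the kind of pipeline the session builds, the sketch below strings together those three stages: rubric synthesis, a chain-of-thought rationale, and a parsed numeric grade. The call_llm stub, the prompt wording, and the 1-5 scale are illustrative assumptions, not the workshop repository's actual code.

```python
import re


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for whatever chat-completion client the
    workshop repository uses; swap in a real API call here."""
    return "The candidate meets most rubric criteria.\nSCORE: 4"  # canned demo reply


def synthesize_rubric(task_description: str) -> str:
    """Step 1: ask the judge model to draft task-specific grading criteria."""
    prompt = (
        "Write a concise grading rubric (criteria plus a 1-5 scale) "
        f"for the following task:\n{task_description}"
    )
    return call_llm(prompt)


def grade_output(task_description: str, rubric: str, candidate: str) -> dict:
    """Steps 2-3: elicit a step-by-step rationale, then parse a numeric grade."""
    prompt = (
        f"Task: {task_description}\n"
        f"Rubric:\n{rubric}\n"
        f"Candidate output:\n{candidate}\n\n"
        "Reason through the rubric step by step, then finish with a line "
        "formatted exactly as 'SCORE: <1-5>'."
    )
    response = call_llm(prompt)
    match = re.search(r"SCORE:\s*([1-5])", response)
    return {"rationale": response, "score": int(match.group(1)) if match else None}


task = "Summarize a news article in three sentences."
rubric = synthesize_rubric(task)
print(grade_output(task, rubric, "An example candidate summary."))
```

The published G-Eval method additionally weights candidate scores by the judge's output-token probabilities to produce finer-grained grades; the regex parse above is a deliberate simplification.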
The workshop delivers a reproducible template for large-scale, rubric-driven benchmarking and highlights open research questions, mapping a clearer path toward robust, reference-free evaluation of next-generation language models.