-
SelfCheckGPT for LLM Evaluation
Detecting hallucinations in language models is challenging. There are three general approaches: Measuring token-level probability distributions for indications that a…
-
LLM Juries for Evaluation
Evaluating the correctness of generated responses is an inherently challenging task. LLM-as-a-Judge evaluators have gained popularity for their ability to…
-
G-Eval for LLM Evaluation
LLM-as-a-judge evaluators have gained widespread adoption due to their flexibility, scalability, and close alignment with human judgment. They excel at…
-
BERTScore For LLM Evaluation
BERTScore represents a pivotal shift in LLM evaluation, moving beyond traditional heuristic-based metrics like BLEU and ROUGE to a…
-
Perplexity for LLM Evaluation
Perplexity is, historically speaking, one of the “standard” evaluation metrics for language models. And while recent years have seen a…
-
Image Inpainting for SDXL 1.0 Base Model + Refiner
In this article, we’ll compare the results of SDXL 1.0 with its predecessor, Stable Diffusion 2.0. We’ll also take a…
-
Explainable AI: Visualizing Attention in Transformers
In this article, we explore one of the most popular tools for visualizing the core distinguishing feature of transformer architectures:…
-
SAM + Stable Diffusion for Text-to-Image Inpainting
In this article, we’ll leverage the power of SAM, the first foundational model for computer vision, along with Stable Diffusion,…
-
Debugging Image Classifiers With Confusion Matrices
We often rely on scalar metrics and static plots to describe and evaluate machine learning models, but these methods…
-
Compare Object Detection Models From TorchVision
Object detection is one of the most popular applications of machine learning for computer vision. A detection model predicts…