- SelfCheckGPT for LLM Evaluation
Detecting hallucinations in language models is challenging. There are three general approaches. The problem with many LLM-as-a-Judge techniques is that…
- Perplexity for LLM Evaluation
Perplexity is, historically, one of the “standard” evaluation metrics for language models. And while recent years have seen a…
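Under the standard definition, perplexity is the exponential of the average negative log-likelihood per token; a minimal sketch (the function name and example values are illustrative):

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a sequence from its per-token log-probabilities
    (natural log): exp of the mean negative log-likelihood."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns every token probability 1/4 has perplexity 4:
# it is as uncertain as a uniform choice among four tokens.
print(perplexity([math.log(0.25)] * 8))  # ≈ 4.0
```

Lower perplexity means the model assigns higher probability to the observed text; a perfect model (probability 1 on every token) scores 1.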