SelfCheckGPT for LLM Evaluation
Detecting hallucinations in language models is challenging. There are three general approaches: Measuring token-level probability distributions for indications that a…
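As a minimal sketch of the token-probability approach: given per-token probabilities for a generated sentence, two simple sentence-level signals are the average and the maximum negative log-probability (a single very unlikely token, or a generally low-confidence sentence, can both hint at hallucination). The function below is illustrative, not the exact scoring used by any particular system, and assumes you already have access to the model's per-token probabilities.

```python
import math

def token_level_scores(token_probs):
    """Sentence-level uncertainty signals from per-token probabilities.

    - avg_neg_log_prob: high when the model is unsure across the sentence.
    - max_neg_log_prob: high when any single token was very unlikely.
    Both are heuristic hallucination indicators, not definitive labels.
    """
    neg_log_probs = [-math.log(p) for p in token_probs]
    return {
        "avg_neg_log_prob": sum(neg_log_probs) / len(neg_log_probs),
        "max_neg_log_prob": max(neg_log_probs),
    }

# One surprising token (p = 0.05) dominates the max-based signal:
scores = token_level_scores([0.9, 0.8, 0.05, 0.95])
```

The max-based score is sensitive to a single outlier token, while the average smooths over the whole sentence; which is more useful depends on whether hallucinations tend to be localized (a wrong name or date) or diffuse.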
Perplexity is, historically speaking, one of the "standard" evaluation metrics for language models. And while recent years have seen a…
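Concretely, perplexity is the exponential of the mean negative log-likelihood per token: a model with perplexity $k$ is, on average, as uncertain as if it were choosing uniformly among $k$ tokens at each step. A small sketch, assuming you already have per-token log-probabilities from the model:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# If every token had probability 0.5, perplexity is exactly 2:
ppl = perplexity([math.log(0.5)] * 10)  # -> 2.0
```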