- SelfCheckGPT for LLM Evaluation
Detecting hallucinations in language models is challenging. There are three general approaches. The problem with many LLM-as-a-Judge techniques is that…
- LLM Hallucination Detection in App Development
Even ChatGPT knows it’s not always right. When prompted, “Are large language models (LLMs) always accurate?” ChatGPT says no and…
- Major Releases: TypeScript for LLM Evals, Total Fidelity ML Metrics, & More
Spring is in the air, and we’re excited to bring you four fresh releases in the Comet platform to make…
- LLM Evaluation Frameworks: Head-to-Head Comparison
As teams work on complex AI agents and expand what LLM-powered applications can achieve, a variety of LLM evaluation frameworks…
- LLM Juries for Evaluation
Evaluating the correctness of generated responses is an inherently challenging task. LLM-as-a-Judge evaluators have gained popularity for their ability to…
- A Simple Recipe for LLM Observability
So, you’re building an AI application on top of an LLM, and you’re planning on setting it live in production…
- LLM Monitoring & Maintenance in Production Applications
Generative AI has become a transformative force, revolutionizing how businesses engage with users through chatbots, content creation, and personalized recommendations…
- Building Opik: A Scalable Open-Source LLM Observability Platform
Opik is an open-source platform for evaluating, testing, and monitoring LLM applications, created by Comet. When teams integrate language models…
- G-Eval for LLM Evaluation
LLM-as-a-judge evaluators have gained widespread adoption due to their flexibility, scalability, and close alignment with human judgment. They excel at…
- Comet Product Releases January 2025
As 2025 picks up steam, we’re thrilled to bring you some exciting product updates from Comet! This month, we’ve added…