-
Building Opik: A Scalable Open-Source LLM Observability Platform
Opik is an open-source platform for evaluating, testing, and monitoring LLM applications, created by Comet. When teams integrate language models…
-
G-Eval for LLM Evaluation
LLM-as-a-judge evaluators have gained widespread adoption due to their flexibility, scalability, and close alignment with human judgment. They excel at…
-
Comet Product Releases January 2025
As 2025 picks up steam, we’re thrilled to bring you some exciting product updates from Comet! This month, we’ve added…
-
Build Multi-Index Advanced RAG Apps
Welcome to Lesson 12 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You’ll learn…
-
Build a scalable RAG ingestion pipeline using 74.3% less code
Welcome to Lesson 11 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You’ll learn…
-
LLM Evaluation Metrics Every Developer Should Know
When you build an app or system on top of an LLM, you need a way to understand the quality…
-
Intro to LLM Observability: What to Monitor & How to Get Started
While LLM usage is soaring, productionizing an LLM-powered application or software product presents new and different challenges compared to traditional…
-
BERTScore For LLM Evaluation
BERTScore represents a pivotal shift in LLM evaluation, moving beyond traditional heuristic-based metrics like BLEU and ROUGE to a…
-
Building ClaireBot, an AI Personal Stylist Chatbot
Follow the evolution of my personal AI project and discover how to integrate image analysis, LLMs, and LLM-as-a-judge evaluation…
-
Structured Generation for LLM-as-a-Judge Evaluations
For the past few months, I’ve been working on LLM-based evaluations (“LLM-as-a-Judge” metrics) for language models. The results have so…