-
The Ultimate Guide to LLM Evaluation: Metrics, Methods & Best Practices
The meteoric rise of large language models (LLMs) and their widespread use across more applications and user experiences raises an…
-
How We Used Opik to Build AI-Powered Trace Analysis
Within the GenAI development cycle, Opik does the often-overlooked yet essential work of logging, testing, comparing, and optimizing steps…
-
AI Agent Design Patterns: How to Build Reliable AI Agent Architecture for Production
LLMs are powerful, but turning them into reliable, adaptable AI agents is a whole different game. After designing the architecture…
-
AI Assisted Coding with Cursor AI and Opik
How AI can help you move beyond vibe coding and become an effective AI engineer faster than you think. Dear…
-
Release Highlights: Discover Opik Agent Optimizer, Guardrails, & New Integrations
As LLMs power more complex, multi-step agentic systems, the need for precise optimization and control is growing. In case you…
-
Announcing Opik’s Guardrails Beta: Moderate LLM Applications in Real-Time
We’ve spent the past year building tools that make LLM applications more transparent, measurable, and accountable. Since launching Opik, our…
-
From Observability to Optimization: Announcing the Opik Agent Optimizer Public Beta
At Comet, we’re driven by a commitment to advance innovation in AI, particularly in the realm of LLM observability. Our…
-
Major Releases: MCP Server & Google Agent Dev Kit Support
We’ve just rolled out two major updates in Opik, Comet’s open-source LLM evaluation platform, that make it easier than ever…
-
SelfCheckGPT for LLM Evaluation
Detecting hallucinations in language models is challenging. There are three general approaches: The problem with many LLM-as-a-Judge techniques is that…
-
LLM Hallucination Detection in App Development
Even ChatGPT knows it’s not always right. When prompted, “Are large language models (LLMs) always accurate?” ChatGPT says no and…