- Perplexity for LLM Evaluation
  Perplexity is, historically speaking, one of the “standard” evaluation metrics for language models. And while recent years have seen a…
- OpenAI Evals: Log Datasets & Evaluate LLM Performance with Opik
  OpenAI’s Python API is quickly becoming one of the most-downloaded Python packages. With an easy-to-use SDK and access…
- Meet Opik: Your New Tool to Evaluate, Test, and Monitor LLM Applications
  Today, we’re thrilled to introduce Opik – an open-source, end-to-end LLM development platform that provides the observability tools you need…
- Building a Low-Cost Local LLM Server to Run 70 Billion Parameter Models
  A guest post from Fabrício Ceolin, DevOps Engineer at Comet. Inspired by the growing demand for large-scale language models, Fabrício…
- The Ultimate Prompt Monitoring Pipeline
  Welcome to Lesson 10 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You’ll learn how…
- Beyond Proof of Concept: Building RAG Systems That Scale
  Welcome to Lesson 9 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You’ll learn how to use…
- The Engineer’s Framework for LLM & RAG Evaluation
  Welcome to Lesson 8 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You’ll learn how to use…
- Turning Raw Data Into Fine-Tuning Datasets
  Welcome to Lesson 6 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You’ll learn how to use…
- The 4 Advanced RAG Algorithms You Must Know to Implement
  Welcome to Lesson 5 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You’ll learn how to use LLMs,…
- SOTA Python Streaming Pipelines for Fine-tuning LLMs and RAG – in Real-Time!
  Welcome to Lesson 4 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You’ll learn…