
Where AI Developers Build

Achieve consistency and predictability in your AI applications and agentic systems at scale with Comet’s end-to-end model evaluation platform.

Open Source LLM Evaluation
Automated Agent Optimization
GenAI Guardrails
ML Experiment Tracking
Try Comet Free
Book a Demo

Trusted by the most innovative ML teams

AssemblyAI · NatWest · Uber · Netflix · Etsy · Mobileye

Ship complex LLM apps and agents with Opik

Automate prompt engineering and turn LLM tuning into a repeatable, scalable process. Tracing, LLM eval metrics, and application-level unit testing help ensure performance across development and production.

Opik – Open-Source LLM Evaluation
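
For example, one of Opik's built-in LLM-as-judge metrics can score a response for hallucination against its retrieval context. A minimal sketch (the strings are illustrative, and the metric needs an LLM judge, such as an OpenAI API key, configured):

from opik.evaluation.metrics import Hallucination

# Score a single LLM response against the context it was given
metric = Hallucination()
result = metric.score(
    input="What did the dog do?",
    output="The dog chased the cat Luky.",
    context=["The dog chased the cat.", "The cat was called Luky."],
)
print(result.value)  # 0.0 (grounded) to 1.0 (hallucinated)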

Track and visualize your model training runs with Experiment Management

Log all your machine learning iterations to a single system of record. Make it easy to reproduce a previous experiment and compare the performance of training runs.

ML Experiment Management

Monitor ML model performance in production with Comet MPM

Track data drift on your input and output features after your model is deployed to production. Set customized alerts to capture model performance degradation in real time.

ML Model Production Monitoring

Store and manage your models with Model Registry

Create a centralized repository of all your model versions with immediate access to how they were trained. Promote models to downstream production systems with webhooks.

ML Model Registry
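
A minimal sketch of that flow with the comet_ml SDK (model and file names are illustrative):

from comet_ml import Experiment

experiment = Experiment(project_name="YOUR PROJECT")

# Log a trained model file to this experiment...
experiment.log_model("my-model", "./model.pkl")

# ...then register it in the Model Registry, where it can be
# versioned and promoted to downstream production systems
experiment.register_model("my-model")
experiment.end()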

Create and version datasets with Artifacts

Know which exact dataset version a model was trained on for auditing and governance purposes. Leverage remote pointers to reference data already stored in the cloud. 

ML Dataset Versioning
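
A minimal sketch with the comet_ml Artifact API (dataset, file, and bucket names are illustrative):

from comet_ml import Artifact, Experiment

experiment = Experiment(project_name="YOUR PROJECT")

# Create a dataset artifact; each log_artifact call produces a new version
artifact = Artifact(name="training-data", artifact_type="dataset")
artifact.add("./data/train.csv")                       # upload a local file
artifact.add_remote("s3://my-bucket/train-extra.csv")  # remote pointer; data stays in the cloud

experiment.log_artifact(artifact)
experiment.end()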

Easy Integration

Add just a few lines of code to your notebook or script and automatically start tracking LLM traces, code, hyperparameters, metrics, model predictions, and more.

Try Comet Free
Try a Live Notebook
Opik LLM Evaluation
from opik import track

@track
def llm_chain(user_question):
    context = get_context(user_question)
    response = call_llm(user_question, context)
    
    return response

@track
def get_context(user_question):
    # Logic that fetches the context, hard coded here
    return ["The dog chased the cat.", "The cat was called Luky."]

@track
def call_llm(user_question, context):
    # LLM call, can be combined with any Opik integration
    return "The dog chased the cat Luky."

response = llm_chain("What did the dog do?")
print(response)
from llama_index.core import VectorStoreIndex, set_global_handler
from llama_index.core.schema import TextNode

# Configure the Opik integration
set_global_handler("opik")

node1 = TextNode(text="The cat sat on the mat.", id_="1")
node2 = TextNode(text="The dog chased the cat.", id_="2")

index = VectorStoreIndex([node1, node2])

# Create a LlamaIndex query engine
query_engine = index.as_query_engine()

# Query the documents
response = query_engine.query("What did the dog do?")
print(response)
from langchain_openai import ChatOpenAI
from opik.integrations.langchain import OpikTracer

# Initialize the tracer
opik_tracer = OpikTracer()

# Create the LLM Chain using LangChain
llm = ChatOpenAI(temperature=0)

# Configure the Opik integration
llm = llm.with_config({"callbacks": [opik_tracer]})

llm.invoke("Hello, how are you?")
from openai import OpenAI
from opik.integrations.openai import track_openai

openai_client = OpenAI()
openai_client = track_openai(openai_client)

response = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)
Experiment Management
from comet_ml import Experiment
import torch.nn as nn

# 1. Define a new experiment
experiment = Experiment(project_name="YOUR PROJECT")

# 2. Create your model class
class RNN(nn.Module):
    ...  # Define your model architecture here

# 3. Train and test your model while logging everything to Comet
with experiment.train():
    # ...Train your model and log metrics
    experiment.log_metric("accuracy", correct / total, step=step)

# 4. View real-time metrics in Comet
import pytorch_lightning as pl
from pytorch_lightning.loggers import CometLogger

# 1. Create your model

# 2. Initialize CometLogger
comet_logger = CometLogger()

# 3. Train your model
trainer = pl.Trainer(
    logger=[comet_logger],
    # ...configs
)

trainer.fit(model)

# 4. View real-time metrics in Comet
from comet_ml import Experiment
from transformers import Trainer

# 1. Define a new experiment 
experiment = Experiment(project_name="YOUR PROJECT")

# 2. Train your model 
trainer = Trainer(
    model = model,
    # ...configs
)

trainer.train()

# 3. View real-time metrics in Comet
from comet_ml import Experiment
import tensorflow as tf

# 1. Define a new experiment
experiment = Experiment(project_name="YOUR PROJECT")

# 2. Define your model
model = tf.keras.Model(
    # ...configs
)

# 3. Train your model
model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
)

# 4. Track real-time metrics in Comet
from comet_ml import Experiment
import tensorflow as tf

# 1. Define a new experiment 
experiment = Experiment(project_name="YOUR PROJECT")

# 2. Define and train your model
model.fit(...)

# 3. Log additional model metrics and params
experiment.log_parameters({'custom_params': True})
experiment.log_metric('custom_metric', 0.95)

# 4. Track real-time metrics in Comet
from comet_ml import Experiment
from sklearn import tree

# 1. Define a new experiment 
experiment = Experiment(project_name="YOUR PROJECT")

# 2. Build your model and fit
clf = tree.DecisionTreeClassifier(
    # ...configs
)

clf.fit(X_train_scaled, y_train)
params = {...}
metrics = {...}

# 3. Log additional metrics and params
experiment.log_parameters(params)
experiment.log_metrics(metrics)

# 4. Track model performance in Comet
from comet_ml import Experiment
import xgboost as xgb

# 1. Define a new experiment
experiment = Experiment(project_name="YOUR PROJECT")

# 2. Define your model and fit
xg_reg = xgb.XGBRegressor(
    eval_metric="rmse",
    # ...configs
)
xg_reg.fit(
    X_train,
    y_train,
    eval_set=[(X_train, y_train), (X_test, y_test)],
)

# 3. Track model performance in Comet
# Utilize Comet in any environment
from comet_ml import Experiment

# 1. Define a new experiment
experiment = Experiment(project_name="YOUR PROJECT")

# 2. Model training here

# 3. Log metrics or params over time
experiment.log_metrics(metrics)

# 4. Track real-time metrics in Comet
Model Management
# Utilize Comet in any environment
from comet_mpm import CometMPM

# 1. Create the MPM logger
MPM = CometMPM()

# 2. Add your inference logic here

# 3. Log metrics or params over time
MPM.log_event(
    prediction_id="...",
    input_features=input_features,
    output_value=prediction,
    output_probability=probability,
)

An End-to-End Model Evaluation Platform

Comet’s end-to-end model evaluation platform is built for developers focused on shipping AI features. It spans open source LLM tracing, ML unit testing, evaluations, experiment tracking, and production monitoring.

Track and compare your training runs, log and evaluate your LLM responses, version your models and training data, and monitor your models in production – all in one platform.

Where AI Developers Build

Run Comet’s end-to-end evaluation platform on any infrastructure and see firsthand how Comet reshapes your workflow. Bring your existing software and data stack, and use code panels to create visualizations in your preferred user interfaces.

Infrastructure

An AI Platform Built for Enterprise, Driven by Community

Comet’s end-to-end evaluation platform is trusted by innovative data scientists, ML practitioners, and engineers in the most demanding enterprise environments.

“Comet has aided our success with ML and serves to further ML development within Zappos.”

Kyle Anderson

Director of Software Engineering

“Comet offers the most complete experiment tracking solution on the market. It’s brought significant value to our business.”

Olcay Cirit

Staff Research and Tech Lead

“Comet enables us to speed up research cycles and reliably reproduce and collaborate on our modeling projects. It has become an indispensable part of our ML workflow.”

Victor Sanh

Machine Learning Scientist

“None of the other products have the simplicity, ease of use and feature set that Comet has.”

Ronny Huang

Research Scientist

“After discovering Comet, our deep learning team’s productivity went up. Comet is easy to set up and allows us to move research faster.”

Guru Rao

Head of AI

“We can seamlessly compare and share experiments, debug and stop underperforming models. Comet has improved our efficiency.”

Carol Anderson

Staff Data Scientist

Get started today, free.

No credit card required, try Comet with no risk and no commitment.

Book a Demo
Contact Sales