Experiment objects
The Comet Python SDK provides several objects for performing operations on Experiments. This page describes the difference between them and presents the typical use cases where they are best used.
Experiment¶
Experiment is the core class of Comet. An Experiment represents a unit of measurable research that defines a single execution of code with some associated data, for example, training a model on a single set of hyperparameters. Use Experiment to log new data to the Comet UI.
An Experiment automatically logs script output (stdout/stderr), code, and command-line arguments for any script and, for the supported libraries, also logs hyperparameters, metrics, and model configuration.
Add the following snippet to the top of your training script to quickly add Experiment functionality:
from comet_ml import Experiment

# Import comet_ml before your ML framework so automatic logging can hook in.
experiment = Experiment(api_key="YOUR-PERSONAL-API-KEY")

# Your code
Note
If Comet detects a loss of connectivity during a training run, it saves a local backup of the data logged to the experiment as an OfflineExperiment. You can also keep a local backup of your data for all experiments by setting the configuration value keep_offline_zip as described here.
Automatic logging¶
By adding Experiment to your script, you automatically turn on logging for the following items:
- Script code and file name, or Jupyter Notebook history
- Git metadata and patch
- Model graph representation (see below)
- Model weights and biases (see below)
- Model hyperparameters (see below)
- Training metrics (see below)
- Command-line arguments to script
- Console and Jupyter Notebook standard output and error
- Environment GPU, CPU, host name, and more
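Most of these automatic loggers can be toggled individually through keyword arguments on the Experiment constructor. A minimal sketch (the flags shown, such as log_code and log_env_gpu, follow the SDK's documented option names, but check the reference for your SDK version):

from comet_ml import Experiment

# Selectively disable some automatic loggers.
# Each flag corresponds to one of the items listed above.
experiment = Experiment(
    api_key="YOUR-PERSONAL-API-KEY",
    log_code=False,          # skip script/notebook code capture
    log_git_metadata=False,  # skip Git metadata
    log_git_patch=False,     # skip the uncommitted-changes patch
    log_env_gpu=False,       # skip GPU environment details
)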
Framework-dependent automatic logging¶
| Framework | Logged items |
|---|---|
| fast.ai | All PyTorch items, plus epochs and metrics. See examples. |
| Keras | Graph description, steps, metrics, hyperparameters, weights and biases as histograms, optimizer config, and number of trainable parameters. See examples. |
| MLflow | Hyperparameters, assets, models, plus lower-level framework items (for example, TensorFlow metrics and TensorBoard summaries). |
| Prophet | Hyperparameters, model, and figures. |
| PyTorch Lightning | Loss and accuracy, with refinements in progress. See examples. |
| PyTorch | Graph description, steps, and loss. See examples. |
| PyTorch + apex | Loss. See examples. |
| Ray Train | Distributed system metrics. See examples. |
| Scikit-learn | Hyperparameters. See examples. |
| TensorBoard | Summary scalars (as metrics) and summary histograms. |
| TensorFlow Model Analysis | Time series, plots, and slicing metrics. |
| TensorFlow | Steps and metrics. See examples. |
| XGBoost | Metrics and hyperparameters. See examples. |
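For each of these frameworks, the key requirement is that comet_ml is imported, and typically the Experiment created, before the framework itself so that Comet can install its hooks. A minimal sketch with Keras (the data and model here are placeholders for illustration, not from the Comet docs):

import numpy as np
from comet_ml import Experiment  # must come before the ML framework import

experiment = Experiment(api_key="YOUR-PERSONAL-API-KEY")

from tensorflow import keras  # imported after comet_ml so hooks are in place

# Placeholder data and model purely for illustration.
x = np.random.rand(256, 10)
y = np.random.randint(0, 2, size=(256,))

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Metrics, hyperparameters, and the model graph are logged automatically.
model.fit(x, y, epochs=3, batch_size=32)

experiment.end()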
ExistingExperiment¶
ExistingExperiment should be used in cases where you want to append data to an Experiment that has already been created.
The following are the typical use cases for this object:
- You want to resume training a model after some sort of interruption (for example, an internet outage).
- Your experimentation is split across multiple processes and you would like to consolidate the data to a single experiment.
- You want to append additional data to a previously completed Experiment. This is useful when your training and testing experiments are in different scripts. For example, add test set predictions as an asset or test set metrics to the Experiment.
Caveats
Certain automatic logging features are disabled by default in ExistingExperiment. You must manually re-enable them by setting the corresponding arguments to True when constructing the object (see the sketch below). The following features are disabled:
- log_code
- log_graph
- parse_args
- log_env_details
- log_git_metadata
- log_git_patch
- log_env_gpu
- log_env_cpu
- log_env_host
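A minimal sketch of resuming a run, assuming you have the experiment key of the original run (the key value, metric name, and metric value below are illustrative):

from comet_ml import ExistingExperiment

# Resume an earlier run using its experiment key, and re-enable one of
# the features that ExistingExperiment disables by default.
experiment = ExistingExperiment(
    api_key="YOUR-PERSONAL-API-KEY",
    previous_experiment="EXPERIMENT-KEY-OF-THE-ORIGINAL-RUN",
    log_env_details=True,  # disabled by default for ExistingExperiment
)

# Append test-set results to the original experiment.
experiment.log_metric("test_accuracy", 0.91)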
OfflineExperiment¶
OfflineExperiment has all the same methods and properties as the Experiment object.
The following are the typical use cases for this object:
- You might have to run your experiments on machines that do not have a public internet connection.
- If your experimentation generates a high volume of data in a short period of time, using the Experiment object risks throttling. Saving your data to an OfflineExperiment object lets you bulk upload the data once the run is complete and avoid Comet's throttling.
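A minimal sketch of an offline run (the offline_directory path is illustrative):

from comet_ml import OfflineExperiment

# Data is written to a local ZIP archive instead of being streamed to Comet.
experiment = OfflineExperiment(offline_directory="./comet-offline")

experiment.log_parameter("learning_rate", 0.001)
experiment.log_metric("loss", 0.42)
experiment.end()  # finalizes the local archive

Once you are back online, the archive can be uploaded with the comet upload command-line tool (for example, comet upload ./comet-offline/*.zip).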
ExistingOfflineExperiment¶
ExistingOfflineExperiment has all the same methods and properties as the ExistingExperiment object.
The following are the typical use cases for this object:
- You have an Experiment that exists in Comet but you have an unreliable or unavailable internet connection.
- You can use ExistingOfflineExperiment to log your experimentation data locally and upload it to the existing Comet Experiment once you have access to a reliable internet connection.
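A minimal sketch combining the two patterns above (the experiment key and directory path are illustrative, and the previous_experiment keyword mirrors ExistingExperiment; verify the constructor arguments against your SDK version):

from comet_ml import ExistingOfflineExperiment

# Log locally against an experiment that already exists in Comet.
experiment = ExistingOfflineExperiment(
    previous_experiment="EXPERIMENT-KEY-OF-THE-ORIGINAL-RUN",
    offline_directory="./comet-offline",
)
experiment.log_metric("val_loss", 0.37)
experiment.end()  # upload the resulting ZIP later with `comet upload`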
API¶
The API object is a wrapper around Comet's REST API. It contains a set of convenience methods for fetching data logged to Comet. This object is useful when you want to fetch or analyze your data outside of the Comet UI.
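A minimal sketch of instantiating the object and fetching experiments (the workspace and project names are placeholders):

from comet_ml.api import API

api = API(api_key="YOUR-PERSONAL-API-KEY")

# List the workspaces this API key can access.
print(api.get())

# Fetch all experiments in one project as APIExperiment objects.
experiments = api.get_experiments("my-workspace", project_name="my-project")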
APIExperiment¶
APIExperiment is a wrapper object for the data returned from the Comet Python API object. Each APIExperiment object contains the data for a single experiment that has been logged to Comet. This object has a set of convenience methods that make it easy to access and download the logged data present inside each experiment.
The following are the typical use cases for this object:
- You would like to fetch CPU or GPU metrics across all projects in your workspace and calculate summary metrics to assess how much compute the ML team is using. Currently, there is no way to do this through the UI. You would use the Python API object to pull this data from your workspace and compute these metrics in a custom script.
- You would like to fetch all the model binaries logged to a particular project to create an ensemble model.
- You want to calculate the average model accuracy across multiple projects working on the same task, as in the sketch below.
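A minimal sketch of the last use case, assuming each experiment logged a metric named accuracy (the metric name, workspace, and project list are placeholders, and the metricValue field follows the REST payload; verify against your SDK version):

from comet_ml.api import API

api = API(api_key="YOUR-PERSONAL-API-KEY")

values = []
for project in ["project-a", "project-b"]:  # projects working on the same task
    for exp in api.get_experiments("my-workspace", project_name=project):
        # get_metrics returns a list of dicts, one per logged value.
        for entry in exp.get_metrics("accuracy"):
            values.append(float(entry["metricValue"]))

if values:
    print(f"Average accuracy across {len(values)} logged values: "
          f"{sum(values) / len(values):.4f}")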