
Integrate with MLflow

Comet has extensive support for users of MLflow.

Comet supports MLflow in two different ways:

  • Built-in, core Comet support for MLflow
  • The Comet for MLflow extension

The following sections provide details of both methods.

Built-in, core Comet support for MLflow

If you're already using MLflow, then Comet will work with MLflow with no further configuration.

Run any MLflow script from the console, as follows:

comet python mlflow_script.py

Alternatively, add this one line of code to the top of your MLflow training script and run the script as you normally would:

import comet_ml


How it works

Comet's built-in, core support for MLflow attempts to create a live, online Experiment if a Comet API Key is configured. If a Comet API Key cannot be found, you will see the following log message:

No Comet API Key was found, creating an OfflineExperiment.
Set up your API Key to get the full Comet experience:
https://www.comet.com/docs/api-and-sdk/python-sdk/advanced/configuration/.

If no API key is found, the Comet SDK still creates an OfflineExperiment, so you keep all the additional tracking data from Comet. Just remember to upload the offline experiment archive later. At the end of the run, the script prints the exact command to run, similar to the following:

comet upload /path/to/archive.zip

Any future Experiment runs created with this script automatically include Comet's extended Experiment tracking for MLflow.

Log automatically

When you run MLflow after importing comet_ml, or through the command line with comet python script.py, all of the following items are automatically logged to a single Comet Experiment page:

  • Metrics: Logged to the metrics tab
  • Hyperparameters: Logged to the hyperparameters tab
  • Models: Logged to the assets tab
  • Assets: Logged to the assets tab
  • Source code: Logged to the code tab
  • Git repo and patch info: Available by clicking the reproduce button
  • System metrics
  • CPU and GPU usage: Logged to the system metrics tab
  • Python packages: Logged to the installed packages tab
  • Command-line arguments
  • Standard Output: Logged to the output tab
  • Installed OS Packages: Available through the get_os_packages method

For more information about using environment parameters in Comet, see Configure Comet.

Info

For more information on using Comet in the console, see Comet Command-Line Utilities.

Comet for MLflow extension

If you would like to see your previously run MLflow Experiments in Comet, try the comet_for_mlflow extension. First, install the open-source Python extension and command-line interface (CLI) command:

pip install comet-for-mlflow

Then execute this command at the command line:

comet_for_mlflow

The Comet for MLflow Extension finds any existing MLflow runs in your current folder and makes them available for analysis in Comet. For more options, use comet_for_mlflow --help and see the following section.
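As a rough illustration of that discovery step, here is a sketch (not the extension's actual implementation) that walks a local MLflow file store, where each run lives in an mlruns/<experiment_id>/<run_id>/ directory:

```python
from pathlib import Path


def find_mlflow_runs(store_root: Path) -> list:
    """Return candidate run directories in a local MLflow file store.

    In an MLflow file store, runs live under mlruns/<experiment_id>/<run_id>/.
    This only mimics the kind of lookup comet_for_mlflow performs.
    """
    if not store_root.is_dir():
        return []
    # Match <experiment_id>/<run_id> paths; skip loose files such as meta.yaml.
    return [p for p in store_root.glob("*/*") if p.is_dir()]


runs = find_mlflow_runs(Path("mlruns"))
print(f"Found {len(runs)} candidate MLflow runs")
```

If no mlruns directory exists in the current folder, point the real tool at a store explicitly with --mlflow-store-uri.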

The Comet for MLflow Extension is an open-source project and can be found at: github.com/comet-ml/comet-for-mlflow/

We welcome any questions, bug fixes, and comments in that Git repo.

Advanced CLI usage for Comet for MLflow Extension

The comet_for_mlflow command offers several options to help you get the most out of previous MLflow runs with Comet:

  • --upload - automatically uploads the prepared Experiments to Comet.
  • --no-upload - do not upload the prepared Experiments to Comet.
  • --api-key API_KEY - set the Comet API key.
  • --mlflow-store-uri MLFLOW_STORE_URI - set the MLflow store URI.
  • --output-dir OUTPUT_DIR - set the directory to store prepared runs.
  • --force-reupload - force re-upload of prepared Experiments.
  • -y, --yes - answer all yes/no questions automatically with 'yes'.
  • -n, --no - answer all yes/no questions automatically with 'no'.
  • --email EMAIL - set email address, if needed, for creating an account.

For more information, use comet_for_mlflow --help or see github.com/comet-ml/comet-for-mlflow.

Configure Comet for MLflow

Calling mlflow.start_run() in your code will create an Experiment object. The auto-logging features of this Experiment object can be configured through either environment variables or the .comet.config file.

| Item | Experiment Parameter | Environment Setting | Configuration Setting |
| --- | --- | --- | --- |
| Metrics | auto_metric_logging | COMET_AUTO_LOG_METRICS | comet.auto_log.metrics |
| Metric logging rate | auto_metric_step_rate | COMET_AUTO_LOG_METRIC_STEP_RATE | comet.auto_log.metric_step_rate |
| Hyperparameters | auto_param_logging | COMET_AUTO_LOG_PARAMETERS | comet.auto_log.parameters |
| Command-line arguments | parse_args | COMET_AUTO_LOG_CLI_ARGUMENTS | comet.auto_log.cli_arguments |
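For example, the auto-logging behavior can be tuned with the environment settings from the table above. The values below are purely illustrative, and the variables must be set before comet_ml is imported for them to take effect:

```python
import os

# Keep automatic metric and parameter logging on, but skip command-line
# arguments. Set these before `import comet_ml`.
os.environ["COMET_AUTO_LOG_METRICS"] = "true"
os.environ["COMET_AUTO_LOG_PARAMETERS"] = "true"
os.environ["COMET_AUTO_LOG_CLI_ARGUMENTS"] = "false"
```

The same settings can live in .comet.config instead (for example, the comet.auto_log.cli_arguments key), which is convenient when they should apply to every run in a project.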

As mentioned, Comet supports MLflow users through two different approaches:

  1. Built-in, core Comet support for MLflow
  2. Comet for MLflow Extension

The first is useful for running new Experiments, and requires you to use import comet_ml or comet python script.py. The second is useful for previously run MLflow Experiments and requires the comet-for-mlflow extension.

There are some differences in the way these two methods operate. Specifically:

| Item logged? | Comet built-in, core support | Comet Extension for MLflow |
| --- | --- | --- |
| Metrics | Yes | Yes |
| Hyperparameters | Yes | Yes |
| Models | Yes | Yes |
| Assets | Yes | Yes |
| Source code | Yes | No |
| Git repo and patch info | Yes | No |
| System metrics | Yes | No |
| CPU and GPU usage | Yes | No |
| Python packages | Yes | No |
| Command-line arguments | Yes | No |
| Standard output | Yes | No |
| Installed OS packages | Yes | No |

Limitations in Comet support for MLflow

When running the built-in, core Comet support, there are two limitations:

  • It does not support MLflow nested runs.
  • It does not support continuing a previous MLflow run; a new Comet Experiment is created in this case.

End-to-end example

import comet_ml
import keras

# The following import and function call are the only additions to code required
# to automatically log metrics and parameters to MLflow.
import mlflow
import mlflow.keras

import numpy as np
from keras.datasets import reuters
from keras.layers import Activation, Dense, Dropout
from keras.models import Sequential
from keras.preprocessing.text import Tokenizer

# The sqlite store is needed for the model registry
mlflow.set_tracking_uri("sqlite:///db.sqlite")

# We need to create a run before calling keras or MLflow will end the run by itself
mlflow.start_run()

mlflow.keras.autolog()

max_words = 1000
batch_size = 32
epochs = 5

print("Loading data...")
(x_train, y_train), (x_test, y_test) = reuters.load_data(
    num_words=max_words, test_split=0.2
)

print(len(x_train), "train sequences")
print(len(x_test), "test sequences")

num_classes = np.max(y_train) + 1
print(num_classes, "classes")

print("Vectorizing sequence data...")
tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode="binary")
x_test = tokenizer.sequences_to_matrix(x_test, mode="binary")
print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)

print(
    "Convert class vector to binary class matrix "
    "(for use with categorical_crossentropy)"
)
y_train = keras.utils.np_utils.to_categorical(y_train, num_classes)
y_test = keras.utils.np_utils.to_categorical(y_test, num_classes)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)

print("Building model...")
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation("relu"))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_split=0.1,
)
score = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=1)
print("Test score:", score[0])
print("Test accuracy:", score[1])

mlflow.keras.log_model(model, "model", registered_model_name="Test Model")
mlflow.end_run()

Try it out

Here is an example Colab Notebook for using Comet with MLflow.

Open In Colab

Nov. 29, 2022