
Integrate with MLflow

Comet has extensive support for users of MLflow.

Comet supports MLflow in two different ways:

  • Built-in, core Comet support for MLflow
  • The Comet for MLflow extension

The following sections provide details of both methods.

Built-in, core Comet support for MLflow

If you're already using MLflow, then Comet will work with MLflow with no further configuration.

Run any MLflow script from the console, as follows:

comet python mlflow_script.py

Alternatively, add this one line of code to the top of your MLflow training script and run the script as you normally would:

import comet_ml


How it works

Comet's built-in, core support for MLflow attempts to create a live, online Experiment if a Comet API Key is configured. If a Comet API Key cannot be found, you will see the following log message:

No Comet API Key was found, creating an OfflineExperiment.
Set up your API Key to get the full Comet experience:
https://www.comet.com/docs/api-and-sdk/python-sdk/advanced/configuration/.

If no API key is found, the Comet SDK still creates an OfflineExperiment, so you keep all the additional tracking data from Comet. Just remember to upload the offline experiment archive later. At the end of the run, the script prints the exact command to run, similar to the following:

comet upload /path/to/archive.zip

Any future Experiment runs created with this script automatically include Comet's extended Experiment tracking for MLflow.

Log automatically

When you run MLflow after importing comet_ml, or through the command line with comet python script.py, all of the following items are automatically logged to a single Comet Experiment page:

  • Metrics: Logged to the metrics tab
  • Hyperparameters: Logged to the hyperparameters tab
  • Models: Logged to the assets tab
  • Assets: Logged to the assets tab
  • Source code: Logged to the code tab
  • Git repo and patch info: Available by clicking the reproduce button
  • System metrics
  • CPU and GPU usage: Logged to the system metrics tab
  • Python packages: Logged to the installed packages tab
  • Command-line arguments
  • Standard Output: Logged to the output tab
  • Installed OS Packages: Available through the get_os_packages method

For more information about using environment parameters in Comet, see Configure Comet.

Info

For more information on using Comet in the console, see Comet Command-Line Utilities.

Comet for MLflow extension

If you would like to see your previously run MLflow Experiments in Comet, try the comet_for_mlflow extension. First, install the open-source Python extension and command-line interface (CLI) command:

pip install comet-for-mlflow

Then execute this command at the command line:

comet_for_mlflow

The Comet for MLflow Extension finds any existing MLflow runs in your current folder and makes them available for analysis in Comet. For more options, use comet_for_mlflow --help and see the following section.
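As a rough illustration of that discovery step, here is a sketch (not the extension's actual implementation) that walks a local MLflow file store, where each run lives in an mlruns/<experiment_id>/<run_id>/ directory:

```python
from pathlib import Path


def find_mlflow_runs(store_root: Path) -> list:
    """Return candidate run directories in a local MLflow file store.

    In an MLflow file store, runs live under mlruns/<experiment_id>/<run_id>/.
    This only mimics the kind of lookup comet_for_mlflow performs.
    """
    if not store_root.is_dir():
        return []
    # Match <experiment_id>/<run_id> paths; skip loose files such as meta.yaml.
    return [p for p in store_root.glob("*/*") if p.is_dir()]


runs = find_mlflow_runs(Path("mlruns"))
print(f"Found {len(runs)} candidate MLflow runs")
```

If no mlruns directory exists in the current folder, point the real tool at a store explicitly with --mlflow-store-uri.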

The Comet for MLflow Extension is an open-source project and can be found at: github.com/comet-ml/comet-for-mlflow/

We welcome any questions, bug fixes, and comments in that Git repo.

Advanced CLI usage for Comet for MLflow Extension

The comet_for_mlflow command offers several options to help you get the most out of previous MLflow runs with Comet:

  • --upload - automatically uploads the prepared Experiments to Comet.
  • --no-upload - do not upload the prepared Experiments to Comet.
  • --api-key API_KEY - set the Comet API key.
  • --mlflow-store-uri MLFLOW_STORE_URI - set the MLflow store URI.
  • --output-dir OUTPUT_DIR - set the directory to store prepared runs.
  • --force-reupload - force re-upload of prepared Experiments.
  • -y, --yes - answer all yes/no questions automatically with 'yes'.
  • -n, --no - answer all yes/no questions automatically with 'no'.
  • --email EMAIL - set email address, if needed, for creating an account.

For more information, use comet_for_mlflow --help or see github.com/comet-ml/comet-for-mlflow.

Configure Comet for MLflow

Calling mlflow.start_run() in your code will create an Experiment object. The auto-logging features of this Experiment object can be configured through either environment variables or the .comet.config file.

| Item | Experiment Parameter | Environment Setting | Configuration Setting |
| --- | --- | --- | --- |
| Metrics | auto_metric_logging | COMET_AUTO_LOG_METRICS | comet.auto_log.metrics |
| Metric logging rate | auto_metric_step_rate | COMET_AUTO_LOG_METRIC_STEP_RATE | comet.auto_log.metric_step_rate |
| Hyperparameters | auto_param_logging | COMET_AUTO_LOG_PARAMETERS | comet.auto_log.parameters |
| Command-line arguments | parse_args | COMET_AUTO_LOG_CLI_ARGUMENTS | comet.auto_log.cli_arguments |
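For example, the auto-logging behavior can be tuned with the environment settings from the table above. The values below are purely illustrative, and the variables must be set before comet_ml is imported for them to take effect:

```python
import os

# Keep automatic metric and parameter logging on, but skip command-line
# arguments. Set these before `import comet_ml`.
os.environ["COMET_AUTO_LOG_METRICS"] = "true"
os.environ["COMET_AUTO_LOG_PARAMETERS"] = "true"
os.environ["COMET_AUTO_LOG_CLI_ARGUMENTS"] = "false"
```

The same settings can live in .comet.config instead (for example, the comet.auto_log.cli_arguments key), which is convenient when they should apply to every run in a project.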

As mentioned, Comet supports MLflow users through two different approaches:

  1. Built-in, core Comet support for MLflow
  2. Comet for MLflow Extension

The first is useful for running new Experiments, and requires you to use import comet_ml or comet python script.py. The second is useful for previously run MLflow Experiments and requires the comet-for-mlflow extension.

There are some differences in the way these two methods operate. Specifically:

| Item logged? | Comet built-in, core support | Comet Extension for MLflow |
| --- | --- | --- |
| Metrics | Yes | Yes |
| Hyperparameters | Yes | Yes |
| Models | Yes | Yes |
| Assets | Yes | Yes |
| Source code | Yes | No |
| Git repo and patch info | Yes | No |
| System metrics | Yes | No |
| CPU and GPU usage | Yes | No |
| Python packages | Yes | No |
| Command-line arguments | Yes | No |
| Standard output | Yes | No |
| Installed OS packages | Yes | No |

Limitations in Comet support for MLflow

When running the built-in, core Comet support, there are two limitations:

  • It does not support MLflow nested runs.
  • It does not support continuing a previous MLflow run; a new Comet Experiment is created in this case.

End-to-end example

import comet_ml
import keras

# The following import and function call are the only additions to code required
# to automatically log metrics and parameters to MLflow.
import mlflow
import mlflow.keras

import numpy as np
from keras.datasets import reuters
from keras.layers import Activation, Dense, Dropout
from keras.models import Sequential
from keras.preprocessing.text import Tokenizer

# The sqlite store is needed for the model registry
mlflow.set_tracking_uri("sqlite:///db.sqlite")

# We need to create a run before calling keras or MLflow will end the run by itself
mlflow.start_run()

mlflow.keras.autolog()

max_words = 1000
batch_size = 32
epochs = 5

print("Loading data...")
(x_train, y_train), (x_test, y_test) = reuters.load_data(
    num_words=max_words, test_split=0.2
)

print(len(x_train), "train sequences")
print(len(x_test), "test sequences")

num_classes = np.max(y_train) + 1
print(num_classes, "classes")

print("Vectorizing sequence data...")
tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode="binary")
x_test = tokenizer.sequences_to_matrix(x_test, mode="binary")
print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)

print(
    "Convert class vector to binary class matrix "
    "(for use with categorical_crossentropy)"
)
y_train = keras.utils.np_utils.to_categorical(y_train, num_classes)
y_test = keras.utils.np_utils.to_categorical(y_test, num_classes)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)

print("Building model...")
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation("relu"))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_split=0.1,
)
score = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=1)
print("Test score:", score[0])
print("Test accuracy:", score[1])

mlflow.keras.log_model(model, "model", registered_model_name="Test Model")
mlflow.end_run()

Try it out

Here is an example Colab Notebook for using Comet with MLflow.

Open In Colab

Nov. 29, 2022