Run hyperparameter tuning in parallel

When tuning hyperparameters, you can shorten the search by running the tuning runs in parallel.

Tip

Random and grid search can be run fully in parallel. The bayes search algorithm, however, uses the results of previous tuning runs to select the next hyperparameters to try, so avoid fully parallelizing execution for Bayesian tuning.

How to parallelize Comet Optimizer

Comet allows you to parallelize the execution of the Comet Optimizer using the comet optimize CLI command.

To run the Comet Optimizer from the command line, you first need to:

Format your training code inside an executable Python script.
Move the Optimizer config to a separate file called optimizer.config.
Update the training script to read the optimizer config via sys.argv.
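The last step can be sketched as follows. This is a minimal illustration, assuming the config file path arrives as the first command-line argument (as in the end-to-end example on this page); the helper name config_path_from_argv is hypothetical and not part of the Comet SDK.

```python
import sys


def config_path_from_argv(argv):
    """Return the optimizer config path passed as the first CLI argument.

    Hypothetical helper (not part of comet_ml): it only wraps the
    sys.argv[1] lookup with a clearer error when the argument is missing.
    """
    if len(argv) < 2:
        raise SystemExit("usage: training_script.py <optimizer-config-file>")
    return argv[1]


# In the training script, the path would then be read with:
#   optimizer_config = config_path_from_argv(sys.argv)
# and passed on to comet_ml.Optimizer(config=optimizer_config).
```

Failing early with a usage message makes it easier to spot a script that was started directly instead of through comet optimize.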

Then, run the comet optimize command with the -j (or --parallel) argument followed by the number of parallel runs to perform.

For example, if you want to parallelize the hyperparameter tuning across two processes, you can use:

comet optimize -j 2 training_script.py optimizer.config

Comet then automatically splits the hyperparameter selections across the two parallel processes.
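If you launch tuning from a Python wrapper rather than typing the command by hand, the same invocation can be assembled programmatically. The sketch below only builds the argument list for the command shown above; build_optimize_command is a hypothetical helper, and actually launching it assumes the comet CLI is installed and on your PATH.

```python
import shlex


def build_optimize_command(script, config, jobs):
    """Build the argument list for `comet optimize -j <jobs> <script> <config>`.

    Hypothetical helper: it only mirrors the command documented above.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["comet", "optimize", "-j", str(jobs), script, config]


cmd = build_optimize_command("training_script.py", "optimizer.config", 2)
print(shlex.join(cmd))  # comet optimize -j 2 training_script.py optimizer.config

# To launch it (assumes the comet CLI is available):
#   import subprocess
#   subprocess.run(cmd, check=True)
```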

Note

Each parallel process executes (maxCombo * trials) / j tuning runs, where:

maxCombo and trials are defined inside the optimizer.config file, and
j is specified in the command itself (e.g., 2 in the example above).
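As a worked example of the formula in this note, the sketch below plugs in the values used later on this page (maxCombo of 20, trials of 10, and -j 2). The function name runs_per_process is hypothetical; it only restates the arithmetic and does not call Comet.

```python
def runs_per_process(max_combo, trials, jobs):
    """Apply the formula from the note: (maxCombo * trials) / j."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return (max_combo * trials) / jobs


# Values taken from the example optimizer.config and the -j 2 command
print(runs_per_process(max_combo=20, trials=10, jobs=2))  # 100.0
```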

Discover more about the optimizer configuration in the Configure the Optimizer page.

End-to-end Example

This example shows how to run the end-to-end example from the Comet Optimizer Quickstart in parallel.

Simply run:

comet optimize -j 2 training_script.py optimizer.config

where:

the training_script.py is defined as:

training_script.py

import comet_ml
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import sys

# Initialize the Comet SDK
comet_ml.login(project_name="example-optimizer")

# Get the optimizer config file from args
optimizer_config = sys.argv[1]

# Create a dataset
X, y = make_classification(n_samples=5000, n_informative=3, random_state=25)

# Split dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=True, test_size=0.25, random_state=25
)

# Initialize the Comet Optimizer
opt = comet_ml.Optimizer(config=optimizer_config)

# Run optimization
for experiment in opt.get_experiments():
    # Initialize the algorithm, and set the parameters to be optimized with get_parameter
    random_forest = RandomForestClassifier(
        n_estimators=experiment.get_parameter("n_estimators"),
        criterion=experiment.get_parameter("criterion"),
        min_samples_leaf=experiment.get_parameter("min_samples_leaf"),
        random_state=25,
    )

    # Train the model and make predictions
    random_forest.fit(X_train, y_train)
    y_hat = random_forest.predict(X_test)

    # Log the random state and accuracy of each model
    experiment.log_parameter("random_state", 25)
    experiment.log_metric("accuracy", accuracy_score(y_test, y_hat))
    experiment.log_confusion_matrix(y_test, y_hat)

    # End the current experiment
    experiment.end()

and the optimizer.config is defined as:

optimizer.config

{
    "algorithm": "bayes",
    "spec": {
        "maxCombo": 20,
        "objective": "maximize",
        "metric": "accuracy",
        "minSampleSize": 500,
        "retryLimit": 20,
        "retryAssignLimit": 0
    },
    "parameters": {
        "n_estimators": {
            "type": "integer",
            "scaling_type": "uniform",
            "min": 100,
            "max": 300
        },
        "criterion": {
            "type": "categorical",
            "values": ["gini", "entropy"]
        },
        "min_samples_leaf": {
            "type": "discrete",
            "values": [1, 3, 5, 7, 9]
        }
    },
    "name": "Bayes Optimization",
    "trials": 10
}
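Before starting several parallel processes, it can help to confirm that the config file parses cleanly. The sketch below is a hedged sanity check, assuming the file is read as strict JSON (which rejects trailing commas). The helper load_optimizer_config and the set of keys it checks are illustrative, based only on the keys shown in the example above, and are not an authoritative schema.

```python
import json


def load_optimizer_config(text):
    """Parse config text as strict JSON and check a few top-level keys.

    Hypothetical helper: the key set below is an assumption taken from
    the example config on this page, not from the Comet SDK.
    """
    config = json.loads(text)  # raises json.JSONDecodeError on trailing commas
    missing = {"algorithm", "spec", "parameters"} - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return config


# Typical use before running `comet optimize`:
#   from pathlib import Path
#   load_optimizer_config(Path("optimizer.config").read_text())
```

Running this check once up front surfaces a malformed file as a single error instead of one failure per parallel process.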

Jul. 25, 2024