Run hyperparameter tuning in parallel

When tuning hyperparameters, you can shorten the search by running the tuning runs in parallel.

Tip

Random and grid search can run fully in parallel. The bayes search algorithm, however, uses the results of previous tuning runs to select the next hyperparameters to try, so avoid fully parallelizing execution for Bayesian tuning.

How to parallelize Comet Optimizer

Comet allows you to parallelize the execution of the Comet Optimizer using the comet optimize CLI command.

To run the Comet Optimizer from the command line, you first need to:

  1. Put your training code in an executable Python script.
  2. Move the Optimizer configuration into a separate file called optimizer.config.
  3. Update the training script to read the path to the optimizer config file from sys.argv.
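
Step 3 can be sketched as follows. This is a minimal sketch, not Comet's own code: the helper name config_path_from_argv is hypothetical, and it assumes the config file path arrives as the first argument to the script, which is how the end-to-end example on this page reads it.

```python
import sys


def config_path_from_argv(argv):
    """Return the optimizer config path passed as the first script argument.

    Hypothetical helper: assumes the script is invoked as
    `training_script.py optimizer.config`.
    """
    if len(argv) < 2:
        # Fail early with a readable message instead of an IndexError
        raise SystemExit("usage: training_script.py <optimizer config file>")
    return argv[1]


# Inside the training script, the result would be passed to the Optimizer:
#   optimizer_config = config_path_from_argv(sys.argv)
#   opt = comet_ml.Optimizer(config=optimizer_config)
```

Checking the argument count up front gives a clear usage message when the script is started by hand without a config file.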

Then, you can simply run the comet optimize command with the -j or --parallel argument followed by the number of parallel runs to perform.

For example, if you want to parallelize the hyperparameter tuning across two processes, you can use:

comet optimize -j 2 training_script.py optimizer.config

Comet then automatically splits the hyperparameter selections across the two parallel processes.
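
If you launch the search from another Python program (for example, a scheduler script), the same command can be assembled programmatically. The sketch below is illustrative only: the function names are hypothetical, it uses only the -j flag and argument order shown above, and it assumes the comet CLI is available on your PATH.

```python
import subprocess


def build_optimize_command(script, config, parallel):
    """Build the `comet optimize` command line as an argument list."""
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    # Same shape as: comet optimize -j 2 training_script.py optimizer.config
    return ["comet", "optimize", "-j", str(parallel), script, config]


def run_parallel_search(script="training_script.py",
                        config="optimizer.config",
                        parallel=2):
    """Launch the search and raise CalledProcessError on a non-zero exit."""
    command = build_optimize_command(script, config, parallel)
    subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting issues if the script or config path contains spaces.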

Note

Each parallel process executes (maxCombo * trials) / j tuning runs, where:

  • maxCombo and trials are defined inside the optimizer.config file, and
  • j is specified in the command itself (e.g., 2 in the example above).

Learn more about the optimizer configuration on the Configure the Optimizer page.
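
As a worked example of the formula in the note, the sketch below plugs in the values used later on this page (maxCombo of 20, trials of 10, and -j 2). The function name is hypothetical, and the calculation simply takes the formula at face value.

```python
def runs_per_process(max_combo, trials, parallel):
    """Tuning runs handled by each process: (maxCombo * trials) / j."""
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    return (max_combo * trials) / parallel


# maxCombo = 20 and trials = 10 from optimizer.config, j = 2 from the command
print(runs_per_process(20, 10, 2))  # prints 100.0
```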

End-to-end example

This example shows how to run the end-to-end example from the Comet Optimizer Quickstart in parallel.

Simply run:

comet optimize -j 2 training_script.py optimizer.config

where:

  • the training_script.py is defined as:

    training_script.py
    import comet_ml
    from sklearn.metrics import accuracy_score
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    import sys
    
    # Initialize the Comet SDK
    comet_ml.init(project_name="example-optimizer")
    
    # Get the optimizer config file from args
    optimizer_config = sys.argv[1]
    
    # Create a dataset
    X, y = make_classification(n_samples=5000, n_informative=3, random_state=25)
    
    # Split dataset into train and test
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, shuffle=True, test_size=0.25, random_state=25
    )
    
    # Initialize the Comet Optimizer
    opt = comet_ml.Optimizer(config=optimizer_config)
    
    # Run optimization
    for experiment in opt.get_experiments():
        # Initialize the algorithm, and set the parameters to be optimized with get_parameter
        random_forest = RandomForestClassifier(
            n_estimators=experiment.get_parameter("n_estimators"),
            criterion=experiment.get_parameter("criterion"),
            min_samples_leaf=experiment.get_parameter("min_samples_leaf"),
            random_state=25,
        )
    
        # Train the model and make predictions
        random_forest.fit(X_train, y_train)
        y_hat = random_forest.predict(X_test)
    
        # Log the random state and accuracy of each model
        experiment.log_parameter("random_state", 25)
        experiment.log_metric("accuracy", accuracy_score(y_test, y_hat))
        experiment.log_confusion_matrix(y_test, y_hat)
    
        # End the current experiment
        experiment.end()
    
  • and the optimizer.config is defined as:

    optimizer.config
    {
        "algorithm": "bayes",
        "spec": {
            "maxCombo": 20,
            "objective": "maximize",
            "metric": "accuracy",
            "minSampleSize": 500,
            "retryLimit": 20,
            "retryAssignLimit": 0
        },
        "parameters": {
            "n_estimators": {
                "type": "integer",
                "scaling_type": "uniform",
                "min": 100,
                "max": 300
            },
            "criterion": {
                "type": "categorical",
                "values": ["gini", "entropy"]
            },
            "min_samples_leaf": {
                "type": "discrete",
                "values": [1, 3, 5, 7, 9]
            }
        },
        "name": "Bayes Optimization",
        "trials": 10
    }
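
Before launching a parallel search, it can help to sanity-check the config file. The sketch below is an assumption-laden helper, not part of the Comet SDK: it assumes the file is parsed as strict JSON, the function name summarize_optimizer_config is hypothetical, and the inline config string exists only to demonstrate the call.

```python
import json


def summarize_optimizer_config(text):
    """Parse optimizer config text as JSON and return a small summary."""
    # json.loads raises json.JSONDecodeError on syntax problems,
    # for example a trailing comma before a closing brace
    config = json.loads(text)
    spec = config.get("spec", {})
    return {
        "algorithm": config.get("algorithm"),
        "metric": spec.get("metric"),
        "objective": spec.get("objective"),
        "parameters": sorted(config.get("parameters", {})),
    }


# Hypothetical inline config used only to demonstrate the helper
example = (
    '{"algorithm": "random",'
    ' "spec": {"metric": "accuracy", "objective": "maximize"},'
    ' "parameters": {"n_estimators": {"type": "integer", "min": 100, "max": 300}}}'
)
print(summarize_optimizer_config(example))
```

To check a real file, pass its contents, for example summarize_optimizer_config(open("optimizer.config").read()), and compare the reported metric with the name the training script passes to experiment.log_metric.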
    
May. 17, 2024