May 13, 2023

We face optimization problems all the time in our daily life: you don’t merely pick up a random pair of jeans and head to the checkout when you’re doing clothes shopping — hopefully not, I should say. There’s a process to it:

You may want a specific brand of jeans.

Maybe dark-wash jeans are your go-to.

Maybe you have a certain fit you prefer: straight, slim, skinny.

Most important of all, the jeans have got to fit you — is your size available?

To feel like you’ve got your money’s worth, as you sift through the in-store jeans, you’re mentally recording whether they meet the criteria for the type of jeans you want to purchase.

The best jeans for you would meet all of your criteria (i.e. Levi's brand, dark wash, stretch skinny fit, and your size): those are the optimal jeans.

**Let’s see what this optimization means in a machine learning context.**

In this article we are going to discover the following, using Comet’s experiment management platform:

→ What it means to optimize a learning algorithm

→ Comet’s `Optimizer` class

→ Optimization approaches

→ An End-to-end example

Do you prefer to watch this tutorial? See **Hyperparameter optimization with CometML**

The hyperparameter optimization problem we face in machine learning is not too dissimilar from the one we face when out jeans shopping (or whatever we want to optimize for).

In the same way we’d search through various jeans, we need to search through various algorithms we wish to use to solve a problem at hand. Once we feel as though we’re onto something with a certain algorithm, it’s important to optimize the algorithm, which means **we minimize the error to ensure the model is solving the problem to the best of its abilities** — we’re “getting our money’s worth!”

Hyperparameters are values that control the learning process of an algorithm. When implementing a learning algorithm, we define these values beforehand, as there is no way for the algorithm to learn them from training. Examples of hyperparameters include:

- the number of trees in a random forest
- the learning rate of an algorithm
- the number of layers in a neural network

Implementing the optimal version of an algorithm means selecting the hyperparameters that minimize the error for the problem at hand. Put another way, we are trying to maximize the performance of our algorithm on the dataset being used.

When you begin on a problem, there is no clear way to know what hyperparameters will result in the optimal model. To find them we must do **hyperparameter optimization**.

Comet is a machine learning platform that permits data scientists and teams to track, monitor, compare, explain, and optimize experiments as well as models. The optimizer will be our main focus in this article.

In line with Comet’s documentation, the `Optimizer` class may be used to:

*“dynamically find the best set of hyperparameter values that will minimize or maximize a particular metric.”*

The class is also capable of making suggestions as to what hyperparameter values may be worth trying next — which is done in serial, parallel, or a combination of both.

Arguments used to define the `Optimizer` include:

→ **config**: optional if `COMET_OPTIMIZER_ID` is configured; otherwise either a config dictionary, an optimizer ID, or a config filename.

→ **trials**: *int* (optional, default 1), the number of trials per parameter set to test.

→ **verbose**: *boolean* (optional, default 1), the verbosity level, where 0 means no output and 1 (or greater) means more detail.

→ **experiment_class**: *string* or callable (optional, default None), the class to use (for example, `OfflineExperiment`).

Notice there’s an option to pass a configuration dictionary to the **config** parameter. This is where we detail the optimization approach we want the `Optimizer` to perform.

The dictionary the config parameter wants us to pass consists of the following keys:

→ **algorithm**: *string*, the search algorithm to be used

→ **spec**: *dictionary*, the algorithm-specific specifications.

→ **parameters**: *dictionary*, the parameter distribution space descriptions

→ **name**: *string*, a distinct name to associate with the search instance (optional)

→ **trials**: *integer*, the number of trials per experiment to run (optional, defaults to 1).

Note: We will cover the various algorithms and their `spec` in the Optimization methods section.

The parameters dictionary is where we define what hyperparameters to tune in our model. Let’s take a closer look at what it consists of.

`Parameters` dictionary

In our configuration dictionary, we have a `parameters` key, which takes a dictionary. In this `parameters` dictionary, each hyperparameter must be declared with a type that matches the data type our model expects for it.

Comet provides us with four types: 1) integer 2) double or float 3) discrete (for a list of numbers) 4) categorical (for a list of strings). The formatting of each parameter is inspired by Google’s Vizier. Let’s dive deeper into each one.

Integers and double/float types allow us to determine the scaling type of our values. Comet provides five possible distributions to select from: linear, uniform, normal, log uniform, log normal.

The scaling type we use determines the distribution between the min and max values of the hyperparameter.

```
{"PARAMETER-NAME":
{"type": "integer",
"scalingType": "linear" | "uniform" | "normal" | "loguniform" | "lognormal",
"min": INTEGER,
"max": INTEGER
},
....
}
```

For clarity, the definitions are as follows:

→ **linear**: for integers, this means an independent distribution (used for things like seed values); for double, the same as uniform

→ **uniform**: a uniform distribution between “min” and “max”

→ **normal**: a normal distribution centered on “mu”, with a standard deviation of “sigma”

→ **lognormal**: a log-normal distribution centered on “mu”, with a standard deviation of “sigma”

→ **loguniform**: a log-uniform distribution between “min” and “max”. Computes `exp(uniform(log(min), log(max)))`
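
To make these definitions concrete, the sketch below draws one value per scaling type using only Python's standard library. It illustrates the distributions themselves and is not Comet's sampling code: the function name `sample`, the bounds, and the `mu`/`sigma` defaults are assumptions chosen for illustration.

```python
import math
import random


def sample(scaling_type, rng, low=1e-4, high=1e-1, mu=0.0, sigma=1.0):
    """Draw one value for the given scaling type (illustrative only)."""
    if scaling_type == "uniform":
        return rng.uniform(low, high)
    if scaling_type == "loguniform":
        # exp(uniform(log(min), log(max))), as in the definition above
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    if scaling_type == "normal":
        return rng.gauss(mu, sigma)
    if scaling_type == "lognormal":
        # mu and sigma describe the underlying normal distribution
        return rng.lognormvariate(mu, sigma)
    raise ValueError(f"unknown scaling type: {scaling_type}")


rng = random.Random(25)  # fixed seed so repeated runs draw the same values
for name in ("uniform", "loguniform", "normal", "lognormal"):
    print(name, sample(name, rng))
```

A log-uniform draw spreads samples evenly across orders of magnitude, which is why it is a common choice for values such as learning rates.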

For categorical hyperparameters, the possible values are a list of strings.

For discrete hyperparameters, the possible values are a list of integers.

```
{"PARAMETER-NAME":
{"type": "categorical",
"values": ["this", "is", "a", "list"]
},
...,
}
```
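
Putting the four types side by side may help. The sketch below builds one `parameters` dictionary that uses each type, followed by a small structural check. The hyperparameter names are illustrative, and `check_parameters` is a hypothetical helper written for this article, not part of Comet's SDK.

```python
# One entry per parameter type described above (names are illustrative).
model_params = {
    "n_estimators": {"type": "integer", "scalingType": "uniform", "min": 100, "max": 300},
    "learning_rate": {"type": "float", "scalingType": "loguniform", "min": 1e-4, "max": 1e-1},
    "min_samples_leaf": {"type": "discrete", "values": [1, 3, 5, 7, 9]},
    "criterion": {"type": "categorical", "values": ["gini", "entropy"]},
}


def check_parameters(parameters):
    """Return a list of problems found (hypothetical helper, not part of Comet)."""
    problems = []
    for name, spec in parameters.items():
        kind = spec.get("type")
        if kind in ("integer", "float", "double"):
            # Range-based types: when both bounds are given, min must be below max.
            if "min" in spec and "max" in spec and spec["min"] >= spec["max"]:
                problems.append(f"{name}: min must be smaller than max")
        elif kind == "discrete":
            values = spec.get("values")
            if not values or not all(isinstance(v, (int, float)) for v in values):
                problems.append(f"{name}: discrete values must be a list of numbers")
        elif kind == "categorical":
            values = spec.get("values")
            if not values or not all(isinstance(v, str) for v in values):
                problems.append(f"{name}: categorical values must be a list of strings")
        else:
            problems.append(f"{name}: unknown type {kind!r}")
    return problems


print(check_parameters(model_params))
```

Running a check like this before creating the optimizer catches simple mistakes, such as numbers passed where strings are expected, without starting any experiments.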

Now we can move on to the `algorithm` parameter and see what goes into the `spec` dictionary.

Comet’s `Optimizer` focuses on three popular algorithms you could use for hyperparameter optimization. Let’s dive deeper into each approach:

Comet documentation states “*the Bayes algorithm may be the best choice for most of your Optimizer uses.*”

“Bayesian optimization has been shown to obtain better results in fewer evaluations compared to grid search and random search, due to the ability to reason about the quality of experiments before they are run.” (Wikipedia)

Bayes optimization works by building a probabilistic model of how hyperparameters map to the objective metric, iteratively evaluating the configuration that the current model considers most promising, and then updating the model with the result. The main aim of the technique is to gather observations that reveal as much information as possible about the location of the optimum.
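
The loop described above can be sketched in plain Python. This is a toy illustration of the general idea, not Comet's implementation: it assumes a Gaussian-process surrogate with a squared-exponential kernel and a lower-confidence-bound rule for choosing the next point, and `toy_loss` is a hypothetical stand-in for an expensive training run with a single hyperparameter `x`.

```python
import math


def kernel(a, b, length=0.2):
    """Squared-exponential kernel: nearby inputs get similar predictions."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def posterior(xs, ys, x_new, noise=1e-4):
    """Surrogate mean and standard deviation at x_new given past observations."""
    offset = sum(ys) / len(ys)  # centre the observations around zero
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [kernel(a, x_new) for a in xs]
    weights = solve(K, k_star)  # K^-1 k*
    mean = offset + sum(w * (y - offset) for w, y in zip(weights, ys))
    var = max(1.0 - sum(w * k for w, k in zip(weights, k_star)), 0.0)
    return mean, math.sqrt(var)


def toy_loss(x):
    """Hypothetical stand-in for a validation loss; its minimum is at x = 0.7."""
    return (x - 0.7) ** 2


candidates = [i / 100 for i in range(101)]
xs = [0.0, 1.0]  # two initial evaluations
ys = [toy_loss(x) for x in xs]

for _ in range(8):
    def lcb(x):
        # Lower confidence bound: prefer a low predicted loss or high uncertainty.
        mean, std = posterior(xs, ys, x)
        return mean - 2.0 * std

    x_next = min(candidates, key=lcb)  # most promising candidate under the model
    xs.append(x_next)
    ys.append(toy_loss(x_next))  # evaluate it, which updates the model's data

best_loss, best_x = min(zip(ys, xs))
print(f"best x = {best_x:.2f}, loss = {best_loss:.4f}")
```

Each iteration spends one evaluation where the surrogate expects either a low loss or a lot of new information, which is the trade-off that lets this family of methods get by with fewer runs than an exhaustive search.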

To define the Bayes algorithm in Comet, we simply set the algorithm key to `"bayes"`. As mentioned earlier, each algorithm can be given a `spec`. For the Bayes algorithm, the `spec` parameters include:

→ **maxCombo**: *integer*, the limit of parameter combinations to try (default 0, meaning to use 10 times the number of hyperparameters)

→ **objective**: *string*, “minimize” or “maximize”, for the objective metric (default “minimize”)

→ **metric**: *string*, the metric name that you are logging and want to minimize/maximize (default “loss”)

→ **minSampleSize**: *integer*, the number of samples to help find appropriate grid ranges (default 100)

→ **retryLimit**: *integer*, the limit to try creating a unique parameter set before giving up (default 20)

→ **retryAssignLimit**: *integer*, the limit to re-assign non-completed experiments (default 0)

```
{"algorithm": "bayes",
"spec": {
"maxCombo": 0,
"objective": "minimize",
"metric": "loss",
"minSampleSize": 100,
"retryLimit": 20,
"retryAssignLimit": 0,
},
"trials": 1,
"parameters": {...},
"name": "My Optimizer Name",
}
```

Grid search is another popular hyperparameter optimization method. It is useful for performing a wide, initial search of a set of parameter values.

The algorithm works by exhaustively searching through a manually specified subset of values in the hyperparameter space of an algorithm. Comet’s grid algorithm is slightly more flexible than many, as each time you run it, you will sample from the set of possible grids defined by the parameter space distribution. Unlike Bayes optimization, grid search does not use past experiments to inform future experiments.
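
As a minimal sketch of the exhaustive idea (the classic fixed-grid version, not Comet's sampled grids), the snippet below enumerates every combination of a small grid with `itertools.product`. The grid values are illustrative, and `toy_loss` is a hypothetical stand-in for training a model and measuring its loss.

```python
import itertools

# A small, manually specified grid (values are illustrative).
grid = {
    "n_estimators": [100, 200, 300],
    "min_samples_leaf": [1, 5, 9],
    "criterion": ["gini", "entropy"],
}


def toy_loss(params):
    """Hypothetical stand-in for training a model and returning its loss."""
    penalty = 0.0 if params["criterion"] == "gini" else 0.01
    return (abs(params["n_estimators"] - 200) / 1000
            + params["min_samples_leaf"] / 100
            + penalty)


names = list(grid)
# Every combination is tried; no result influences which combination comes next.
combos = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
results = [(toy_loss(params), params) for params in combos]
best_loss, best_params = min(results, key=lambda r: r[0])

print(len(combos), "combinations tried")
print("best:", best_params, "loss:", best_loss)
```

The number of combinations is the product of the list lengths, so adding one more hyperparameter or a few more values per list quickly multiplies the cost.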

The following options can be configured in the `spec` when you opt to use `grid` search:

→ **randomize**: *boolean*, if True, then the grid is traversed randomly; otherwise it’s traversed in order (default False)

→ **maxCombo**: *integer*, the limit of parameter combinations to try (default 0, meaning to use 10 times the number of hyperparameters)

→ **metric**: *string*, the metric name that you are logging and want to minimize/maximize (default “loss”)

→ **gridSize**: *integer*, when creating a grid, the number of bins per parameter (default 10)

→ **minSampleSize**: *integer*, the number of samples to help find appropriate grid ranges (default 100)

→ **retryLimit**: *integer*, the limit to try creating a unique parameter set before giving up (default 20)

→ **retryAssignLimit**: *integer*, the limit to re-assign non-completed experiments (default 0)

```
{"algorithm": "grid",
"spec": {
"randomize": True,
"maxCombo": 0,
"metric": "loss",
"gridSize": 10,
"minSampleSize": 100,
"retryLimit": 20,
"retryAssignLimit": 0,
},
"trials": 1,
"parameters": {...},
"name": "My Optimizer Name",
}
```

Random search offers slightly more flexibility than grid search. Instead of exhaustively iterating through all possible combinations like in the grid search algorithm, random search selects combinations at random from the possible parameter values until the run is explicitly stopped or the max combinations are met.

Similar to grid search, the random algorithm does not use past experiments to inform future experiments, but **when only a small number of hyperparameters have an effect on the final model performance, the random search can outperform grid search.**
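
The intuition behind that last point can be shown with a toy setup. In the sketch below, only one of two hyperparameters affects the loss; with the same budget of nine trials, the grid tests just three distinct values of the parameter that matters, while random search tests nine. The names `learning_rate` and `momentum`, the ranges, and `toy_loss` are all hypothetical.

```python
import itertools
import random


def toy_loss(learning_rate, momentum):
    """Hypothetical loss where only learning_rate matters; momentum is ignored."""
    return (learning_rate - 0.37) ** 2


# Grid search: 3 x 3 = 9 trials, but only 3 distinct learning rates.
grid_values = [0.1, 0.5, 0.9]
grid_trials = list(itertools.product(grid_values, grid_values))

# Random search: the same budget of 9 trials, each drawn independently.
rng = random.Random(25)
random_trials = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
                 for _ in range(len(grid_trials))]

grid_best = min(toy_loss(lr, m) for lr, m in grid_trials)
random_best = min(toy_loss(lr, m) for lr, m in random_trials)

print("distinct learning rates tried by grid:  ", len({lr for lr, _ in grid_trials}))
print("distinct learning rates tried by random:", len({lr for lr, _ in random_trials}))
print("best loss (grid):  ", grid_best)
print("best loss (random):", random_best)
```

Random search does not win on every run, but covering more distinct values of the influential hyperparameter gives it more chances to land near the optimum.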

The “random” search algorithm uses the following options for its `spec`:

→ **maxCombo**: *integer*, the limit of parameter combinations to try (default 0, meaning to use 10 times the number of hyperparameters)

→ **metric**: *string*, the metric name that you are logging and want to minimize/maximize (default “loss”)

→ **gridSize**: *integer*, when creating a grid, the number of bins per parameter (default 10)

→ **minSampleSize**: *integer*, the number of samples to help find appropriate grid ranges (default 100)

→ **retryLimit**: *integer*, the limit to try creating a unique parameter set before giving up (default 20)

→ **retryAssignLimit**: *integer*, the limit to re-assign non-completed experiments (default 0)

```
{"algorithm": "random",
"spec": {
"maxCombo": 100,
"metric": "loss",
"gridSize": 10,
"minSampleSize": 100,
"retryLimit": 20,
"retryAssignLimit": 0,
},
"trials": 1,
"parameters": {...},
"name": "My Optimizer Name",
}
```

Now, let’s see an end-to-end example.

We will be using a generated binary classification problem from the `make_classification` function in scikit-learn datasets. The data will consist of 5000 samples and 20 features, of which 3 are informative.

We then split this into train and test sets so we have a way of evaluating the performance of our model on unseen instances.

```
import comet_ml
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# create a dataset
X, y = make_classification(n_samples=5000, n_informative=3, random_state=25)

# split into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=True, test_size=0.25, random_state=25
)

# visualize data
plt.subplots(figsize=(8, 5))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.Spectral)
plt.show()
```

A plot of the data we want to classify

For this example, we will be using the Bayes algorithm. To do this, our algorithm key is set to `"bayes"` in the configuration dictionary as follows:

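```
# defining the configuration dictionary
# note: `spec` and `model_params` are defined in the next two snippets,
# so run those before this one
config_dict = {
    "algorithm": "bayes",
    "spec": spec,
    "parameters": model_params,
    "name": "Bayes Optimization",
    "trials": 1,
}
```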

The `spec` defines the specifications of the Bayes algorithm. We are going to test 20 different combinations to see which combination can minimize the loss of our model.

```
# setting the spec for bayes algorithm
spec = {
    "maxCombo": 20,
    "objective": "minimize",
    "metric": "loss",
    "minSampleSize": 500,
    "retryLimit": 20,
    "retryAssignLimit": 0,
}
```

To train our model, we will be using a Random Forest classifier. There are several hyperparameters that we could tune, but for this example, we will only be tuning the number of estimators used to build the forest, the criterion to measure the quality of the split and the minimum number of samples required to be at a leaf node.

```
# setting the parameters we are tuning
# (the key is "scalingType", matching the parameter format shown earlier)
model_params = {
    "n_estimators": {
        "type": "integer",
        "scalingType": "uniform",
        "min": 100,
        "max": 300,
    },
    "criterion": {
        "type": "categorical",
        "values": ["gini", "entropy"],
    },
    "min_samples_leaf": {
        "type": "discrete",
        "values": [1, 3, 5, 7, 9],
    },
}
```

Next, we initialize the `Optimizer`. To access your Comet dashboard, you’ll need to provide an `api_key`, which can be accessed through your Comet profile and account settings. You then assign the `config_dict` variable to the config parameter. I’ve also provided a `project_name` and `workspace` so the experiments are saved to a project I created in my dashboard.

```
# initializing the comet ml optimizer
opt = comet_ml.Optimizer(
    api_key="yaUBuGWQQel4gQ5TaNCWbYXal",
    config=config_dict,
    project_name="testing-hyperparameter-approaches",
    workspace="kurtispykes",
)
```

Note: Never share API keys. Comet allows you to set your API key as a config variable. You can learn more about using this method here.
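
One way to keep the key out of your source code is to read it from an environment variable. The sketch below assumes the variable is named `COMET_API_KEY`, which is the name I understand Comet's SDK to look for; treat that as an assumption and confirm it in the documentation. `get_comet_api_key` is a hypothetical helper, not part of Comet.

```python
import os


def get_comet_api_key(env_var="COMET_API_KEY"):
    """Read the API key from the environment instead of hard-coding it."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return api_key


# Example use (requires comet_ml and a real key set in your shell):
# opt = comet_ml.Optimizer(api_key=get_comet_api_key(), config=config_dict)
```

With this in place, the key lives in your shell or deployment configuration, so the script can be shared or committed without exposing it.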

To begin, we loop through the experiments with the `get_experiments()` method. For each experiment, we define a Random Forest instance and use the `get_parameter()` method to get the parameter for the experiment being run.

We then train the model and make predictions on the test set. To demonstrate more of Comet’s functionality, I’ve saved the `random_state` value, the `accuracy` of the model on the test data, and a confusion matrix to get a better understanding of how the model performed.

Once the run is completed, we end the experiment and begin the next one until we’ve reached the `maxCombo`.

```
for experiment in opt.get_experiments():
    # initializing random forest
    # setting the parameters to be optimized with get_parameter
    random_forest = RandomForestClassifier(
        n_estimators=experiment.get_parameter("n_estimators"),
        criterion=experiment.get_parameter("criterion"),
        min_samples_leaf=experiment.get_parameter("min_samples_leaf"),
        random_state=25,
    )

    # training the model and making predictions
    random_forest.fit(X_train, y_train)
    y_hat = random_forest.predict(X_test)
    accuracy = accuracy_score(y_test, y_hat)

    # logging the random state and accuracy of each model
    experiment.log_parameter("random_state", 25)
    experiment.log_metric("accuracy", accuracy)
    # the spec minimizes a metric named "loss", so one has to be logged;
    # 1 - accuracy is used here as a simple stand-in (an assumption)
    experiment.log_metric("loss", 1 - accuracy)
    experiment.log_confusion_matrix(y_test, y_hat)

    experiment.end()
```

And that’s all.

We can view the experiments that were conducted from our dashboard:

experiment dashboard

By selecting an experiment, we can view different charts, code, hyperparameters, metrics, etc. This makes it easy to reproduce an experiment at any time in the future.