Troubleshooting and FAQ

Issues with optimizers

The following are issues you might encounter while working with optimizers.

Continue from a crashed or paused optimizer

If you pause your search, or if your optimizer script crashes, you can recover the search and pick up where you left off. Set COMET_OPTIMIZER_ID in the environment and run your script again. The COMET_OPTIMIZER_ID is printed in the terminal at the start of each sweep, and it is also logged with each experiment in the Other tab.

Here is an example of a script that crashes and then continues the search:

$ python script.py

COMET INFO: COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60
... it crashes for some reason

$ edit script.py

$ export COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60

$ python script.py
COMET INFO: COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60
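If you did not write down the ID when the sweep started, you can recover it from saved terminal output, because it appears in a line of the form COMET INFO: COMET_OPTIMIZER_ID=.... The sketch below uses hypothetical helpers that are not part of comet_ml. It assumes the ID is a run of hexadecimal characters, as in the examples on this page, and returns the most recent one found. It also builds an environment that carries the ID, which is the Python counterpart of the export command.

```python
import os
import re

# Matches the line printed at the start of a sweep, e.g.
# "COMET INFO: COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60".
# Assumption: the ID consists only of hexadecimal characters.
_ID_PATTERN = re.compile(r"COMET_OPTIMIZER_ID=([0-9a-fA-F]+)")


def find_optimizer_id(output):
    """Return the last optimizer ID found in captured output, or None."""
    matches = _ID_PATTERN.findall(output)
    return matches[-1] if matches else None


def env_with_optimizer_id(optimizer_id, base=None):
    """Return a copy of an environment with COMET_OPTIMIZER_ID set (like `export`)."""
    env = dict(os.environ if base is None else base)
    env["COMET_OPTIMIZER_ID"] = optimizer_id
    return env
```

You could pass the returned dictionary as env=... to subprocess.run when restarting script.py, so that only that run sees the ID.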

You can also supply the optimizer ID to the Optimizer class rather than the file name containing the optimizer config. For example, consider again example-2.py from above:

# file: example-2.py

from comet_ml import Optimizer
import sys

# Create an optimizer, passing in the config file name (or an Optimizer ID):
# (You can leave out API_KEY if you already set it.)
opt = Optimizer(sys.argv[1])

# Define your fit function here. This placeholder is hypothetical:
def fit(x):
    return (x - 3) ** 2

# Finally, get experiments, and train your models:
for experiment in opt.get_experiments(
        project_name="optimizer-search-03"):
    # Test the model
    loss = fit(experiment.get_parameter("x"))
    experiment.log_metric("loss", loss)

Recall that you can start that program like this:

$ python example-2.py example-2.config

or using comet optimize:

$ comet optimize -j 2 example-2.py example-2.config

To rerun the same script and pick up where you left off, you only need the Comet Optimizer ID. When you start a new optimizer, you will see a line similar to this:

COMET INFO: COMET_OPTIMIZER_ID=303faefd8194400694ec9588bda8338d

You can set this Comet environment variable in the terminal, and your search will use the existing Optimizer, rather than creating a new one.

$ export COMET_OPTIMIZER_ID=303faefd8194400694ec9588bda8338d
$ python example-2.py example-2.config

or

$ export COMET_OPTIMIZER_ID=303faefd8194400694ec9588bda8338d
$ comet optimize -j 2 example-2.py example-2.config

You can also just pass the Optimizer ID on the command line instead of the file name if you have written your script in the style of example-2.py:

$ python example-2.py 303faefd8194400694ec9588bda8338d

or

$ comet optimize -j 2 example-2.py 303faefd8194400694ec9588bda8338d

You can also have comet optimize pass along arguments to your script. Add them after the config file, following two dashes (--):

$ comet optimize -j 4 script.py opt.config -- --project-name "test-007"

Your script can then read them with the argparse module:

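# example-3.py

from comet_ml import Optimizer

import argparse

parser = argparse.ArgumentParser()

## Add your own args here:
parser.add_argument("--project-name", default=None)

## These are passed on from "comet optimize":
# nargs="?" lets the default apply when no config/ID is given
parser.add_argument("optimizer", nargs="?", default="test1_optimizer.json")
parser.add_argument("--trials", "-t", type=int, default=None)

parsed = parser.parse_args()

# Create the optimizer from a config file name or an Optimizer ID:
opt = Optimizer(parsed.optimizer, trials=parsed.trials)

# Define your train function here. This placeholder is hypothetical:
def train(x):
    return (x - 3) ** 2

count = 0
for experiment in opt.get_experiments(project_name=parsed.project_name):
    loss = train(experiment.get_parameter("x"))
    experiment.log_metric("loss", loss)
    count += 1
print("Optimizer job done! Completed %s experiments." % count)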

You can then run this program on its own, or with comet optimize to run several copies in parallel with custom command-line arguments.

Run on its own:

$ python example-3.py opt.config --project-name "my-project-01"

Or in parallel:

$ comet optimize -j 4 example-3.py opt.config -- --project-name "my-project-01"
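To check how an argument parser like the one in example-3.py interprets these command lines, without starting a search, you can give it an explicit argument list. This sketch rebuilds a parser with the same arguments but does not import comet_ml; build_parser is a hypothetical name. It uses nargs="?" on the positional argument so that its default can take effect, and it assumes, as the comments in example-3.py say, that comet optimize passes the config file name (or Optimizer ID) and --trials to your script.

```python
import argparse


def build_parser():
    """Parser with the same arguments as example-3.py (no comet_ml needed)."""
    parser = argparse.ArgumentParser()
    # Your own arguments:
    parser.add_argument("--project-name", default=None)
    # Arguments passed on from "comet optimize":
    parser.add_argument("optimizer", nargs="?", default="test1_optimizer.json")
    parser.add_argument("--trials", "-t", type=int, default=None)
    return parser


if __name__ == "__main__":
    # Same arguments as: python example-3.py opt.config --project-name "my-project-01"
    parsed = build_parser().parse_args(["opt.config", "--project-name", "my-project-01"])
    print(parsed.optimizer, parsed.project_name, parsed.trials)
```

Note that argparse turns --project-name into the attribute project_name.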

What if an experiment doesn't finish?

By default, none of the algorithms release duplicate sets of parameters (except when the value of trials is greater than 1). But what should you do if an experiment crashes and never notifies the Optimizer that it has finished?

You have two choices.

First, you can run the Optimizer search with the retryAssignLimit setting in its spec:

{
    "algorithm": "bayes",
    "spec": {
        "retryAssignLimit": 1,
        ...
    },
    "parameters": {...},
    "name": "My Bayesian Search",
    "trials": 1
}

With a retryAssignLimit value greater than zero, the Optimizer keeps reassigning a parameter set until an experiment marks it as "completed" or the number of retries equals retryAssignLimit.
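Generating the config file from Python avoids JSON syntax slips such as trailing commas. This is a sketch under stated assumptions: the Optimizer is assumed to read a JSON config file (like example-2.config), while the "x" parameter definition, the "metric"/"objective" spec keys, and the file name bayes-retry.config are illustrative and should be checked against the Comet Optimizer configuration reference. Only retryAssignLimit, the algorithm, name, and trials come from the example above.

```python
import json

# Hypothetical search config. The "x" parameter and the metric/objective
# entries are illustrative assumptions; retryAssignLimit is from the text above.
config = {
    "algorithm": "bayes",
    "spec": {
        "metric": "loss",
        "objective": "minimize",
        "retryAssignLimit": 1,  # reassign an unfinished parameter set up to 1 more time
    },
    "parameters": {
        "x": {"type": "integer", "min": 1, "max": 10},
    },
    "name": "My Bayesian Search",
    "trials": 1,
}

# json.dump never emits trailing commas, so the file is valid JSON.
with open("bayes-retry.config", "w") as f:
    json.dump(config, f, indent=4)
```

You could then start the search as before, for example with python example-2.py bayes-retry.config.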

Second, you can run the Optimizer search (sweep) again, with either all of the parameter value combinations or a subset of them.

Optimizer search space is too big

Try shrinking the parameter search space by removing some of the parameters. If you are using the "random" algorithm, try an even smaller search space.
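To see how much removing a parameter helps, you can count the combinations in a discrete search space: it is the product of the number of values each parameter can take. The helper below is a hypothetical sketch, not a Comet function. It assumes Comet-style parameter definitions in which "integer" has an inclusive min and max and "discrete"/"categorical" have a "values" list; any other type is treated as continuous, which makes the space unbounded.

```python
import math


def search_space_size(parameters):
    """Count parameter combinations; return math.inf if any parameter is continuous.

    Assumed definitions: "integer" with inclusive "min"/"max", and
    "discrete"/"categorical" with a "values" list. Other types are
    treated as continuous.
    """
    size = 1
    for spec in parameters.values():
        kind = spec.get("type")
        if kind == "integer":
            size *= spec["max"] - spec["min"] + 1
        elif kind in ("discrete", "categorical"):
            size *= len(spec["values"])
        else:
            return math.inf  # a continuous range has no finite count
    return size


if __name__ == "__main__":
    params = {
        "x": {"type": "integer", "min": 1, "max": 5},
        "activation": {"type": "categorical", "values": ["relu", "tanh"]},
    }
    print(search_space_size(params))  # 5 * 2 = 10
```

Dropping "activation" from params halves the count to 5.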

Debugging

You can set the configuration variable COMET_LOGGING_CONSOLE to "info" to see tracebacks for any Comet-related issues.

Either set COMET_LOGGING_CONSOLE on the same command line as your script:

$ COMET_LOGGING_CONSOLE=info python script.py

or export it in your shell before running the script:

$ export COMET_LOGGING_CONSOLE=info
$ python script.py
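If you launch training scripts from a Python driver rather than a shell, you can get the same effect as the first command by setting the variable only in the child process's environment. A minimal sketch using the standard subprocess module; run_script_with_console_logging is a hypothetical helper name, not a Comet API.

```python
import os
import subprocess
import sys


def run_script_with_console_logging(script, *args, level="info", **run_kwargs):
    """Run a Python script with COMET_LOGGING_CONSOLE set for that run only.

    Like `COMET_LOGGING_CONSOLE=info python script.py`: the variable is set in
    the child's environment, and this process's os.environ is not modified.
    Extra keyword arguments are passed through to subprocess.run.
    """
    env = dict(os.environ, COMET_LOGGING_CONSOLE=level)
    cmd = [sys.executable, script, *args]
    return subprocess.run(cmd, env=env, **run_kwargs)
```

For example, run_script_with_console_logging("script.py") runs script.py with the current interpreter, and its output, including any tracebacks, goes to your terminal as usual.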

This procedure often yields enough information to track down a problem (for example, the reason why an image is not logged). However, if you need the maximum amount of debug information, create a Comet debug log file.

To create a Comet debug log file, set two configuration variables: COMET_LOGGING_FILE and COMET_LOGGING_FILE_LEVEL. There are several ways you can do this.

Here is how you can set them in the bash environment:

$ export COMET_LOGGING_FILE=/tmp/comet.log
$ export COMET_LOGGING_FILE_LEVEL=debug

Here are the contents of a sample .comet.config file:

[comet_logging]
file = /tmp/comet.log
file_level = debug
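If you already have a .comet.config file, you may prefer to add this section programmatically instead of editing it by hand. A sketch with Python's standard configparser module; write_comet_logging_config is a hypothetical helper, and you must choose a file location where Comet looks for its config (see the Comet configuration documentation). configparser rewrites the whole file, so comments in it are lost and option names are lowercased.

```python
import configparser


def write_comet_logging_config(path, log_file="/tmp/comet.log", level="debug"):
    """Add or replace the [comet_logging] section of the config file at path."""
    # interpolation=None keeps values containing "%" verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is silently skipped
    config["comet_logging"] = {"file": log_file, "file_level": level}
    with open(path, "w") as f:
        config.write(f)  # writes "file = /tmp/comet.log" etc.
```

Any other sections already in the file, such as [comet], are kept.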

You can also define them at the same time as you run your script:

$ COMET_LOGGING_FILE_LEVEL=debug \
    COMET_LOGGING_FILE=/tmp/comet.log \
    python script.py

Finally, you can also put them into the script itself, before you import comet_ml:

import os
os.environ["COMET_LOGGING_FILE"] = "/tmp/comet.log"
os.environ["COMET_LOGGING_FILE_LEVEL"] = "debug"

import comet_ml
...

In these examples, the debugging logs are sent to /tmp/comet.log, but you can write them to any location and give the file any name you like. This log shows details on all of the steps of your experiment and on any failures. If you still have problems, share this file with us on our Slack channel.
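Before sharing the log, you may want to look at its last lines, since the file is written in order and the most recent steps appear at the end. A small standard-library sketch for doing this from Python, for example on a machine without the Unix tail command; tail is a hypothetical helper name.

```python
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file, such as the Comet debug log."""
    # errors="replace" avoids crashing on any undecodable bytes in the log.
    with open(path, errors="replace") as f:
        return list(deque(f, maxlen=n))  # deque keeps only the final n lines
```

For example, print("".join(tail("/tmp/comet.log"))) shows the end of the log from the examples above.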

Also, make sure that your comet_ml version is up to date. You can find the latest version number on the comet_ml page of the Python Package Index (PyPI). To upgrade, use the command:

$ pip install comet_ml --upgrade

In some cases, you might want to also update all of the packages that comet_ml depends on. You can do that using:

$ pip install comet_ml --upgrade --upgrade-strategy eager
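To see which version you currently have, for example before and after upgrading, you can query the installed package metadata. A minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); installed_version is a hypothetical helper name. Compare its result with the latest version listed on PyPI.

```python
from importlib import metadata


def installed_version(package="comet_ml"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("comet_ml version:", installed_version())
```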

Apr. 29, 2024