Troubleshooting

This page presents possible warnings and errors that you might encounter and the steps to take to address them. There are also some tips on debugging.

For further assistance on any of these Python warnings or errors, or if you see an error message that is not noted here, ping us on our Slack channel.

General errors

ImportError: Please import comet before importing these modules: ...

This error occurs when you try to create an Experiment (or another kind of experiment, such as OfflineExperiment) but have imported comet_ml after one of the supported machine learning libraries (such as Torch, fastai, Keras, or TensorFlow). You have two choices to resolve this error: you can either move comet_ml to be imported first (this is the recommended method), or you can completely disable Comet's auto logging facility by setting COMET_DISABLE_AUTO_LOGGING=1 in the environment, or in your Comet config file.
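If reordering your imports is impractical, you can also set the kill switch programmatically, before comet_ml is imported. A minimal sketch (standard library only; this is simply the in-process equivalent of exporting the variable in your shell):

```python
import os

# Option 2: disable Comet's auto-logging before comet_ml is imported.
# (Equivalent to `export COMET_DISABLE_AUTO_LOGGING=1` in your shell.)
os.environ["COMET_DISABLE_AUTO_LOGGING"] = "1"

# Option 1 (recommended) is simply to reorder your imports:
#   import comet_ml   # always first
#   import torch      # ML frameworks afterwards

print(os.environ["COMET_DISABLE_AUTO_LOGGING"])  # prints "1"
```

Note that with auto-logging disabled, framework-level metrics (for example, Keras epoch metrics) must be logged explicitly.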

COMET ERROR: Run will not be logged

This error is shown with a Python stack trace and indicates that the initial handshake between Comet and the server failed. This is usually a local networking issue or production downtime. Reach out on our Slack channel if you encounter this error.

COMET ERROR: Failed to set run source code

Comet failed to read the source code file for this Experiment. This could happen in rare cases where a library wraps your code or where Comet cannot read the source file.

First, check to see if you can access Comet in general. Issue this curl command from your terminal:

curl -i https://www.comet.com/clientlib/isAlive/ping

You should get back something like this:

HTTP/2 200 
date: Tue, 12 Jul 2022 07:58:41 GMT
content-type: application/json
content-length: 66
set-cookie: AWSALB=...; Expires=Tue, 19 Jul 2022 07:58:41 GMT; Path=/
set-cookie: AWSALBCORS=...; Expires=Tue, 19 Jul 2022 07:58:41 GMT; Path=/; SameSite=None; Secure
server: nginx
comet-ver: 97a6a5e8db8b665f95a543b5a0bc383531e09b69
comet-app-server: backend-python-3.production.comet-ml.internal
access-control-expose-headers: Comet-Ver, Comet-App-Server
vary: Accept-Encoding

{"msg":"Healthy Server","code":200,"data":null,"sdk_error_code":0}

If you did get something similar, that indicates a Python-related issue rather than an OS-related issue.

If you did not see the above output, then it is an OS-related issue, such as a network or firewall configuration problem.

If it is a Python-related issue, it could be a bad Python websocket library:

One solution is to install a specific version of the websocket-client library. Run:

pip install websocket-client==0.47.0

Otherwise, you should contact your local system administrator as you are probably experiencing a problem related to OS configuration.

Issues with optimizers

The following are issues you might encounter while working with optimizers.

Continue from crashed or paused optimizer

If you pause your search, or if your optimizer script ever crashes, you can recover your search and pick up immediately from where you left off. You need only define the COMET_OPTIMIZER_ID in the environment and run your script again. The COMET_OPTIMIZER_ID is printed in the terminal at the start of each sweep. It is also logged with each experiment in the Other tab.

Here is an example of a script crashing, and continuing with the search:

$ python script.py

COMET INFO: COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60
... it crashes for some reason

$ edit script.py

$ export COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60

$ python script.py
COMET INFO: COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60

You can also supply the optimizer ID to the Optimizer class rather than the file name containing the optimizer config. For example, consider example-2.py:

# file: example-2.py

from comet_ml import Optimizer
import sys

# Create an optimizer, passing in either the config
# file name or an optimizer ID as the first argument:
opt = Optimizer(sys.argv[1])

# define fit function here!

# Finally, get experiments, and train your models:
for experiment in opt.get_experiments(
        project_name="optimizer-search-03"):
    # Test the model
    loss = fit(experiment.get_parameter("x"))
    experiment.log_metric("loss", loss)

Recall that you can start that program up, so:

$ python example-2.py example-2.config

or using comet optimize:

$ comet optimize -j 2 example-2.py example-2.config

To use the same script and start up where you left off, you only need the Comet Optimizer ID. When you start up a new optimizer, you will see a line similar to this:

COMET INFO: COMET_OPTIMIZER_ID=303faefd8194400694ec9588bda8338d

You can set this Comet environment variable in the terminal, and your search will use the existing Optimizer, rather than creating a new one.

$ export COMET_OPTIMIZER_ID=303faefd8194400694ec9588bda8338d
$ python example-2.py example-2.config

or

$ export COMET_OPTIMIZER_ID=303faefd8194400694ec9588bda8338d
$ comet optimize -j 2 example-2.py example-2.config

You can also just pass the Optimizer ID on the command line instead of the file name if you have written your script in the style of example-2.py:

$ python example-2.py 303faefd8194400694ec9588bda8338d

or

$ comet optimize -j 2 example-2.py 303faefd8194400694ec9588bda8338d

You can also have comet optimize pass along arguments to your script. Simply add those after the config, following two dashes, so:

$ comet optimize -j 4 script.py opt.config -- --project-name "test-007"

Then you can use the argparse module, so:

# example-3.py

from comet_ml import Optimizer

import argparse

parser = argparse.ArgumentParser()

## Add your own args here:
parser.add_argument("--project-name", default=None)

## These are passed on from "comet optimize":
parser.add_argument("optimizer", default="test1_optimizer.json")
parser.add_argument("--trials", "-t", type=int, default=None)

parsed = parser.parse_args()

# Create the optimizer from the config file (or optimizer ID):
opt = Optimizer(parsed.optimizer)

# define train function here!

count = 0
for experiment in opt.get_experiments(project_name=parsed.project_name):
    loss = train(experiment.get_parameter("x"))
    experiment.log_metric("loss", loss)
    count += 1
print("Optimizer job done! Completed %s experiments." % count)

The above program can then be used alone, or with comet optimize to run scripts in parallel with custom command-line arguments.

Called normally:

$ python example-3.py opt.config --project-name "my-project-01"

Or in parallel:

$ comet optimize example-3.py opt.config -- --project-name "my-project-01"

What if an experiment doesn't finish?

By default, none of the algorithms will assign duplicate parameter sets (except when the value of trials is greater than 1). But what should you do if an experiment crashes and never notifies the Optimizer?

You have two choices. Either:

  • Run the Optimizer search with the retryAssignLimit spec setting:
{"algorithm": "bayes",
 "spec": {
    "retryAssignLimit": 1,
    ...
 },
 "parameters": {...},
 "name": "My Bayesian Search",
 "trials": 1
}

Setting retryAssignLimit to a value greater than zero causes the Optimizer to keep assigning a parameter set until an experiment marks it as "completed" or the number of retries equals retryAssignLimit.

Or:

  • Run the Optimizer search/sweep again, with either all of the parameter value combinations or a subset of them.

Debugging

You can set the configuration variable COMET_LOGGING_CONSOLE to "info" to see tracebacks for any Comet-based issues.

Either set COMET_LOGGING_CONSOLE for a single run on the command line:

COMET_LOGGING_CONSOLE=info python script.py

or export it in your shell for all subsequent runs:

export COMET_LOGGING_CONSOLE=info
python script.py

This procedure often yields enough information to help track down a problem (for example, the reason why an image is not logged). However, if you need the maximum amount of debug information, create a Comet debug log file, as described here:

To create a Comet debug log file, set two configuration variables: COMET_LOGGING_FILE and COMET_LOGGING_FILE_LEVEL. There are several ways you can do this:

  • Here is how you can set them in the bash environment:
$ export COMET_LOGGING_FILE=/tmp/comet.log
$ export COMET_LOGGING_FILE_LEVEL=debug
  • Here is the contents of a sample .comet.config file:
[comet_logging]
file = /tmp/comet.log
file_level = debug
  • You can also define them at the same time as you run your script:
$ COMET_LOGGING_FILE_LEVEL=debug \
    COMET_LOGGING_FILE=/tmp/comet.log \
    python script.py
  • Finally, you can also put them into the script itself, before you import comet_ml:
import os
os.environ["COMET_LOGGING_FILE"] = "/tmp/comet.log"
os.environ["COMET_LOGGING_FILE_LEVEL"] = "debug"

import comet_ml
...

In these examples, the debugging logs have been sent to /tmp/comet.log, but you can put them wherever you like, and name them as you like. This log will show details on all of the steps of your experiment, and any details about failures. If you still have problems, share this file with us using the Slack channel.

Also, make sure that your comet_ml version is up to date. You can find the latest version number on the Python Packaging comet_ml page. To upgrade, use the command:

$ pip install comet_ml --upgrade

In some cases, you might want to also update all of the packages that comet_ml depends on. You can do that using:

$ pip install comet_ml --upgrade --upgrade-strategy eager

Rate limits

The Comet API might rate limit submission of requests for your experiments. Such limits are managed as an allowed number of operations per time window, where an operation might be a read or an update.

Breaching these rate limits will cause your experiment to be throttled. In such cases:

  • A notice is displayed.
  • A warning symbol appears for the affected experiment.

Online experiments

For online experiments, the following rate limits are in effect:

  • Logging metrics: 10,000 per minute
  • Logging parameters: 8,000 per minute
  • Logging output: 10,000 per minute
  • Logging everything else: 8,000 per minute

Offline experiments

For offline experiments, the following rate limits are in effect:

  • Logging metrics: 80,000 per minute
  • Logging parameters: 80,000 per minute
  • Logging output: 80,000 per minute
  • Logging everything else: 80,000 per minute

Info

Offline experiments have a rate limit because you might be uploading multiple experiments in parallel, or attempting to upload too many, too quickly.

Size and count limits

For all experiments, you can log 15,000 total values for each metric, per experiment. If a metric exceeds this limit, its values are downsampled.
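As a rough illustration (with hypothetical run sizes), you can estimate up front whether per-step logging will exceed this limit:

```python
MAX_VALUES_PER_METRIC = 15_000  # per metric, per experiment

# Hypothetical training run that logs one value per step:
epochs, steps_per_epoch = 100, 200
values_logged = epochs * steps_per_epoch

print(values_logged)                          # 20000
print(values_logged > MAX_VALUES_PER_METRIC)  # True: values will be downsampled
```

If the estimate exceeds the limit, log less frequently (for example, per epoch) to keep the full-resolution history.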

REST API

Whether you use the REST API from the Python SDK or call the URL endpoints directly, the following limits are in effect per experiment.

Info

Each of the following REST API items typically needs to be logged only once per experiment.

  • 10 environment detail updates
  • 10 Git metadata updates
  • 10 graph (model) updates
  • 10 OS packages updates
  • 10 code updates

In addition, when using the REST API, the following limits are in effect:

  • 1,000 HTML updates per experiment.
  • Each API key is allowed to make 15,000 submissions per hour.

Solutions to rate limits

If you notice you are hitting rate limits during normal experiment runs, try logging once per epoch rather than once per step.
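The difference in request volume is easy to see in a sketch. Here StubExperiment is a stand-in for a real comet_ml Experiment (its log_metric only counts calls), and the loop sizes are hypothetical:

```python
class StubExperiment:
    """Stand-in for comet_ml.Experiment that just counts log calls."""
    def __init__(self):
        self.calls = 0

    def log_metric(self, name, value, epoch=None):
        self.calls += 1

experiment = StubExperiment()
n_epochs, steps_per_epoch = 10, 500

for epoch in range(n_epochs):
    epoch_loss = 0.0
    for step in range(steps_per_epoch):
        loss = 1.0  # placeholder for a real training step
        epoch_loss += loss
    # One log call per epoch: 10 calls total, instead of the
    # 5,000 we would make by logging inside the inner loop.
    experiment.log_metric("loss", epoch_loss / steps_per_epoch, epoch=epoch)

print(experiment.calls)  # 10
```

The same loop structure works unchanged with a real Experiment: move the log_metric call outside the step loop and aggregate the metric yourself.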

If you still encounter rate limits, consider using the OfflineExperiment interface. This only requires that you change:

experiment = Experiment(...)

to

experiment = OfflineExperiment(..., offline_directory="/path/to/save/experiments")

After the experiment is complete, you then run:

comet upload /path/to/save/experiments/*.zip

to send your experiment to Comet.

For more information on how best to handle rate limits, reach out to support@comet.ml or chat with us on our Slack channel.

Nov. 29, 2022