Troubleshooting
This page presents possible warnings and errors that you might encounter and the steps to take to address them. There are also some tips on debugging.
For further assistance on any of these Python warnings or errors, or if you see an error message that is not noted here, ping us on our Slack channel.
General errors¶
ImportError: Please import comet before importing these modules: ...¶
This error occurs when you try to create an Experiment (or another kind of experiment, such as OfflineExperiment) but have imported comet_ml after one of the supported machine learning libraries (such as Torch, fastai, Keras, or TensorFlow). You have two choices to resolve this error: move the comet_ml import so that it comes first (the recommended method), or completely disable Comet's auto logging facility by setting COMET_DISABLE_AUTO_LOGGING=1 in the environment or in your Comet config file.
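To find the offending import before running a script, you can inspect its source. The following is a minimal sketch, not part of the Comet SDK: the helper names `top_level_imports` and `comet_imported_first` are hypothetical, and the library list is limited to the libraries named above. It only looks at top-level import statements.

```python
import ast

# Libraries named above that must be imported after comet_ml (assumed list).
ML_LIBRARIES = {"torch", "fastai", "keras", "tensorflow"}


def top_level_imports(source):
    """Yield the root package of each top-level import, in source order."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name.split(".")[0]
        elif isinstance(node, ast.ImportFrom) and node.module:
            yield node.module.split(".")[0]


def comet_imported_first(source):
    """Return False if an ML library is imported before comet_ml."""
    for name in top_level_imports(source):
        if name == "comet_ml":
            return True
        if name in ML_LIBRARIES:
            return False
    # Neither comet_ml nor a listed ML library is imported at top level.
    return True


if __name__ == "__main__":
    good = "import comet_ml\nimport torch\n"
    bad = "import torch\nfrom comet_ml import Experiment\n"
    print(comet_imported_first(good), comet_imported_first(bad))
```

Imports made inside functions or conditionals are not examined, so treat a passing result as a hint rather than a guarantee.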
COMET ERROR: Run will not be logged¶
This error is shown with a Python stack trace and indicates that the initial handshake between Comet and the server failed. This is usually a local networking issue or production downtime. Reach out on our Slack channel if you encounter this error.
COMET ERROR: Failed to set run source code¶
Comet failed to read the source code file for this Experiment. This could happen in rare cases where a library wraps your code or where Comet cannot read the source file.
COMET ERROR: There's seem to be an issue with your system's SSL certificate bundle. This is likely a system wide issue that is not related to Comet.¶
First, check to see if you can access Comet in general. Issue this curl command from your terminal:
curl -i https://www.comet.com/clientlib/isAlive/ping
You should get back something like this:
HTTP/2 200
date: Tue, 12 Jul 2022 07:58:41 GMT
content-type: application/json
content-length: 66
set-cookie: AWSALB=...; Expires=Tue, 19 Jul 2022 07:58:41 GMT; Path=/
set-cookie: AWSALBCORS=...; Expires=Tue, 19 Jul 2022 07:58:41 GMT; Path=/; SameSite=None; Secure
server: nginx
comet-ver: 97a6a5e8db8b665f95a543b5a0bc383531e09b69
comet-app-server: backend-python-3.production.comet-ml.internal
access-control-expose-headers: Comet-Ver, Comet-App-Server
vary: Accept-Encoding
{"msg":"Healthy Server","code":200,"data":null,"sdk_error_code":0}
If you get something similar, the issue is Python-related rather than OS-related.
If you did not see the above output, then it is an OS-related issue. If you are on a Mac computer, these links might provide some solutions:
- https://timonweb.com/tutorials/fixing-certificate_verify_failed-error-when-trying-requests_html-out-on-mac/
- https://stackoverflow.com/questions/40684543/how-to-make-python-use-ca-certificates-from-mac-os-truststore
If it is a Python-related issue, the cause could be a bad Python websocket library. One solution is to install a specific version of the websocket-client library. Run:
pip install websocket-client==0.47.0
Otherwise, you should contact your local system administrator as you are probably experiencing a problem related to OS configuration.
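To tell the two cases apart, it can help to repeat the curl check from inside Python, so that the request goes through Python's own certificate handling. The following is a minimal sketch using only the standard library. The function names and the returned labels are hypothetical, and the health test assumes the response body has the same JSON shape as the sample output above.

```python
import json
import ssl
import urllib.error
import urllib.request

PING_URL = "https://www.comet.com/clientlib/isAlive/ping"


def is_healthy(body):
    """Return True if the body looks like the healthy JSON reply shown above."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("code") == 200


def check_comet_from_python(url=PING_URL, timeout=10.0):
    """Classify the result of pinging Comet with Python's own SSL stack."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8")
    except urllib.error.URLError as exc:
        # Certificate problems surface as a URLError wrapping an SSLError.
        if isinstance(exc.reason, ssl.SSLError):
            return "ssl-error"
        return "network-error"
    except ssl.SSLError:
        return "ssl-error"
    except OSError:
        return "network-error"
    return "healthy" if is_healthy(body) else "unexpected-response"


if __name__ == "__main__":
    print(check_comet_from_python())
```

If curl succeeds but this script reports an SSL error, the problem most likely lies in the certificate bundle that Python uses rather than in the operating system as a whole.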
Issues with optimizers¶
The following are issues you might encounter while working with optimizers.
Continue from crashed or paused optimizer¶
If you pause your search, or if your optimizer script ever crashes, you can recover your search and pick up immediately from where you left off. You need only define COMET_OPTIMIZER_ID in the environment and run your script again. The COMET_OPTIMIZER_ID is printed in the terminal at the start of each sweep. It is also logged with each experiment in the Other tab.
Here is an example of a script crashing, and continuing with the search:
$ python script.py
COMET INFO: COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60
... it crashes for some reason
$ edit script.py
$ export COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60
$ python script.py
COMET INFO: COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60
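If you capture the terminal output of a run, the ID can be recovered from it automatically. The following is a minimal sketch, not part of the Comet SDK: the helper names are hypothetical, and the pattern assumes the ID is 32 lowercase hexadecimal characters, as in the examples on this page.

```python
import os
import re

# Assumption: IDs look like the 32-character hex strings shown on this page.
OPTIMIZER_ID_PATTERN = re.compile(r"COMET_OPTIMIZER_ID=([0-9a-f]{32})")


def find_optimizer_id(log_text):
    """Return the first optimizer ID found in the captured output, or None."""
    match = OPTIMIZER_ID_PATTERN.search(log_text)
    return match.group(1) if match else None


def resume_environment(log_text, base_env=None):
    """Return a copy of the environment with COMET_OPTIMIZER_ID set if found."""
    env = dict(os.environ if base_env is None else base_env)
    optimizer_id = find_optimizer_id(log_text)
    if optimizer_id is not None:
        env["COMET_OPTIMIZER_ID"] = optimizer_id
    return env


if __name__ == "__main__":
    captured = "COMET INFO: COMET_OPTIMIZER_ID=366dcb4f38bf42aea6d2d87cd9601a60\n"
    print(find_optimizer_id(captured))
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` to restart the script with the same optimizer.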
You can also supply the optimizer ID to the Optimizer class rather than the file name containing the optimizer config. For example, consider again example-2.py from above:
# file: example-2.py
from comet_ml import Optimizer
import sys

# Next, create an optimizer, passing in the config:
# (You can leave out API_KEY if you already set it.)
opt = Optimizer(sys.argv[1])

# define fit function here!

# Finally, get experiments, and train your models:
for experiment in opt.get_experiments(
        project_name="optimizer-search-03"):
    # Test the model
    loss = fit(experiment.get_parameter("x"))
    experiment.log_metric("loss", loss)
Recall that you can start that program directly:
$ python example-2.py example-2.config
or by using comet optimize:
$ comet optimize -j 2 example-2.py example-2.config
To use the same script and start up where you left off, you only need the Comet Optimizer ID. When you start up a new optimizer, you will see a line similar to this:
COMET INFO: COMET_OPTIMIZER_ID=303faefd8194400694ec9588bda8338d
You can set this Comet environment variable in the terminal, and your search will use the existing Optimizer, rather than creating a new one.
$ export COMET_OPTIMIZER_ID=303faefd8194400694ec9588bda8338d
$ python example-2.py example-2.config
or
$ export COMET_OPTIMIZER_ID=303faefd8194400694ec9588bda8338d
$ comet optimize -j 2 example-2.py example-2.config
You can also pass the Optimizer ID on the command line instead of the file name if you have written your script in the style of example-2.py:
$ python example-2.py 303faefd8194400694ec9588bda8338d
or
$ comet optimize -j 2 example-2.py 303faefd8194400694ec9588bda8338d
You can also have comet optimize pass along arguments to your script. Add them after the config, following two dashes:
$ comet optimize -j 4 script.py opt.config -- --project-name "test-007"
Your script can then read those arguments with the argparse module:
# example-3.py
from comet_ml import Optimizer, Experiment
import argparse

parser = argparse.ArgumentParser()
## Add your own args here:
parser.add_argument("--project-name", default=None)
## These are passed on from "comet optimize":
parser.add_argument("optimizer", default="test1_optimizer.json")
parser.add_argument("--trials", "-t", type=int, default=None)
parsed = parser.parse_args()

# Create the optimizer from the config file name or optimizer ID:
opt = Optimizer(parsed.optimizer, trials=parsed.trials)

# Define train() here; it should return the loss for a given x.

count = 0
for experiment in opt.get_experiments(project_name=parsed.project_name):
    loss = train(experiment.get_parameter("x"))
    experiment.log_metric("loss", loss)
    count += 1
print("Optimizer job done! Completed %s experiments." % count)
The above program can then be run on its own, or with comet optimize to run scripts in parallel with custom command-line arguments.
Called normally:
$ python example-3.py opt.config --project-name "my-project-01"
Called with comet optimize:
$ comet optimize example-3.py opt.config -- --project-name "my-project-01"
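To check how a parser like the one in example-3.py interprets these arguments without starting a search, you can hand argparse an explicit argument list. This is a minimal sketch that assumes the script receives its arguments in the same form as the direct call shown above; `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Build a parser with the same arguments as example-3.py."""
    parser = argparse.ArgumentParser()
    # Your own argument:
    parser.add_argument("--project-name", default=None)
    # The config file name or optimizer ID (positional):
    parser.add_argument("optimizer")
    parser.add_argument("--trials", "-t", type=int, default=None)
    return parser


# Parse the same arguments as the direct call, without touching sys.argv.
parsed = build_parser().parse_args(
    ["opt.config", "--project-name", "my-project-01"]
)

if __name__ == "__main__":
    print(parsed.optimizer, parsed.project_name, parsed.trials)
```

Note that argparse stores `--project-name` under the attribute `project_name`, with the dash replaced by an underscore.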
What if an experiment doesn't finish?¶
By default, none of the algorithms releases duplicate sets of parameters (except when the value of trials is greater than 1). But what should you do if an experiment crashes and never notifies the Optimizer?
You have two choices:
Either:
- You can run the Optimizer search with the retryAssignLimit spec setting:
{
    "algorithm": "bayes",
    "spec": {
        "retryAssignLimit": 1,
        ...
    },
    "parameters": {...},
    "name": "My Bayesian Search",
    "trials": 1
}
With a retryAssignLimit value greater than zero, the Optimizer continues to assign the parameter set until an experiment marks it as "completed" or the number of retries is equal to retryAssignLimit.
OR
- You can run the Optimizer search/sweep again. You can either run all of the parameter value combinations again, or a subset thereof.
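For the first option, one way to produce the config file is to build it as a Python dictionary and serialize it, which avoids JSON syntax slips such as trailing commas. This is a minimal sketch: the parameter named "x", its range, and the "metric" and "objective" entries are assumptions added for illustration, and `write_config` is a hypothetical helper.

```python
import json

config = {
    "algorithm": "bayes",
    "spec": {
        # Re-assign a parameter set once if its experiment never completes.
        "retryAssignLimit": 1,
        # Assumed objective settings, for illustration only.
        "metric": "loss",
        "objective": "minimize",
    },
    "parameters": {
        # Hypothetical parameter, for illustration only.
        "x": {"type": "integer", "min": 1, "max": 5},
    },
    "name": "My Bayesian Search",
    "trials": 1,
}


def write_config(config, path):
    """Write the optimizer config to a JSON file and return the path."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=4)
    return path


if __name__ == "__main__":
    print(json.dumps(config, indent=4))
```

The resulting file can then be passed to your script in place of a hand-written config.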
Optimizer Search Space is too big¶
Try reducing the parameter search space by removing some of the parameters. If you are using the "random" algorithm, reduce the search space even further.
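To see how much removing a parameter helps, you can count the combinations in a discrete search space. This is a rough sketch under the assumption that each parameter takes one of a fixed list of candidate values; the dictionary layout and the parameter names are hypothetical and are not the Optimizer config format.

```python
import math


def count_combinations(search_space):
    """Return the number of distinct parameter sets in a discrete space."""
    return math.prod(len(values) for values in search_space.values())


# Hypothetical search space: each key maps to its candidate values.
full_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [16, 32],
    "layers": [1, 2, 3, 4],
}

# The same space with one parameter removed.
reduced_space = {k: v for k, v in full_space.items() if k != "layers"}

if __name__ == "__main__":
    print(count_combinations(full_space), count_combinations(reduced_space))
```

Because the total is a product, dropping a parameter with four candidate values divides the number of combinations by four.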
Debugging¶
You can set the configuration variable COMET_LOGGING_CONSOLE to "info" to see tracebacks for any Comet-based issues.
Either set COMET_LOGGING_CONSOLE on the command line when you run your script:
COMET_LOGGING_CONSOLE=info python script.py
or export it in your shell first:
export COMET_LOGGING_CONSOLE=info
python script.py
This procedure often yields enough information to help track down a problem (for example, the reason why an image is not logged). However, if you need the maximum amount of debug information, create a Comet debug log file.
To create a Comet debug log file, set two configuration variables: COMET_LOGGING_FILE and COMET_LOGGING_FILE_LEVEL. There are several ways you can do this:
- Here is how you can set them in the bash environment:
$ export COMET_LOGGING_FILE=/tmp/comet.log
$ export COMET_LOGGING_FILE_LEVEL=debug
- Here are the contents of a sample .comet.config file:
[comet_logging]
file = /tmp/comet.log
file_level = debug
- You can also define them at the same time as you run your script:
$ COMET_LOGGING_FILE_LEVEL=debug \
COMET_LOGGING_FILE=/tmp/comet.log \
python script.py
- Finally, you can also put them into the script itself, before you import comet_ml:
import os
os.environ["COMET_LOGGING_FILE"] = "/tmp/comet.log"
os.environ["COMET_LOGGING_FILE_LEVEL"] = "debug"
import comet_ml
...
In these examples, the debugging logs have been sent to /tmp/comet.log, but you can put them wherever you like, and name them as you like. This log will show details on all of the steps of your experiment, and any details about failures. If you still have problems, share this file with us using the Slack channel.
Also, make sure that your comet_ml version is up to date. You can find the latest version number on the Python Packaging comet_ml page. To upgrade, use the command:
$ pip install comet_ml --upgrade
In some cases, you might want to also update all of the packages that comet_ml depends on. You can do that using:
$ pip install comet_ml --upgrade --upgrade-strategy eager
Rate limits¶
The Comet API might rate limit submission of requests for your experiments. Such limits are managed as an allowed number of operations per time window, where an operation might be a read or an update.
Breaching these rate limits will cause your experiment to be throttled. In such cases:
- A notice is displayed.
- A warning symbol appears for the affected experiment.
Online experiments¶
For online experiments, the following rate limits are in effect:
- Logging metrics: 10,000 per minute
- Logging parameters: 8,000 per minute
- Logging output: 10,000 per minute
- Logging everything else: 8,000 per minute
Offline experiments¶
For offline experiments, the following rate limits are in effect:
- Logging metrics: 80,000 per minute
- Logging parameters: 80,000 per minute
- Logging output: 80,000 per minute
- Logging everything else: 80,000 per minute
Info
Offline experiments have a rate limit because you might be uploading multiple experiments in parallel, or attempting to upload too many, too quickly.
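To estimate whether a training loop will stay within these limits, multiply the number of logging calls per step by the number of steps per minute. This is a minimal sketch that copies the limits listed above into dictionaries. The helper names and the example workload (3 metrics per step at 100 steps per second) are hypothetical.

```python
# Limits per minute, copied from the lists above.
ONLINE_LIMITS_PER_MINUTE = {
    "metrics": 10_000,
    "parameters": 8_000,
    "output": 10_000,
    "other": 8_000,
}
OFFLINE_LIMITS_PER_MINUTE = {
    "metrics": 80_000,
    "parameters": 80_000,
    "output": 80_000,
    "other": 80_000,
}


def calls_per_minute(calls_per_step, steps_per_second):
    """Return how many logging calls a loop makes in one minute."""
    return calls_per_step * steps_per_second * 60


def exceeds_limit(kind, per_minute, online=True):
    """Return True if the given rate is above the listed limit."""
    limits = ONLINE_LIMITS_PER_MINUTE if online else OFFLINE_LIMITS_PER_MINUTE
    return per_minute > limits[kind]


if __name__ == "__main__":
    rate = calls_per_minute(calls_per_step=3, steps_per_second=100)
    print(rate, exceeds_limit("metrics", rate), exceeds_limit("metrics", rate, online=False))
```

In this hypothetical workload the loop makes 18,000 metric calls per minute, which is above the online limit but below the offline one.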
Size and count limits¶
For all experiments, you can log 15,000 total values for each metric, per experiment. If your metric count goes beyond this limit, then the values are downsampled.
REST API¶
You can use the REST API either from the Python SDK or by using the URL endpoints directly. In both cases, the following limits are in effect per experiment.
Info
Each of the following REST API items typically needs to be logged only once per experiment.
- 10 environment detail updates
- 10 Git metadata updates
- 10 graph (model) updates
- 10 OS packages updates
- 10 code updates
In addition, when using the REST API, the following limits are in effect:
- 1,000 HTML updates per experiment.
- Each API key is allowed to make 15,000 submissions per hour.
Solutions to rate limits¶
If you notice you are hitting rate limits based on normal experiments, try reporting on each epoch, rather than each step.
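Reporting once per epoch can be done by averaging the per-step values and logging only the mean. This is a minimal sketch: `log_epoch_means` is a hypothetical helper, and `log_metric` is any callable that accepts a name, a value, and an `epoch` keyword. The sketch assumes that `experiment.log_metric` can be passed in that role; a stand-in recorder is used here so that the example runs without a Comet account.

```python
from statistics import mean


def log_epoch_means(log_metric, losses_by_epoch, name="loss"):
    """Log one averaged value per epoch instead of one value per step."""
    for epoch, step_losses in enumerate(losses_by_epoch):
        log_metric(name, mean(step_losses), epoch=epoch)


if __name__ == "__main__":
    recorded = []

    def record(name, value, epoch=None):
        # Stand-in for experiment.log_metric, for illustration only.
        recorded.append((name, value, epoch))

    # Two epochs with a handful of per-step losses each.
    log_epoch_means(record, [[1.0, 3.0], [2.0, 2.0, 2.0]])
    print(recorded)
```

With this pattern the number of logging calls grows with the number of epochs rather than the number of steps.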
If you still encounter rate limits, consider using the OfflineExperiment interface. This only requires that you change:
experiment = Experiment(...)
to
experiment = OfflineExperiment(..., offline_directory="/path/to/save/experiments")
After the experiment is complete, you then run:
comet upload /path/to/save/experiments/*.zip
to send your experiment to Comet.
For more information on how best to handle rate limits, reach out to support@comet.com or chat with us on our Slack channel.