comet_ml.integration.ray
CometTrainLoggerCallback
CometTrainLoggerCallback(
ray_config: Dict[str, Any],
tags: Optional[List[str]] = None,
save_checkpoints: bool = False,
share_api_key_to_workers: bool = False,
experiment_name: Optional[str] = None,
api_key: Optional[str] = None,
workspace: Optional[str] = None,
project_name: Optional[str] = None,
experiment_key: Optional[str] = None,
mode: Optional[str] = None,
online: Optional[bool] = None,
**experiment_kwargs
)
Ray callback for logging Ray Train results to Comet.
This Ray Train LoggerCallback sends metrics and parameters to Comet for tracking.
This callback is based on the native Ray Comet callback and has been modified to track resource usage on all distributed workers when running a distributed training job. It cannot be used with Ray Tune.
Parameters:
ray_config (Dict[str, Any]): Ray configuration dictionary to share with workers. It must be the same dictionary instance, not a copy.
tags (Optional[List[str]], default: None): Tags to add to the logged Experiment.
save_checkpoints (bool, default: False): If True, model checkpoints will be saved to Comet ML as artifacts.
share_api_key_to_workers (bool, default: False): If True, the Comet API key will be shared with workers via the ray_config dictionary. This is an unsafe solution; we recommend using a more secure way to set up your API key in your cluster.
experiment_name (Optional[str], default: None): Custom name for the Comet experiment. If None, a name is generated automatically.
api_key (Optional[str], default: None): Comet API key.
workspace (Optional[str], default: None): Comet workspace name.
project_name (Optional[str], default: None): Comet project name.
experiment_key (Optional[str], default: None): Experiment key to be used for logging.
mode (Optional[str], default: None): Controls how the Comet experiment is started. Three options are possible:
- "get": Continue logging to an existing experiment identified by the experiment_key value.
- "create": Always create a new experiment; useful for HPO sweeps.
- "get_or_create" (default): Start a new experiment if required, or continue logging to an existing one.
online (Optional[bool], default: None): If True, the data will be logged to the Comet server; otherwise, it will be stored locally in an offline experiment.
**experiment_kwargs: Other keyword arguments will be passed to the constructor of comet_ml.Experiment.
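Why ray_config must be the same dictionary instance can be illustrated with plain Python. The description of share_api_key_to_workers says data reaches the workers through the ray_config dictionary, so this sketch assumes the callback writes entries into the dict it was given; a copy taken earlier would never see those entries. FakeCallback and the "_comet_shared" key are hypothetical stand-ins invented for this illustration, not the real callback's internals.

```python
import copy
from typing import Any, Dict


class FakeCallback:
    """Hypothetical stand-in for CometTrainLoggerCallback (not the real class).

    It mimics a single assumed behavior: writing shared values into the
    ray_config dict it received.
    """

    def __init__(self, ray_config: Dict[str, Any]) -> None:
        # Keep a reference to the caller's dict, not a copy of it.
        self.ray_config = ray_config

    def setup(self) -> None:
        # Assumed behavior: the driver-side callback adds data for workers.
        # The key name "_comet_shared" is invented for this illustration.
        self.ray_config["_comet_shared"] = {"note": "placeholder"}


config = {"lr": 1e-3, "epochs": 3}
callback = FakeCallback(config)

# What a trainer would see if given the same instance vs. an early copy.
same_instance = config
early_copy = copy.deepcopy(config)

callback.setup()

print("_comet_shared" in same_instance)  # True
print("_comet_shared" in early_copy)     # False
```

In practice this means passing the dict object itself to the trainer (for example train_loop_config=config) rather than dict(config) or config.copy().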
Example
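A minimal usage sketch, not the official example. It assumes Ray 2.x with the ray.train API (ScalingConfig, RunConfig and ray.train.report; older releases expose the config classes under ray.air.config), PyTorch installed for TorchTrainer, and a Comet API key available on every worker, for example through the COMET_API_KEY environment variable. The loss values, tag and project name are placeholders. Imports live inside the functions so the module loads even where Ray and Comet are not installed; call main() to launch training.

```python
from typing import Any, Dict


def train_loop(config: Dict[str, Any]) -> None:
    """Per-worker training loop; Ray runs one copy of this on every worker."""
    import ray.train

    for epoch in range(config["epochs"]):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        ray.train.report({"epoch": epoch, "loss": loss})


def main() -> None:
    from comet_ml.integration.ray import CometTrainLoggerCallback, comet_worker
    from ray.train import RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer

    config = {"lr": 1e-3, "epochs": 3}

    # The callback and the trainer must receive the same dict instance.
    callback = CometTrainLoggerCallback(
        config,
        tags=["ray-train-example"],
        project_name="ray-train-demo",  # placeholder project name
    )

    trainer = TorchTrainer(
        comet_worker(train_loop),  # enables per-worker resource tracking
        train_loop_config=config,  # same object passed to the callback
        scaling_config=ScalingConfig(num_workers=2),
        run_config=RunConfig(callbacks=[callback]),
    )
    trainer.fit()
```

The callback is registered on the driver through RunConfig, while comet_worker wraps the per-worker function; both rely on the shared config dict to connect the workers to the driver's experiment.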
on_experiment_end
on_experiment_end(trials: List[Trial], **info)
comet_ray_train_logger
comet_ray_train_logger(
trainer: DataParallelTrainer,
tags: Optional[List[str]] = None,
save_checkpoints: bool = False,
share_api_key_to_workers: bool = False,
experiment_name: Optional[str] = None,
api_key: Optional[str] = None,
workspace: Optional[str] = None,
project_name: Optional[str] = None,
experiment_key: Optional[str] = None,
mode: Optional[str] = None,
online: Optional[bool] = None,
**experiment_kwargs
) -> None
Registers a Comet Ray callback with the specified trainer so that training metrics and parameters are collected and sent to Comet for experiment tracking.
This callback is adapted from the native Ray Comet callback and modified to monitor resource usage across all distributed workers during distributed training jobs. Note that it is not compatible with Ray Tune.
Parameters:
trainer (DataParallelTrainer): Ray Trainer object.
tags (Optional[List[str]], default: None): Tags to add to the logged Experiment.
save_checkpoints (bool, default: False): If True, model checkpoints will be saved to Comet ML as artifacts.
share_api_key_to_workers (bool, default: False): If True, the Comet API key will be shared with workers via the ray_config dictionary. This is an unsafe solution; we recommend using a more secure way to set up your API key in your cluster.
experiment_name (Optional[str], default: None): Custom name for the Comet experiment. If None, a name is generated automatically.
api_key (Optional[str], default: None): Comet API key.
workspace (Optional[str], default: None): Comet workspace name.
project_name (Optional[str], default: None): Comet project name.
experiment_key (Optional[str], default: None): Experiment key to be used for logging.
mode (Optional[str], default: None): Controls how the Comet experiment is started. Three options are possible:
- "get": Continue logging to an existing experiment identified by the experiment_key value.
- "create": Always create a new experiment; useful for HPO sweeps.
- "get_or_create" (default): Start a new experiment if required, or continue logging to an existing one.
online (Optional[bool], default: None): If True, the data will be logged to the Comet server; otherwise, it will be stored locally in an offline experiment.
**experiment_kwargs: Other keyword arguments will be passed to the constructor of comet_ml.Experiment.
Example
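A minimal sketch of the helper-function route, under the same assumptions as a typical Ray 2.x setup (ray.train API, PyTorch installed for TorchTrainer, Comet API key available on the workers). It also assumes that comet_ray_train_logger takes care of the worker-side setup, so the training function is left undecorated here; confirm this against the Comet documentation for your version. Values and the project name are placeholders, and imports are deferred into the functions so the module loads without Ray or Comet; call main() to run it.

```python
from typing import Any, Dict


def train_loop(config: Dict[str, Any]) -> None:
    """Per-worker training loop run by Ray on each worker."""
    import ray.train

    for epoch in range(config["epochs"]):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        ray.train.report({"epoch": epoch, "loss": loss})


def main() -> None:
    from comet_ml.integration.ray import comet_ray_train_logger
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(
        train_loop,
        train_loop_config={"lr": 1e-3, "epochs": 3},
        scaling_config=ScalingConfig(num_workers=2),
    )

    # Registers the Comet callback on this trainer (the function returns None).
    comet_ray_train_logger(trainer, project_name="ray-train-demo")
    trainer.fit()
```

Compared with constructing CometTrainLoggerCallback yourself, this route keeps the wiring in a single call after the trainer has been built.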
comet_worker
comet_worker(func)
This decorator enables you to monitor resource usage for each distributed worker during a distributed training job. By applying this decorator, you can annotate any training function to integrate Comet's resource tracking.
Note: This should be used together with the comet_ml.integration.ray.CometTrainLoggerCallback callback, and the training function must accept a configuration dictionary as an input argument.
Parameters:
func (Callable): The training function to be wrapped; it must accept a configuration dictionary as an input argument. The training function is a user-defined Python function that contains the end-to-end model training loop logic. When launching a distributed training job, each worker executes this training function.
Example
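A short sketch of applying the decorator. The import is deferred into make_worker_fn, a hypothetical helper name used only so the snippet loads without comet_ml installed; in an ordinary script you would simply write @comet_worker above the function definition. The returned function is what you pass to the trainer, together with a CometTrainLoggerCallback built from the same config dict.

```python
from typing import Any, Dict


def train_loop(config: Dict[str, Any]) -> None:
    """User-defined loop; must accept the configuration dict as its argument."""
    for _epoch in range(config["epochs"]):
        pass  # forward/backward pass for one epoch would go here


def make_worker_fn():
    """Return train_loop wrapped for per-worker resource tracking.

    Calling comet_worker(train_loop) is equivalent to decorating train_loop
    with @comet_worker at definition time.
    """
    from comet_ml.integration.ray import comet_worker

    return comet_worker(train_loop)
```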
comet_worker_logger
comet_worker_logger(
ray_config: Dict[str, Any],
api_key: Optional[str] = None,
**experiment_kwargs
)
This context manager allows you to track resource usage from each distributed worker when running a distributed training job. It must be used in conjunction with the comet_ml.integration.ray.CometTrainLoggerCallback callback.
Parameters:
ray_config (dict): Ray configuration dictionary from the Ray driver node.
api_key (Optional[str], default: None): Comet API key. If not None, it will be passed to ExistingExperiment. This argument takes priority over the api_key in the ray_config dict and over the API key set in the environment.
**experiment_kwargs: Other keyword arguments will be passed to the constructor of comet_ml.ExistingExperiment.
Example
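A sketch of using the context manager directly inside a training function. It assumes, as the note on disabled experiments suggests, that the context manager yields the worker's Experiment object, and that log_metric(name, value, step=...) is available on it as on other Comet experiments. The loss values are placeholders, and the import is deferred into the function (which only runs on Ray workers) so the snippet loads without comet_ml.

```python
from typing import Any, Dict


def train_loop(config: Dict[str, Any]) -> None:
    """Per-worker loop that opens the Comet worker logger explicitly."""
    from comet_ml.integration.ray import comet_worker_logger

    # config is the same ray_config dict given to CometTrainLoggerCallback.
    with comet_worker_logger(config) as experiment:
        for epoch in range(config["epochs"]):
            loss = 1.0 / (epoch + 1)  # placeholder for a real training step
            # If setup failed, experiment is disabled and this call logs nothing.
            experiment.log_metric("loss", loss, step=epoch)
```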
If required information is missing (such as the API key) or an error occurs, this returns a disabled Experiment: all method calls will succeed, but no data will be logged.