Python SDK for Artifacts Overview

Artifacts live in a Comet Workspace and are identified by name. Each artifact can have multiple versions, each identified by a version string.

How to add an asset to an Artifact

To log an artifact, first create an Artifact() instance. If you create an Artifact instance without providing a version string, a new version is created for you automatically. If this is the first time you have logged an Artifact with this name in this Workspace, it receives the version "1.0.0". Otherwise, it receives the next major version. For example, if you log a new version of an artifact whose latest version is "2.5.14", the new version will be "3.0.0".
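The automatic versioning rule above can be sketched in plain Python. This only illustrates the rule as described here and is not Comet's implementation: `next_auto_version` is a hypothetical helper name, and the sketch assumes plain `MAJOR.MINOR.PATCH` version strings.

```python
from typing import Optional


def next_auto_version(latest: Optional[str]) -> str:
    """Return the version an artifact would receive when none is given.

    Hypothetical helper mirroring the rule in the text; it is not part of
    the Comet SDK.
    """
    if latest is None:
        # First artifact logged with this name in the workspace.
        return "1.0.0"
    major = int(latest.split(".")[0])
    # Bump the major component and reset minor and patch to zero.
    return f"{major + 1}.0.0"


print(next_auto_version(None))      # 1.0.0
print(next_auto_version("2.5.14"))  # 3.0.0
```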

After creating an Artifact instance, you can add asset files or a remote URL to it. When you are ready to send the Artifact to the cloud, log it with Experiment.log_artifact(ARTIFACT). You can also add aliases when creating a new Artifact() with the aliases=["alias1", "alias2"] argument.

Let's take a look at a specific example.

NOTE: All of these examples assume that you have set your Comet API key via one of the supported methods. See Python Configuration for more information.

```python
from comet_ml import Artifact, Experiment

experiment = Experiment()
artifact = Artifact("artifact-name", "dataset")
artifact.add("./local-file")

experiment.log_artifact(artifact)
experiment.end()
```

In the above example, we create an Artifact with the name "artifact-name" and type "dataset". Both are arbitrary strings, but it helps to choose names that will still make sense to you later. Typical artifact types include "dataset", "image", "training-data", "validation-data", and "testing-data".

You can update all the Artifact attributes before logging the artifact object:

```python
import datetime

from comet_ml import Artifact, Experiment

experiment = Experiment()
artifact = Artifact("artifact-name", "dataset")

artifact.name = "my-specific-artifact-name"
artifact.artifact_type = "training-dataset"
artifact.metadata.update({"current_date": datetime.datetime.utcnow().isoformat()})
artifact.version = "1.4.5"
artifact.aliases |= {"staging"}  # Aliases are stored as a set
artifact.tags |= {"customer:1"}  # Tags are stored as a set
```

How to add a remote asset to an Artifact

Sometimes you might want to log a reference to an asset rather than the asset itself. For example, consider that you have a very large dataset (say, hundreds of gigabytes) that lives in an S3 storage bucket. In this case, it would make sense to log this as a "remote" asset. A remote asset URI can be any string; no particular format is expected.

```python
from comet_ml import Artifact, Experiment

experiment = Experiment()
artifact = Artifact("artifact-name", "artifact-type")
artifact.add_remote(
    "s3://bucket/dir/train.csv",
)

experiment.log_artifact(artifact)
experiment.end()
```

How to get a Logged Artifact Version

You can retrieve a logged artifact from any workspace that you have permission to access by passing the artifact name and a workspace name to the Experiment.get_artifact() method:

```python
logged_artifact = experiment.get_artifact(NAME, WORKSPACE, version_or_alias=VERSION_OR_ALIAS)
```

You can retrieve a logged artifact in three ways in the Python SDK:

  1. Get the latest artifact version by leaving out the version_or_alias argument
  2. Get a specific artifact version by passing its version string as version_or_alias
  3. Get an aliased artifact version by passing its alias as version_or_alias
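The three retrieval styles can be sketched as a small helper. It assumes, as the call signature above suggests, that both versions and aliases are passed through `version_or_alias`. The function name `get_three_ways` and its parameters are illustrative only; `experiment` is expected to be a `comet_ml.Experiment`.

```python
def get_three_ways(experiment, name, workspace, version, alias):
    """Fetch the latest, a pinned, and an aliased version of an artifact.

    Illustrative helper, not part of the Comet SDK. `experiment` should
    behave like a comet_ml.Experiment.
    """
    # 1. No version or alias: the latest version is returned.
    latest = experiment.get_artifact(name, workspace)
    # 2. A specific version string, for example "1.0.0".
    pinned = experiment.get_artifact(name, workspace, version_or_alias=version)
    # 3. An alias, for example "staging".
    aliased = experiment.get_artifact(name, workspace, version_or_alias=alias)
    return latest, pinned, aliased
```

In practice you would normally make only one of these calls, depending on whether your pipeline should follow the latest data or stay pinned to a known version.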

The assets attribute of the logged artifact contains all the logged assets for that artifact version. You can distinguish between remote and non-remote assets using the remote attribute of each asset.

```python
from comet_ml import Experiment

experiment = Experiment()
# WORKSPACE is a placeholder for your workspace name
logged_artifact = experiment.get_artifact(
    "artifact-name",
    WORKSPACE,
)

for asset in logged_artifact.assets:
    if asset.remote:
        print(asset.link)
    else:
        print(asset.logical_path)
        print(asset.size)
    print(asset.metadata)
    print(asset.asset_type)
    print(asset.id)
    print(asset.artifact_version_id)
    print(asset.artifact_id)
```
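If you need the two kinds of assets separately, a small helper along these lines may be convenient. It relies only on the `remote`, `link`, and `logical_path` attributes used above. The name `split_assets` is hypothetical, and the stand-in objects exist only so that the sketch runs without a Comet connection.

```python
from types import SimpleNamespace


def split_assets(assets):
    """Return (remote_links, local_paths) for an iterable of artifact assets."""
    remote_links, local_paths = [], []
    for asset in assets:
        if asset.remote:
            remote_links.append(asset.link)
        else:
            local_paths.append(asset.logical_path)
    return remote_links, local_paths


# Stand-in assets with made-up values; real ones come from logged_artifact.assets.
demo_assets = [
    SimpleNamespace(remote=True, link="s3://bucket/dir/train.csv", logical_path=None),
    SimpleNamespace(remote=False, link=None, logical_path="local-file"),
]
remote_links, local_paths = split_assets(demo_assets)
print(remote_links)
print(local_paths)
```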

How to download a Logged Artifact

Downloading a logged artifact copies all of its non-remote assets to your local disk. It also records that the experiment has accessed the artifact, which lets you track the data flow in your pipeline.

```python
from comet_ml import Experiment

experiment = Experiment()
# WORKSPACE is a placeholder for your workspace name
logged_artifact = experiment.get_artifact(
    "artifact-name",
    WORKSPACE,
)

# Download the artifact:
local_artifact = logged_artifact.download("/data/input")
for asset in local_artifact.assets:
    if asset.remote:
        print(asset.link)
    else:
        print(asset.logical_path)
        print(asset.size)
    print(asset.metadata)
    print(asset.asset_type)
    print(asset.id)
    print(asset.artifact_version_id)
    print(asset.artifact_id)
```

Only non-remote assets are downloaded. You can still access remote assets through the assets attribute of the logged artifact object and read each remote asset's link from its link attribute.
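Because remote assets are not downloaded, fetching them is left to your own code. One possible first step, sketched below, is to group the remote links by URI scheme so that each group can be handed to a suitable client. This is an assumption about how you might organize that work, not SDK behavior: `group_links_by_scheme` is a hypothetical helper, and since a remote URI can be any string, links without a scheme are collected under "unknown".

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_links_by_scheme(links):
    """Group remote asset links by URI scheme (for example "s3" or "https")."""
    groups = defaultdict(list)
    for link in links:
        # Free-form strings without a scheme fall back to "unknown".
        scheme = urlparse(link).scheme or "unknown"
        groups[scheme].append(link)
    return dict(groups)


# Made-up links; real ones come from asset.link on remote assets.
demo_links = ["s3://bucket/dir/train.csv", "just-a-label"]
print(group_links_by_scheme(demo_links))
```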

Update an Artifact Version

Here is how you can retrieve an existing artifact version, add a new file, compute the new version and log it:

```python
from comet_ml import Experiment

experiment = Experiment()

# WORKSPACE is a placeholder for your workspace name
logged_artifact = experiment.get_artifact("artifact-name", WORKSPACE)

local_artifact = logged_artifact.download("/data/input")

local_artifact.add("./new-file")
local_artifact.version = logged_artifact.version.next_minor()

experiment.log_artifact(local_artifact)
```
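For reference, a minor bump under standard semantic versioning increments the middle component and resets the patch component. The sketch below shows that arithmetic on plain strings. It assumes that next_minor() in the example above follows standard semantic-versioning behavior; `bump_minor` is a hypothetical stand-in, not the SDK method.

```python
def bump_minor(version: str) -> str:
    """Return the next minor version of a MAJOR.MINOR.PATCH string.

    Hypothetical illustration of semantic-versioning arithmetic.
    """
    major, minor, _patch = (int(part) for part in version.split("."))
    # Increment the minor component and reset the patch component.
    return f"{major}.{minor + 1}.0"


print(bump_minor("1.4.5"))  # 1.5.0
```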

See Also

Some related topics: