
Artifacts

Artifacts live in a Comet Workspace and are identified by their name. Each artifact can have multiple versions, each identified by a version string such as "1.0.0".

Add an asset to an Artifact

To log an artifact, you must first create an Artifact instance. If you don't provide a version string when you create it, a new version is created for you automatically. The first time you log an Artifact with a given name in a particular Workspace, it receives the version "1.0.0". Otherwise, it receives the next major version. For example, if you log a new version of an artifact whose current version is "2.5.14", the new version will be "3.0.0".
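The automatic versioning rule can be sketched in plain Python. This is only an illustration of the behavior described here, assuming plain "major.minor.patch" strings and assuming "next major" is computed from the highest existing version; the helpers `parse_version` and `next_auto_version` are hypothetical and not part of the comet_ml API, which assigns real versions on the Comet side.

```python
from typing import Iterable, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a "major.minor.patch" string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def next_auto_version(existing: Iterable[str]) -> str:
    """Hypothetical rule: "1.0.0" for a new name, else the next major version."""
    versions = [parse_version(v) for v in existing]
    if not versions:
        # First artifact with this name in the workspace
        return "1.0.0"
    # Tuples compare component by component, so (2, 5, 14) > (1, 0, 0)
    major, _, _ = max(versions)
    return f"{major + 1}.0.0"


print(next_auto_version([]))                   # 1.0.0
print(next_auto_version(["1.0.0", "2.5.14"]))  # 3.0.0
```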

After creating an Artifact instance, you can add asset files or remote URLs to it. When you are ready to send the Artifact to Comet, log it with Experiment.log_artifact(artifact). You can also attach aliases when creating a new Artifact by passing the aliases=["alias1", "alias2"] argument.

Here is a minimal example.

Note

All of these examples assume that you have set your Comet API key using one of the supported configuration methods. For more information, see Python Configuration.

from comet_ml import Artifact, Experiment

experiment = Experiment()
artifact = Artifact("artifact-name", "dataset")
artifact.add("./local-file")

experiment.log_artifact(artifact)
experiment.end()

In the example above, an Artifact named "artifact-name" with type "dataset" is created. Both are arbitrary strings, so choose names and types that will make sense to you later. Typical artifact types include dataset, image, training-data, validation-data, and testing-data.

You can update any of the Artifact's attributes before logging it:

import datetime

from comet_ml import Artifact, Experiment

experiment = Experiment()
artifact = Artifact("artifact-name", "dataset")

artifact.name = "my-specific-artifact-name"
artifact.artifact_type = "training-dataset"
artifact.metadata.update({"current_date": datetime.datetime.now(datetime.timezone.utc).isoformat()})
artifact.version = "1.4.5"
artifact.aliases |= {"staging"}  # Aliases are stored as a set
artifact.tags |= {"customer:1"}  # Tags are stored as a set

# Log the artifact once its attributes are set
experiment.log_artifact(artifact)
experiment.end()
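Because aliases and tags are stored as sets, `|=` merges new values in and silently ignores duplicates. The snippet below demonstrates that behavior with ordinary Python sets only, with no Comet objects involved; that `-=` is likewise a reasonable way to drop a value from `artifact.aliases` is an assumption based on the set type, not something this page documents.

```python
# Plain Python sets: the same semantics that `artifact.aliases |= {...}` relies on
aliases = {"production"}
aliases |= {"staging"}     # add an alias
aliases |= {"staging"}     # adding it again changes nothing: set items are unique
aliases -= {"production"}  # remove an alias
print(sorted(aliases))     # ['staging']
```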

Add a remote asset to an Artifact

Sometimes you might want to log a reference to an asset rather than the asset itself. For example, if you have a very large dataset (say, hundreds of gigabytes) stored in an S3 bucket, it makes sense to log it as a "remote" asset. A remote asset URI can be any string; no particular format is expected.

from comet_ml import Artifact, Experiment

experiment = Experiment()
artifact = Artifact("artifact-name", "artifact-type")
artifact.add_remote(
    "s3://bucket/dir/train.csv",
)

experiment.log_artifact(artifact)
experiment.end()

Get a logged Artifact version

You can retrieve a logged artifact from any workspace you have permission to access by passing the artifact name and the workspace name to the Experiment.get_artifact() method:

logged_artifact = experiment.get_artifact(NAME, WORKSPACE, version_or_alias=VERSION_OR_ALIAS)

Using the Python SDK, you can retrieve a logged artifact in three ways:

  • Get the latest artifact version by leaving out the version_or_alias argument.
  • Get a specific artifact version by passing a version string as version_or_alias.
  • Get an aliased artifact version by passing an alias as version_or_alias.
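The three retrieval modes can be modeled with a small in-memory lookup. This is a sketch under stated assumptions, not Comet's server-side logic: `resolve_version` and the `catalog` dictionary are hypothetical, "latest" is assumed to mean the highest version number, and any argument that looks like "major.minor.patch" is assumed to be a version rather than an alias.

```python
import re
from typing import Dict, Optional, Set

# Matches plain "major.minor.patch" strings such as "2.1.0"
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def _key(version: str) -> tuple:
    """Numeric sort key so that "2.10.0" ranks above "2.9.0"."""
    return tuple(int(part) for part in version.split("."))


def resolve_version(
    versions: Dict[str, Set[str]],
    version_or_alias: Optional[str] = None,
) -> str:
    """Hypothetical lookup mirroring the three retrieval modes.

    `versions` maps each version string to the aliases attached to it.
    """
    if version_or_alias is None:
        # No argument: assume "latest" means the highest version number
        return max(versions, key=_key)
    if SEMVER.match(version_or_alias):
        # A version string: it must exist exactly
        if version_or_alias not in versions:
            raise KeyError(f"unknown version {version_or_alias}")
        return version_or_alias
    # Anything else is treated as an alias
    for version, aliases in versions.items():
        if version_or_alias in aliases:
            return version
    raise KeyError(f"unknown alias {version_or_alias}")


catalog = {"1.0.0": {"production"}, "2.0.0": {"staging"}, "2.1.0": set()}
print(resolve_version(catalog))             # 2.1.0
print(resolve_version(catalog, "1.0.0"))    # 1.0.0
print(resolve_version(catalog, "staging"))  # 2.0.0
```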

The LoggedArtifact.assets attribute contains all the logged assets for a given artifact version. Use each asset's remote attribute to distinguish remote from non-remote assets:

from comet_ml import Experiment

WORKSPACE = "my-workspace"  # Replace with your workspace name

experiment = Experiment()
logged_artifact = experiment.get_artifact(
    "artifact-name",
    WORKSPACE,
)

for asset in logged_artifact.assets:
    if asset.remote:
        print(asset.link)
    else:
        print(asset.logical_path)
        print(asset.size)
    print(asset.metadata)
    print(asset.asset_type)
    print(asset.id)
    print(asset.artifact_version_id)
    print(asset.artifact_id)

Download a logged Artifact

Downloading a logged artifact saves all of its non-remote assets to your local disk. It also records that the current experiment accessed the artifact, which lets Comet track the data flow through your pipeline.

from comet_ml import Experiment

WORKSPACE = "my-workspace"  # Replace with your workspace name

experiment = Experiment()
logged_artifact = experiment.get_artifact(
    "artifact-name",
    WORKSPACE,
)

# Download the artifact:
local_artifact = logged_artifact.download("/data/input")
for asset in local_artifact.assets:
    if asset.remote:
        print(asset.link)
    else:
        print(asset.logical_path)
        print(asset.size)
    print(asset.metadata)
    print(asset.asset_type)
    print(asset.id)
    print(asset.artifact_version_id)
    print(asset.artifact_id)

Only non-remote assets are downloaded. Remote assets remain available through the assets attribute of the logged artifact, and each remote asset's URI is available through its link attribute.
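Since only non-remote assets are downloaded, you can estimate how much data a download will fetch before calling it. The sketch below uses a hypothetical `AssetStub` dataclass that mimics the remote, link, logical_path, and size attributes used in the loops on this page, so it runs without Comet; with a real LoggedArtifact you would pass logged_artifact.assets instead. The `split_assets` helper is likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class AssetStub:
    """Hypothetical stand-in exposing a few LoggedArtifact asset attributes."""
    logical_path: str
    remote: bool
    size: Optional[int] = None
    link: Optional[str] = None


def split_assets(assets: Iterable[AssetStub]) -> Tuple[int, List[Optional[str]]]:
    """Return (bytes a download would fetch, links of remote assets)."""
    total_bytes = 0
    remote_links = []
    for asset in assets:
        if asset.remote:
            # Remote assets are never downloaded; keep their URI instead
            remote_links.append(asset.link)
        else:
            total_bytes += asset.size or 0
    return total_bytes, remote_links


assets = [
    AssetStub("train.csv", remote=False, size=1_200),
    AssetStub("labels.json", remote=False, size=300),
    AssetStub("full.parquet", remote=True, link="s3://bucket/dir/full.parquet"),
]
print(split_assets(assets))  # (1500, ['s3://bucket/dir/full.parquet'])
```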

Update an Artifact version

Here is how to retrieve an existing artifact version, add a new file, bump the version, and log the result:

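from comet_ml import Experiment

WORKSPACE = "my-workspace"  # Replace with your workspace name

experiment = Experiment()

logged_artifact = experiment.get_artifact("artifact-name", WORKSPACE)

# Download the current version's files so they can be logged again
local_artifact = logged_artifact.download("/data/input")

# Add a new file and bump the minor version
local_artifact.add("./new-file")
local_artifact.version = logged_artifact.version.next_minor()

experiment.log_artifact(local_artifact)
experiment.end()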
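The next_minor() call bumps the version of the logged artifact. Assuming it follows the usual semantic-versioning convention (increment the minor component, reset the patch component to zero), its effect on a plain version string can be sketched as follows; this standalone `next_minor` function is hypothetical and only mirrors that convention, it is not the SDK's implementation.

```python
def next_minor(version: str) -> str:
    """Hypothetical helper: bump minor and reset patch, e.g. "1.4.5" -> "1.5.0"."""
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


print(next_minor("1.4.5"))  # 1.5.0
print(next_minor("3.0.0"))  # 3.1.0
```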


Jun. 27, 2022