Comet Artifacts allow you to keep track of any data associated with the ML Lifecycle. Depending on your application, you might decide to either upload the dataset directly to Comet or use the Remote Artifacts feature to store a reference to it instead. Whichever option you choose, Comet will maintain the lineage between your datasets and the training runs that created or consumed them.
Artifacts live at the Comet Workspace level and are identified by their name. Each Artifact can have multiple versions, allowing you to keep track of exactly which dataset was used.
Logging an Artifact¶
Logging an Artifact has three steps:
- Create an Artifact Version
- Add files and folders to this Artifact Version
- Log this Artifact Version to an Experiment
Creating an Artifact Version¶
To log an Artifact, you must first create an Artifact instance to which you then add some files or folders. This Artifact can then be uploaded to Comet by logging it to an Experiment.
When creating an Artifact object, you can specify the version number as well as aliases, metadata and version tags. These parameters allow you to keep all your Artifacts and their versions organized and make them easier to query.
The version parameter is optional; if you don't specify it, Comet will auto-increment to the next major version number.
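To make the auto-increment rule concrete, here is a minimal sketch of what "next major version" means for a semantic-version string. The `next_major_version` helper is hypothetical and only illustrates the rule; Comet assigns the actual version itself when the Artifact is logged. Starting at `1.0.0` when no earlier version exists is an assumption of this sketch.

```python
def next_major_version(latest=None):
    """Return the next major version after `latest` (hypothetical helper).

    Assumes MAJOR.MINOR.PATCH strings. With no previous version,
    this sketch assumes numbering starts at 1.0.0.
    """
    if latest is None:
        return "1.0.0"
    major = int(latest.split(".")[0])
    # A major bump resets the minor and patch components to zero.
    return f"{major + 1}.0.0"


print(next_major_version())         # 1.0.0
print(next_major_version("1.0.0"))  # 2.0.0
print(next_major_version("2.3.1"))  # 3.0.0
```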
Adding files and folders to an Artifact Version¶
After creating an Artifact Version, you can then add files and folders to it. These files and folders are referred to as "artifact assets" and fall into two categories, "artifact assets" and "remote artifact assets":
Artifact assets: Files and folders whose content is uploaded to Comet.
Remote artifact assets: Files and folders for which Comet stores only a reference, not the content itself. A remote artifact asset can be any string, allowing easy integration into your existing data versioning system.
If a remote artifact asset is a GCS or S3 bucket path, Comet has a few special tricks up its sleeve. Assuming the credentials are correctly configured, when you log an S3 bucket as a remote Artifact asset, Comet will automatically keep track of all the files in that bucket and allow you to easily download them. You get all the lineage benefits of Artifacts without needing to upload your data to another location!
Learn more about S3 and GCP support for Remote Artifacts Assets.
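In the Python SDK, the two asset categories correspond to two calls on the Artifact object, `add` and `add_remote`. The sketch below wraps them in a hypothetical helper, `add_assets`; the helper name and its arguments are illustrative and not part of Comet.

```python
def add_assets(artifact, local_paths=(), remote_uris=()):
    """Add uploaded and remote assets to an Artifact (hypothetical helper).

    `artifact` is expected to behave like a comet_ml.Artifact.
    """
    for path in local_paths:
        # Artifact asset: the file or folder content is uploaded to Comet.
        artifact.add(path)
    for uri in remote_uris:
        # Remote artifact asset: Comet stores only the reference string.
        artifact.add_remote(uri)
    return artifact
```

For example, `add_assets(artifact, ["./data/train.csv"], ["s3://my-bucket/raw"])` would upload one file and record one bucket reference; both paths are placeholders.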
Log an Artifact to Comet¶
When you are ready to send the Artifact to Comet, you log it to an Experiment.
Let's look at a full example:
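The three steps can be sketched as a single function. This is a minimal sketch rather than the original example from this page: the Artifact name, alias, metadata and tag are placeholders, and `log_training_data` is a hypothetical helper. It assumes `experiment` is a `comet_ml.Experiment` created with a configured API key.

```python
def log_training_data(experiment, data_path):
    """Create an Artifact Version, add a file or folder, and log it."""
    # Imported inside the function so the sketch can be read and
    # exercised without an active Comet session.
    from comet_ml import Artifact

    # Step 1: create the Artifact Version. `version` is omitted, so
    # Comet picks the next major version on its own.
    artifact = Artifact(
        name="training-data",            # placeholder name
        artifact_type="dataset",
        aliases=["raw"],                 # placeholder alias
        metadata={"source": "example"},  # placeholder metadata
        version_tags=["unreviewed"],     # placeholder version tag
    )

    # Step 2: add files or folders to the Artifact Version.
    artifact.add(data_path)

    # Step 3: log the Artifact Version to the Experiment.
    return experiment.log_artifact(artifact)
```

The value returned by `log_artifact` describes the version that was logged, which is useful when the version number was left for Comet to choose.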
You can find the full reference documentation for the Artifact object in the SDK reference.
Access a logged Artifact version¶
You can retrieve a logged Artifact from any workspace that you have permission to access by providing the Artifact name and a workspace name.
To make it easier to access the artifact, you can retrieve a logged Artifact in three ways:
- Get the latest Artifact version by leaving out the version and alias
- Get a specific Artifact version by passing in the version
- Get an aliased Artifact version by passing in the alias
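The three retrieval modes can be sketched with one thin wrapper. This assumes the SDK's `Experiment.get_artifact` accepts `workspace` and `version_or_alias` keyword arguments; `fetch_artifact` itself is a hypothetical helper, and the names in the usage note are placeholders.

```python
def fetch_artifact(experiment, name, workspace, version_or_alias=None):
    """Retrieve a logged Artifact version (hypothetical wrapper).

    version_or_alias=None     -> the latest version
    version_or_alias="1.0.0"  -> that specific version
    version_or_alias="prod"   -> the version carrying that alias
    """
    return experiment.get_artifact(
        name,
        workspace=workspace,
        version_or_alias=version_or_alias,
    )
```

For instance, `fetch_artifact(experiment, "training-data", "my-workspace", "1.0.0")` would request version 1.0.0, while omitting the last argument would request the latest version.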
Once you have retrieved a logged Artifact, you can then either:
- Download the artifact so you can use it within your training scripts
- Inspect the contents of an artifact without downloading it locally first
Download a logged Artifact¶
Downloading a logged Artifact brings the following assets to your disk:
- All non-remote assets
- S3 and GCP remote assets if authentication to these services is configured
This action also records that the new Experiment has accessed the Artifact, for tracking the data flow in your pipeline.
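A minimal sketch of the download step follows, assuming `LoggedArtifact.download` takes a target directory as its first argument. The `download_artifact_files` helper is hypothetical; it lists what landed on disk so a training script can pick the files up.

```python
from pathlib import Path


def download_artifact_files(experiment, name, workspace, target_dir):
    """Download a logged Artifact and return the files now on disk."""
    logged_artifact = experiment.get_artifact(name, workspace=workspace)

    # Writes the non-remote assets (and S3/GCP remote assets, when
    # authentication is configured) under target_dir. Comet also records
    # that this Experiment accessed the Artifact.
    logged_artifact.download(target_dir)

    # Collect every file under target_dir, in a stable order.
    return sorted(p for p in Path(target_dir).rglob("*") if p.is_file())
```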
Inspect a logged Artifact¶
The LoggedArtifact.assets attribute contains all the logged assets for a given Artifact version. You can distinguish between remote and non-remote assets using the remote attribute of each asset.
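As a short sketch of that inspection, the hypothetical helper below partitions the assets of a logged Artifact by their `remote` attribute. It assumes each asset also exposes a `logical_path` attribute, as the SDK's logged asset objects do to my knowledge.

```python
def split_assets(logged_artifact):
    """Return (uploaded, remote) lists of asset logical paths."""
    uploaded, remote = [], []
    for asset in logged_artifact.assets:
        if asset.remote:
            # Comet holds only a reference for this asset.
            remote.append(asset.logical_path)
        else:
            # The content of this asset was uploaded to Comet.
            uploaded.append(asset.logical_path)
    return uploaded, remote
```

Nothing is downloaded here: the function only reads the asset listing attached to the logged Artifact.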
Update an Artifact version¶
Artifact versions are immutable: once an Artifact is logged, it can no longer be updated, so that lineage stays accurate. You can, however, create a new Artifact version using a previous version as a starting point.
Here is how you can retrieve an existing Artifact version, add a new file, compute the new version and log it:
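A minimal sketch of that flow is shown below. It assumes that `download` returns a local Artifact that can be modified, and that the logged version object offers a `next_minor()` method (as semantic-version objects do). The helper name and its arguments are hypothetical.

```python
def log_next_minor_version(experiment, name, workspace, new_file, download_dir):
    """Derive a new Artifact version from the latest logged one."""
    # Retrieve the existing version and materialize it locally.
    logged_artifact = experiment.get_artifact(name, workspace=workspace)
    local_artifact = logged_artifact.download(download_dir)

    # Add the new file to the local copy.
    local_artifact.add(new_file)

    # Compute the new version from the previous one.
    local_artifact.version = logged_artifact.version.next_minor()

    # Log the result as a new, separate Artifact version.
    return experiment.log_artifact(local_artifact)
```

The previously logged version is left untouched; only the new version contains the added file.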
Artifact: the class to use when assembling Artifacts to log.
ArtifactAsset: the Artifact Asset class when logging assets.
LoggedArtifact: the type of Artifact returned when you retrieve a logged Artifact.
LoggedArtifactAsset: the logged Artifact Asset class when accessing logged assets.
- Use Artifacts: an overview of the Comet Artifacts feature.
- Artifacts UI: Artifacts pages in the Comet UI.