{"id":8086,"date":"2023-11-02T10:08:41","date_gmt":"2023-11-02T18:08:41","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=8086"},"modified":"2025-04-24T17:04:47","modified_gmt":"2025-04-24T17:04:47","slug":"dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet\/","title":{"rendered":"Dataset Preparation Meets Experiment and Model Management with Superb AI and Comet"},"content":{"rendered":"\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet\">\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<div class=\"mq mr ee ms bg mt\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mu mv c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*pkM6O6f42hd43tohqAFy8A.jpeg\" alt=\"\" width=\"700\" height=\"394\"><\/figure><div class=\"mh mi mj\"><picture><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"4d6c\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">When it comes to machine learning projects, the hard truth is that training just one model on one version of a dataset won\u2019t result in a production-ready model. The entire ML lifecycle is, by its nature, <strong class=\"be nr\">deeply iterative and interdependent<\/strong>. 
For a given project, dataset creation and model development will undoubtedly require numerous cycles.<\/p>\n<p id=\"dd5e\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">And what\u2019s more, <a class=\"af ns\" href=\"https:\/\/proceedings.neurips.cc\/paper\/2015\/file\/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">making changes to one part of your ML workflow changes every part of your ML workflow<\/a>.<\/p>\n<p id=\"d958\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">New training data? You will need to run a few more model training experiments to understand how this new data will affect model performance. Your model isn\u2019t performing well? You might have to return and collect ground truth data samples, adjust your labels, or make other dataset changes. 
This <strong class=\"be nr\">feedback loop<\/strong> between dataset collection\/management and model training\/experimentation is one of the most important \u2014 and potentially costly \u2014 parts of the ML lifecycle.<\/p>\n<p id=\"87f7\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">To help you build better models faster, <strong class=\"be nr\">you and your team will need tools and capabilities that allow you to make intelligent adjustments each step of the way<\/strong> \u2014 all while having high visibility into your workflows, as well as the ability to collaborate and reproduce your work at every step.<\/p>\n<p id=\"b816\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">This is the power that tools like <a class=\"af ns\" href=\"https:\/\/www.superb-ai.com\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Superb AI<\/a> and <a class=\"af ns\" href=\"https:\/\/bit.ly\/32HTlVK\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet<\/a> offer when put to work in unison. 
In this article, we\u2019re going to take a look at how these two tools can work together to help you speed up and improve two different but deeply connected stages of the ML lifecycle: (1) dataset collection and preparation + (2) model training and experimentation.<\/p>\n<p id=\"6622\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">Additionally, we\u2019ll show how you and your team can use Superb AI and Comet to create a feedback loop between model predictions, dataset iterations, and model retraining processes.<\/p>\n<h1 id=\"5b37\" class=\"nt nu fr be nv nw nx gr ny nz oa gu ob oc od oe of og oh oi oj ok ol om on oo bj\" data-selectable-paragraph=\"\">Superb AI Dataset Preparation Platform<\/h1>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<div class=\"mq mr ee ms bg mt\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mu mv c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*T8szrWmSfv8CZnHECMvGqg.png\" alt=\"\" width=\"700\" height=\"326\"><\/figure><div class=\"mh mi op\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*T8szrWmSfv8CZnHECMvGqg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*T8szrWmSfv8CZnHECMvGqg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*T8szrWmSfv8CZnHECMvGqg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*T8szrWmSfv8CZnHECMvGqg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*T8szrWmSfv8CZnHECMvGqg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*T8szrWmSfv8CZnHECMvGqg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*T8szrWmSfv8CZnHECMvGqg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) 
and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*T8szrWmSfv8CZnHECMvGqg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*T8szrWmSfv8CZnHECMvGqg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*T8szrWmSfv8CZnHECMvGqg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*T8szrWmSfv8CZnHECMvGqg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*T8szrWmSfv8CZnHECMvGqg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*T8szrWmSfv8CZnHECMvGqg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*T8szrWmSfv8CZnHECMvGqg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div><figcaption class=\"oq or os mh mi ot ou be b bf z dw\" data-selectable-paragraph=\"\"><a class=\"af ns\" href=\"https:\/\/www.superb-ai.com\/product\/automate\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"ov\">Source<\/em><\/a><\/figcaption><\/figure>\n<p id=\"3f8c\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">Superb AI has introduced a 
revolutionary way for ML teams to drastically decrease the time it takes to deliver high-quality training datasets for computer vision use cases. Instead of relying on human labelers for a majority of the data preparation workflow, teams can now implement a much more time- and cost-efficient pipeline with the <a class=\"af ns\" href=\"https:\/\/www.youtube.com\/watch?v=JBEY9hDkjRw\" target=\"_blank\" rel=\"noopener ugc nofollow\">Superb AI Suite<\/a>.<\/p>\n<p id=\"bb63\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">A typical data preparation pipeline might contain the following steps:<\/p>\n<ol class=\"\">\n<li id=\"dc15\" class=\"mw mx fr be b gp my mz na gs nb nc nd ne ow ng nh ni ox nk nl nm oy no np nq oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nr\">Data ingestion: <\/strong>Input data (images and videos) are often extracted from various sources. Users can upload input data into Superb Suite as raw files, via the cloud (AWS S3 or GCP), or via Superb Suite\u2019s SDK\/API.<\/li>\n<li id=\"97d6\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nr\">Ground-truth data creation: <\/strong>Having a small amount of initial ground-truth data (data with correct labels) is crucial to kickstart the labeling process. Users can create these ground-truth samples using Superb\u2019s built-in simple annotation tool with filtering capability. 
Superb Suite supports classification, detection, and segmentation tasks for bounding boxes, polylines, polygons, and key points for both images and videos.<\/li>\n<li id=\"fd72\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nr\">Automatic labeling: <\/strong><a class=\"af ns\" href=\"https:\/\/www.superb-ai.com\/blog\/introducing-custom-auto-label-for-long-tail-computer-vision\" target=\"_blank\" rel=\"noopener ugc nofollow\">Superb AI\u2019s customizable auto-label technology<\/a> uses a unique mixture of transfer learning, few-shot learning, and self-supervised learning \u2014 allowing the model to quickly achieve high levels of efficiency with small customer-proprietary datasets. And because the custom auto-label has broad applications, it can be used to swiftly jump-start any project, whether that be labeling your initial dataset for training or labeling your edge cases for retraining. This will drastically reduce the time it takes to prepare and deliver datasets.<\/li>\n<li id=\"a794\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nr\">Labeled data delivery: <\/strong>The review and audit process of data labels is vital for the overall quality of the dataset. In reality, it is almost impossible to review every label manually. Superb AI Suite streamlines the review process by taking advantage of the label accuracy measures estimated by multiple machine learning models. 
After passing through this rigorous quality control process, the final labeled data is delivered to the MLOps pipeline.<\/li>\n<\/ol>\n<h1 id=\"33a0\" class=\"nt nu fr be nv nw nx gr ny nz oa gu ob oc od oe of og oh oi oj ok ol om on oo bj\" data-selectable-paragraph=\"\">Comet\u2019s Experiment Management Platform<\/h1>\n<p id=\"e9a3\" class=\"pw-post-body-paragraph mw mx fr be b gp ph mz na gs pi nc nd ne pj ng nh ni pk nk nl nm pl no np nq fk bj\" data-selectable-paragraph=\"\">Machine learning addresses problems that cannot be well specified programmatically. Traditional software engineering allows strong abstraction boundaries between different components of a system in order to isolate the effects of changes.<\/p>\n<p id=\"b820\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">Machine learning systems, on the other hand, are entangled with a host of upstream dependencies, such as the size of the dataset, the distribution of features within the dataset, data scaling and splitting techniques, the type of optimizer being used, etc.<\/p>\n<p id=\"5a39\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">Because ML systems lack a clear specification, data collection is an imperfect science, and effective machine learning models can be incredibly complex, <strong class=\"be nr\">experimentation is necessary<\/strong>.<\/p>\n<p id=\"f6ba\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">The goal of the experimentation process is to understand how incremental changes affect the system. 
Rapid experimentation over different model types, data transformations, feature engineering choices, and optimization methods allows us to discern what is and isn\u2019t working.<\/p>\n<p id=\"fa46\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">Because machine learning is an <a class=\"af ns\" href=\"https:\/\/bit.ly\/3Hc95za\" target=\"_blank\" rel=\"noopener ugc nofollow\">experimental and iterative science<\/a>, diligent tracking of these <a class=\"af ns\" href=\"https:\/\/towardsdatascience.com\/reproducible-machine-learning-cf1841606805\" target=\"_blank\" rel=\"noopener\">multiple sources of variability<\/a> is necessary. Manual tracking is tedious, and the difficulty compounds as an ML team grows and collaboration between members becomes a factor. It is well known that <a class=\"af ns\" href=\"https:\/\/thegradient.pub\/independently-reproducible-machine-learning\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">reproducibility<\/a> is an issue in many machine learning papers, and while steps are being taken to <a class=\"af ns\" href=\"https:\/\/www.cs.mcgill.ca\/~jpineau\/ReproducibilityChecklist.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">address these issues<\/a>, as humans, we are often prone to oversight.<\/p>\n<p id=\"6381\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">This is where a platform like Comet comes into the picture. 
Comet is <a class=\"af ns\" href=\"https:\/\/bit.ly\/32HTlVK\" target=\"_blank\" rel=\"noopener ugc nofollow\">an Experiment Management Platform<\/a> that helps practitioners automatically track, compare, visualize and share their experiments, source code, datasets, and models.<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mu mv c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*69N6FQbXpIWBEld1prBRag.gif\" alt=\"\" width=\"600\" height=\"338\"><\/figure><div class=\"mh mi pm\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*69N6FQbXpIWBEld1prBRag.gif 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*69N6FQbXpIWBEld1prBRag.gif 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*69N6FQbXpIWBEld1prBRag.gif 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*69N6FQbXpIWBEld1prBRag.gif 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*69N6FQbXpIWBEld1prBRag.gif 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*69N6FQbXpIWBEld1prBRag.gif 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1200\/1*69N6FQbXpIWBEld1prBRag.gif 1200w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 600px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*69N6FQbXpIWBEld1prBRag.gif 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*69N6FQbXpIWBEld1prBRag.gif 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*69N6FQbXpIWBEld1prBRag.gif 750w, 
https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*69N6FQbXpIWBEld1prBRag.gif 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*69N6FQbXpIWBEld1prBRag.gif 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*69N6FQbXpIWBEld1prBRag.gif 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1200\/1*69N6FQbXpIWBEld1prBRag.gif 1200w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 600px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"dd52\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">A typical workflow for experimentation with Comet contains the following steps:<\/p>\n<ol class=\"\">\n<li id=\"07ae\" class=\"mw mx fr be b gp my mz na gs nb nc nd ne ow ng nh ni ox nk nl nm oy no np nq oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nr\">Define the project scope and relevant metrics<\/strong>: This is usually the most challenging part of the model development process. 
It involves talking to various stakeholders and clearly establishing the outcomes expected from the model and how these will be measured.<\/li>\n<li id=\"bef8\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nr\">Log the relevant dataset to Comet as an Artifact<\/strong>: <a class=\"af ns\" href=\"https:\/\/bit.ly\/3ILJygO\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet Artifacts<\/a> is a tool used to track the lineage of datasets, as well as other data assets produced during the course of experimentation (model checkpoints, intermediate datasets, etc.).<\/li>\n<li id=\"830d\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nr\">Experiment and iterate over different model types, data transformations, feature engineering choices, and optimization methods<\/strong>: We\u2019ve already mentioned that in order to discern the type of model that would work best for your dataset, experimentation is necessary. Automated tracking of these variables allows teams to iterate on their models rapidly.<\/li>\n<li id=\"f8b0\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nr\">Evaluate the model in production<\/strong>: Once a candidate model is chosen for deployment, it should be deployed to a production environment and monitored in order to assess its performance on real-world data and identify gaps in the model\u2019s performance.<\/li>\n<li id=\"ab09\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nr\">Update the data<\/strong>: The performance of a model will inevitably degrade over time as the input data changes. 
In order to address these performance gaps, it\u2019s necessary to update training datasets for this model with new data.<\/li>\n<li id=\"1bc7\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nr\">Retrain the model<\/strong>: Once the training datasets are updated, repeat steps 1\u20134 to update the model.<\/li>\n<\/ol>\n<h1 id=\"12cd\" class=\"nt nu fr be nv nw nx gr ny nz oa gu ob oc od oe of og oh oi oj ok ol om on oo bj\" data-selectable-paragraph=\"\">Building a Data-to-Model Pipeline and Feedback Loop with Superb AI and Comet ML<\/h1>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<div class=\"mq mr ee ms bg mt\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mu mv c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*JcBT7Txp1899nimIf9T9fQ.jpeg\" alt=\"\" width=\"700\" height=\"749\"><\/figure><div class=\"mh mi pn\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, 
(min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*JcBT7Txp1899nimIf9T9fQ.jpeg 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"6561\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">Coupled together, Superb AI and Comet cover data preparation and model experimentation workflows, respectively. 
As observed in the workflow diagram above:<\/p>\n<ol class=\"\">\n<li id=\"6844\" class=\"mw mx fr be b gp my mz na gs nb nc nd ne ow ng nh ni ox nk nl nm oy no np nq oz pa pb bj\" data-selectable-paragraph=\"\">Given the raw data from your data sources, you can use the Superb AI platform to ingest the data, create a small ground-truth dataset, label that dataset with the auto-label technology, manually audit the hard labels, and generate a labeled training dataset.<\/li>\n<li id=\"7c08\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\">Then, the labeled dataset flows from the Superb AI platform to the Comet platform.<\/li>\n<li id=\"9486\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\">You now can use the Comet platform to create Artifacts of the labeled dataset, run model experiments and evaluate model performance with the Artifacts, visualize model predictions, and surface failure prediction cases.<\/li>\n<li id=\"6e09\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq oz pa pb bj\" data-selectable-paragraph=\"\">Next, you can feed the failure prediction cases from Comet to Superb AI for ground-truth labeling and kickstart the model retraining procedure.<\/li>\n<\/ol>\n<p id=\"cb69\" class=\"pw-post-body-paragraph mw mx fr be b gp my mz na gs nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fk bj\" data-selectable-paragraph=\"\">You can keep workflows between DataOps and MLOps teams separate \u2014 while enabling cross-team collaboration by preserving the visibility and auditability of the entire data-to-model pipeline across teams. 
With this pipeline in place, scientists and engineers can collaborate more quickly and smoothly on machine learning workflows.<\/p>\n<h1 id=\"c2d0\" class=\"nt nu fr be nv nw nx gr ny nz oa gu ob oc od oe of og oh oi oj ok ol om on oo bj\" data-selectable-paragraph=\"\">Conclusion<\/h1>\n<p id=\"ea85\" class=\"pw-post-body-paragraph mw mx fr be b gp ph mz na gs pi nc nd ne pj ng nh ni pk nk nl nm pl no np nq fk bj\" data-selectable-paragraph=\"\">Intelligent platform choices make machine learning development much more feasible, especially as you\u2019re scaling your ML strategy. With Superb AI\u2019s data preparation capabilities and Comet\u2019s model development capabilities, your ML teams can:<\/p>\n<ul class=\"\">\n<li id=\"5c40\" class=\"mw mx fr be b gp my mz na gs nb nc nd ne ow ng nh ni ox nk nl nm oy no np nq po pa pb bj\" data-selectable-paragraph=\"\">Build, label, and audit training datasets as quickly as possible.<\/li>\n<li id=\"0bc2\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq po pa pb bj\" data-selectable-paragraph=\"\">Learn how changes to those datasets affect the performance of a trained model.<\/li>\n<li id=\"27ca\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq po pa pb bj\" data-selectable-paragraph=\"\">Have visibility into your model training experiments in order to better understand how specific dataset choices affect model performance.<\/li>\n<li id=\"bc1d\" class=\"mw mx fr be b gp pc mz na gs pd nc nd ne pe ng nh ni pf nk nl nm pg no np nq po pa pb bj\" data-selectable-paragraph=\"\">Learn where gaps exist in your training datasets, iterate on them, and compare performance across model training runs.<\/li>\n<\/ul>\n<blockquote class=\"pp\"><p id=\"a014\" class=\"pq pr fr be ps pt pu pv pw px py nq dw\" data-selectable-paragraph=\"\">Stay tuned! 
Our teams are working on a technical walkthrough, and a few more fun things.<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<blockquote class=\"qh qi qj\"><p id=\"4a8b\" class=\"mw mx qk be b gp my mz na gs nb nc nd ql nf ng nh qm nj nk nl qn nn no np nq fk bj\" data-selectable-paragraph=\"\">If you\u2019re interested in learning more about the Comet platform, you can <a class=\"af ns\" href=\"https:\/\/youtu.be\/cX5tx202PXM\" target=\"_blank\" rel=\"noopener ugc nofollow\">check out a demo<\/a>, or try out <a class=\"af ns\" href=\"https:\/\/bit.ly\/3u2KV6w\" target=\"_blank\" rel=\"noopener ugc nofollow\">the platform<\/a> for free.<\/p><p id=\"7bc4\" class=\"mw mx qk be b gp my mz na gs nb nc nd ql nf ng nh qm nj nk nl qn nn no np nq fk bj\" data-selectable-paragraph=\"\">If you\u2019re interested in learning more about the Superb AI platform, sign up for <a class=\"af ns\" href=\"https:\/\/suite.superb-ai.com\/auth\/create?from=homepage\" target=\"_blank\" rel=\"noopener ugc nofollow\">the product<\/a> for free and read <a class=\"af ns\" href=\"https:\/\/www.superb-ai.com\/blog\" target=\"_blank\" rel=\"noopener ugc nofollow\">the blog<\/a>.<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>When it comes to machine learning projects, the hard truth is that training just one model on one version of a dataset won\u2019t result in a production-ready model. The entire ML lifecycle is, by its nature, deeply iterative and interdependent. For a given project, dataset creation and model development will undoubtedly require numerous cycles. 
And [&hellip;]<\/p>\n","protected":false},"author":39,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[23,9],"tags":[],"coauthors":[128,150],"class_list":["post-8086","post","type-post","status-publish","format-standard","hentry","category-integrations","category-product"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Dataset Preparation Meets Experiment and Model Management with Superb AI and Comet - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Dataset Preparation Meets Experiment and Model Management with Superb AI and Comet\" \/>\n<meta property=\"og:description\" content=\"When it comes to machine learning projects, the hard truth is that training just one model on one version of a dataset won\u2019t result in a production-ready model. The entire ML lifecycle is, by its nature, deeply iterative and interdependent. For a given project, dataset creation and model development will undoubtedly require numerous cycles. 
And [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-02T18:08:41+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:04:47+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*pkM6O6f42hd43tohqAFy8A.jpeg\" \/>\n<meta name=\"author\" content=\"Dhruv Nair, James Le\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Dhruv Nair, James Le\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Dataset Preparation Meets Experiment and Model Management with Superb AI and Comet - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet","og_locale":"en_US","og_type":"article","og_title":"Dataset Preparation Meets Experiment and Model Management with Superb AI and Comet","og_description":"When it comes to machine learning projects, the hard truth is that training just one model on one version of a dataset won\u2019t result in a production-ready model. The entire ML lifecycle is, by its nature, deeply iterative and interdependent. 
For a given project, dataset creation and model development will undoubtedly require numerous cycles. And [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-11-02T18:08:41+00:00","article_modified_time":"2025-04-24T17:04:47+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*pkM6O6f42hd43tohqAFy8A.jpeg","type":"","width":"","height":""}],"author":"Dhruv Nair, James Le","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Dhruv Nair, James Le","Est. reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet\/"},"author":{"name":"James Le","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/9ea207111d311668f59477646ffd469a"},"headline":"Dataset Preparation Meets Experiment and Model Management with Superb AI and 
Comet","datePublished":"2023-11-02T18:08:41+00:00","dateModified":"2025-04-24T17:04:47+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet\/"},"wordCount":1460,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*pkM6O6f42hd43tohqAFy8A.jpeg","articleSection":["Integrations","Product"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet\/","url":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet","name":"Dataset Preparation Meets Experiment and Model Management with Superb AI and Comet - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*pkM6O6f42hd43tohqAFy8A.jpeg","datePublished":"2023-11-02T18:08:41+00:00","dateModified":"2025-04-24T17:04:47+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*pkM6O6f42hd43tohqAFy8A.jpeg","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*pkM6O6f42hd43tohqAFy8A.jpeg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/dataset-preparation-meets-experiment-and-model-management-with-superb-ai-and-comet#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Dataset Preparation Meets Experiment and Model Management with Superb AI and Comet"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/9ea207111d311668f59477646ffd469a","name":"James Le","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/e9faebcdd7afdaff187857dc289b23ba","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/1678305362870-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/1678305362870-96x96.jpg","caption":"James 
Le"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/khanhle-1013gmail-com\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8086","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=8086"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8086\/revisions"}],"predecessor-version":[{"id":15466,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8086\/revisions\/15466"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=8086"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=8086"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=8086"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=8086"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}