{"id":3540,"date":"2019-08-06T08:58:38","date_gmt":"2019-08-06T16:58:38","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=3540"},"modified":"2025-04-24T17:31:00","modified_gmt":"2025-04-24T17:31:00","slug":"building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/","title":{"rendered":"Building reliable machine learning pipelines with AWS Sagemaker and Comet"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>This tutorial is Part II of a series. See Part I&nbsp;<\/em><a href=\"https:\/\/live-cometml.pantheonsite.io\/blog\/building-a-devops-pipeline-for-machine-learning-and-ai-evaluating-sagemaker\/\">here<\/a><em>.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>Successfully executing machine learning at scale involves building reliable feedback loops around your models. As your pipeline grows, you will reach a point where your data can no longer fit in memory on a single machine, and your training processes will have to run in a distributed way. Regular retraining, and hyperparameter optimization of models, will become necessary as new data becomes available and your underlying feature distributions change.<\/p>\n\n\n\n<p>There is also the added complexity of trying new modeling approaches on your data and communicating the results across teams and other stakeholders.<\/p>\n\n\n\n<p><strong>In this blog post, we will illustrate how to use AWS Sagemaker and Comet.ml to simplify this process of monitoring and improving your training pipeline.<\/strong><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Data complexity + model needs are growing<\/h4>\n\n\n\n<p>The challenges involved in creating functional feedback loops extend beyond data concerns like tracking changing data distributions to issues at the model level. 
Robust models require frequent retraining so that hyperparameters remain optimal as new data arrives.<\/p>\n\n\n\n<p>With each iteration, it becomes harder to manage subsets and variations of your data and models. Keeping track of which model iteration ran on which dataset is key to reproducibility.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Take that complexity + multiply it by team complexity<\/h4>\n\n\n\n<p>Managing these complex pipelines can become even more difficult in team settings. Data scientists will often store results from several models in log files, which makes it hard to reproduce models or communicate results effectively.<\/p>\n\n\n\n<p>Traditional software development tools are not optimized for the iterative nature of machine learning or the scale of data that machine learning requires. The lack of tools and processes that make collaboration easy within a data science team (and across functions like engineering) has led to dramatically slower iteration cycles. Organizations also suffer from slow onboarding of new employees and bear the risk of employees leaving and taking their work and proprietary knowledge with them.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Tutorial<\/h4>\n\n\n\n<p>This tutorial covers how to integrate&nbsp;<a href=\"https:\/\/live-cometml.pantheonsite.io\/\">Comet.ml<\/a>&nbsp;with AWS Sagemaker\u2019s TensorFlow&nbsp;<a href=\"https:\/\/sagemaker.readthedocs.io\/en\/latest\/estimators.html\" target=\"_blank\" rel=\"noreferrer noopener\">Estimator API<\/a>. 
We will be adapting the example for&nbsp;<a href=\"https:\/\/github.com\/awslabs\/amazon-sagemaker-examples\/tree\/master\/sagemaker-python-sdk\/tensorflow_resnet_cifar10_with_tensorboard\" target=\"_blank\" rel=\"noreferrer noopener\">running the Resnet model on the CIFAR10 dataset with TensorFlow<\/a>.<\/p>\n\n\n\n<p>Instead of logging experiment metrics to TensorBoard, we\u2019re going to log them to Comet.ml.<\/p>\n\n\n\n<p>This allows us to keep track of various hyperparameter configurations, metrics, and code across different training runs.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AWS SageMaker provides convenient and reliable infrastructure to train and deploy machine learning models.<\/li>\n\n\n\n<li>Comet.ml automatically tracks and monitors machine learning experiments and models.<\/li>\n<\/ul>\n\n\n\n<p>You can check out Comet.ml&nbsp;<a href=\"https:\/\/www.comet.com\/docs\/python-sdk\/getting-started\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>&nbsp;and learn more about AWS Sagemaker&nbsp;<a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/gs.html\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Environment Setup<\/h4>\n\n\n\n<p>When using AWS Sagemaker, your account comes with multiple pre-installed virtual environments that contain Jupyter kernels and popular Python packages such as scikit-learn, pandas, NumPy, TensorFlow, and&nbsp;<a href=\"http:\/\/bit.ly\/2QG8ijM\" target=\"_blank\" rel=\"noreferrer noopener\">MXNet<\/a>.<\/p>\n\n\n\n<p><strong>1.<\/strong> Create a Sagemaker account. 
You\u2019ll be guided through all the steps for getting your credentials to submit a job to the training cluster&nbsp;<a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/07\/setup-1-1-1024x425-b.jpeg\" alt=\"\" class=\"wp-image-1034\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/07\/setup-2-1024x545-c.jpeg\" alt=\"\" class=\"wp-image-1035\"\/><\/figure>\n\n\n\n<p><strong>2.<\/strong> Set up your Comet.ml account&nbsp;<a href=\"https:\/\/live-cometml.pantheonsite.io\/pricing\/\">here<\/a>. Once you log in, we\u2019ll take you to the default project, where the Quickstart Guide provides your Project API Key.<\/p>\n\n\n\n<p><strong>3.<\/strong> Create a Sagemaker notebook instance and open it.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/07\/setup-3-1024x568-d.jpeg\" alt=\"\" class=\"wp-image-1036\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/07\/setup-4-1024x278-e.jpeg\" alt=\"\" class=\"wp-image-1037\"\/><\/figure>\n\n\n\n<p><strong>4.<\/strong> Open up a new Terminal instance (using the&nbsp;<strong>New<\/strong>&nbsp;dropdown). 
Using the command line, activate the tensorflow_p36 virtual environment:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ source activate tensorflow_p36<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/07\/setup-5-1024x328-e.jpeg\" alt=\"\" class=\"wp-image-1038\"\/><\/figure>\n\n\n\n<p>Using the terminal, clone the Sagemaker example from&nbsp;<a href=\"https:\/\/github.com\/comet-ml\/comet-sagemaker\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>https:\/\/github.com\/comet-ml\/comet-sagemaker<\/strong><\/a> into your Sagemaker instance.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ git clone https:\/\/github.com\/comet-ml\/comet-sagemaker.git &amp;&amp; cd comet-sagemaker<\/code><\/pre>\n\n\n\n<p>The repository has the following structure:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>.\n\u251c\u2500\u2500 Dockerfile\n\u251c\u2500\u2500 README.md\n\u251c\u2500\u2500 build_and_push.sh\n\u251c\u2500\u2500 cifar10\n\u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u251c\u2500\u2500 cifar10.py\n\u2502   \u251c\u2500\u2500 nginx.conf\n\u2502   \u251c\u2500\u2500 requirements.txt\n\u2502   \u251c\u2500\u2500 resnet_model.py\n\u2502   \u251c\u2500\u2500 serve\n\u2502   \u2514\u2500\u2500 train\n\u251c\u2500\u2500 generate_cifar10_tfrecords.py\n\u2514\u2500\u2500 main.py<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Using Sagemaker\u2019s Bring Your Own Model<\/h4>\n\n\n\n<p>Sagemaker allows users to run custom containers on its platform. We are going to extend the Sagemaker TensorFlow Docker image by installing Comet. 
This will allow us to automatically track our training from inside the container.<\/p>\n\n\n\n<p>We will first download the CIFAR10 dataset in the TFRecords format.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python generate_cifar10_tfrecords.py --data-dir \/tmp\/cifar10-data<\/code><\/pre>\n\n\n\n<p>Next, we\u2019re going to add our Comet API key to the Dockerfile as an environment variable<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"). You\n# may not use this file except in compliance with the License. A copy of\n# the License is located at\n#\n#     http:\/\/aws.amazon.com\/apache2.0\/\n#\n# or in the \"license\" file accompanying this file. This file is\n# distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF\n# ANY KIND, either express or implied. See the License for the specific\n# language governing permissions and limitations under the License.\n\n# For more information on creating a Dockerfile\n# https:\/\/docs.docker.com\/compose\/gettingstarted\/#step-2-create-a-dockerfile\nFROM tensorflow\/tensorflow:1.8.0-py3\n\nRUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends nginx curl\n\n# Download TensorFlow Serving\n# https:\/\/www.tensorflow.org\/serving\/setup#installing_the_modelserver\nRUN echo \"deb &#91;arch=amd64] http:\/\/storage.googleapis.com\/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal\" | tee \/etc\/apt\/sources.list.d\/tensorflow-serving.list\nRUN curl https:\/\/storage.googleapis.com\/tensorflow-serving-apt\/tensorflow-serving.release.pub.gpg | apt-key add -\nRUN apt-get update &amp;&amp; apt-get install tensorflow-model-server\n\nENV PATH=\"\/opt\/ml\/code:${PATH}\"\nENV COMET_API_KEY YOUR_API_KEY\n# \/opt\/ml and all subdirectories are utilized by SageMaker, we use the \/code subdirectory to store our user 
code.\nCOPY \/cifar10 \/opt\/ml\/code\nRUN pip install -r \/opt\/ml\/code\/requirements.txt\nWORKDIR \/opt\/ml\/code<\/code><\/pre>\n\n\n\n<p>After adding our API key to the Dockerfile, we\u2019re going to build our container image and push it to Amazon Elastic Container Registry (ECR) using the build_and_push.sh script.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>chmod +x build_and_push.sh\n.\/build_and_push.sh &lt;name of your image&gt;<\/code><\/pre>\n\n\n\n<p>Finally, we can run the training job:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python main.py --data \/tmp\/cifar10-data --container_name &lt;name of your image&gt;<\/code><\/pre>\n\n\n\n<p>If the training is successful, you should see output like the following:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>2019-08-04 22:55:34 Starting - Starting the training job...\n2019-08-04 22:55:36 Starting - Launching requested ML instances......\n2019-08-04 22:56:37 Starting - Preparing the instances for training...\n2019-08-04 22:57:35 Downloading - Downloading input data\n2019-08-04 22:57:35 Training - Downloading the training image......\n2019-08-04 22:58:31 Training - Training image download completed. Training in progress.........\nTraining complete.\n2019-08-04 22:59:53 Uploading - Uploading generated training model\n2019-08-04 22:59:53 Completed - Training job completed<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Setting up a Comet Project and Hyperparameters<\/h4>\n\n\n\n<p>We can set a specific workspace and project name in the Comet Experiment object in the cifar10.py file.<\/p>\n\n\n\n<p>The hyperparameters for this experiment are in this same file. Feel free to adjust them. However, because the Dockerfile copies the cifar10 directory into the image, you will have to build and push the container again after changing these parameters.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Monitoring experiments in Comet.ml<\/h4>\n\n\n\n<p>Once you run the script, you\u2019ll be able to see your different model runs in Comet.ml through the direct URL. 
As an example for this tutorial, we have created a Comet project that you can view&nbsp;<a href=\"https:\/\/www.comet.com\/cometpublic\/comet-sagemaker\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<p>Let\u2019s see how we can use Comet to get a better understanding of our model. We\u2019ll start by sorting our models by the best evaluation loss seen after 1000 steps of training.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/07\/screenshot-1-1024x316-f.jpeg\" alt=\"\" class=\"wp-image-1039\"\/><\/figure>\n\n\n\n<p>Now let\u2019s add a few visualizations to our project so that we can see how our hyperparameters are impacting our experiment. We can set up a project-level&nbsp;<strong>line chart<\/strong>&nbsp;to compare our experiments across multiple runs.<\/p>\n\n\n\n<p>Then let\u2019s use a&nbsp;<strong>parallel coordinates&nbsp;<\/strong>chart to visualize which parts of our hyperparameter space are producing the best results.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/07\/charts-1024x348-g.jpeg\" alt=\"\" class=\"wp-image-1040\"\/><\/figure>\n\n\n\n<p>We can also filter experiments using Comet\u2019s query builder. 
For example, we can compare results from experiments where the Resnet size is 32.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/07\/charts-2-1024x417-h.jpeg\" alt=\"\" class=\"wp-image-1041\"\/><\/figure>\n\n\n\n<p>We can see that our charts and experiment table all change based on the filters applied to the experiments.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/07\/screenshot-4-1024x584-i.jpeg\" alt=\"\" class=\"wp-image-1042\"\/><\/figure>\n\n\n\n<p>Our parallel coordinates chart shows us the parts of the parameter space that we have explored, as well as where we have gaps. In this example, we see that a larger Resnet size and batch size produce a more accurate model.<\/p>\n\n\n\n<p>We can use these insights to continue iterating on our model design until we have satisfied our requirements.<\/p>\n\n\n\n<p>Comet.ml allows you to create visualizations like bar charts, line plots, and parallel coordinates charts to track your experiments. These experiment-level and project-level visualizations help you quickly identify your best-performing models and understand your parameter space.<\/p>\n\n\n\n<p>If you\u2019d like to share your results publicly, you can generate a link through the Project\u2019s Share button. 
Alternatively, you can share experiments directly by adding people as collaborators in your Workspace Settings.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Tutorial Summary<\/h4>\n\n\n\n<p>You have learned how to prepare and train a Resnet model using&nbsp;Comet.ml&nbsp;and AWS Sagemaker\u2019s TensorFlow&nbsp;<a href=\"https:\/\/sagemaker.readthedocs.io\/en\/latest\/estimators.html\" target=\"_blank\" rel=\"noreferrer noopener\">Estimator API<\/a>. To summarize the tutorial highlights, we:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Trained a Resnet model on the CIFAR-10 dataset in an Amazon Sagemaker notebook instance using one of Sagemaker\u2019s\u00a0<a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/howitworks-set-kernel.html\" target=\"_blank\" rel=\"noreferrer noopener\">pre-installed virtual environments<\/a>.<\/li>\n\n\n\n<li>Explored different iterations of the training experiment by varying the hyperparameters.<\/li>\n\n\n\n<li>Used Comet.ml to automatically capture our model\u2019s various hyperparameter configurations, metrics, and code across different training runs.<\/li>\n<\/ul>\n\n\n\n<p><strong>Sign up for a free trial of Comet.ml&nbsp;<\/strong><a href=\"https:\/\/live-cometml.pantheonsite.io\/pricing\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>here<\/strong><\/a><strong>.<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial is Part II of a series. See Part I&nbsp;here. Successfully executing machine learning at scale involves building reliable feedback loops around your models. 
As your pipeline grows, you will reach a point where your data can no longer fit in memory on a single machine, and your training processes will have to run [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":3545,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[8,5,9],"tags":[],"coauthors":[107],"class_list":["post-3540","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-comet-community-hub","category-partners-integrations","category-product"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Building reliable machine learning pipelines with AWS Sagemaker and Comet - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building reliable machine learning pipelines with AWS Sagemaker and Comet\" \/>\n<meta property=\"og:description\" content=\"This tutorial is Part II of a series. See Part I&nbsp;here. Successfully executing machine learning at scale involves building reliable feedback loops around your models. 
As your pipeline grows, you will reach a point where your data can no longer fit in memory on a single machine, and your training processes will have to run [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2019-08-06T16:58:38+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:31:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2019\/08\/setup-1-a.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"1600\" \/>\n\t<meta property=\"og:image:height\" content=\"664\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Gideon Mendels\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Gideon Mendels\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Building reliable machine learning pipelines with AWS Sagemaker and Comet - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/","og_locale":"en_US","og_type":"article","og_title":"Building reliable machine learning pipelines with AWS Sagemaker and Comet","og_description":"This tutorial is Part II of a series. See Part I&nbsp;here. Successfully executing machine learning at scale involves building reliable feedback loops around your models. As your pipeline grows, you will reach a point where your data can no longer fit in memory on a single machine, and your training processes will have to run [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2019-08-06T16:58:38+00:00","article_modified_time":"2025-04-24T17:31:00+00:00","og_image":[{"width":1600,"height":664,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2019\/08\/setup-1-a.jpeg","type":"image\/jpeg"}],"author":"Gideon Mendels","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Gideon Mendels","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/"},"author":{"name":"Matt Peternell","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/85aa446f8be987e848ea929ef524b67b"},"headline":"Building reliable machine learning pipelines with AWS Sagemaker and Comet","datePublished":"2019-08-06T16:58:38+00:00","dateModified":"2025-04-24T17:31:00+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/"},"wordCount":1187,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2019\/08\/setup-1-a.jpeg","articleSection":["Comet Community Hub","Partners &amp; Integrations","Product"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/","url":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/","name":"Building reliable machine learning pipelines with AWS Sagemaker and Comet - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2019\/08\/setup-1-a.jpeg","datePublished":"2019-08-06T16:58:38+00:00","dateModified":"2025-04-24T17:31:00+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2019\/08\/setup-1-a.jpeg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2019\/08\/setup-1-a.jpeg","width":1600,"height":664,"caption":"notebook to training to inference flow with Amazon Sagemaker"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/building-reliable-machine-learning-pipelines-with-aws-sagemaker-and-comet-ml\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Building reliable machine learning pipelines with AWS Sagemaker and Comet"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/85aa446f8be987e848ea929ef524b67b","name":"Matt Peternell","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/da003ee51bbeeccfb95147ec69139879","url":"https:\/\/secure.gravatar.com\/avatar\/36058153d701caaf237a96d5d6fb9c2d1678325c3ed0d8e88bf5e487019a2a53?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/36058153d701caaf237a96d5d6fb9c2d1678325c3ed0d8e88bf5e487019a2a53?s=96&d=mm&r=g","caption":"Matt Peternell"},"description":"We re-implemented the architecture of this model to incorporate patient and study information. By comparing our updated model to the original Github repository, we were able to quantify the benefits of classifying by patient as opposed to classifying by individual X-ray. 
We observed a 0.0254 increase in AUROC when evaluating the DenseNet121 on patients instead of on individual scans.","sameAs":["http:\/\/atre.net"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/mpeternellatre-net\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/3540","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=3540"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/3540\/revisions"}],"predecessor-version":[{"id":15706,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/3540\/revisions\/15706"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/3545"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=3540"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=3540"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=3540"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=3540"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}