{"id":1661,"date":"2019-05-24T21:25:53","date_gmt":"2019-05-25T05:25:53","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/blog\/organizing-machine-learning-projects-project-management-guidelines\/"},"modified":"2025-05-08T10:08:13","modified_gmt":"2025-05-08T10:08:13","slug":"organizing-machine-learning-projects-project-management-guidelines","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/","title":{"rendered":"Organizing Machine Learning Projects: Project Management Guidelines"},"content":{"rendered":"\n<p>&nbsp;<\/p>\n\n\n\n<p><strong>Author: Jeremy Jordan<\/strong><\/p>\n\n\n\n<p><em>Originally published at&nbsp;<\/em><a href=\"https:\/\/www.jeremyjordan.me\/ml-projects-guide\/\" target=\"_blank\" rel=\"noreferrer noopener\"><em>https:\/\/www.jeremyjordan.me<\/em><\/a><em>&nbsp;on September 1, 2018, and was updated recently to reflect new resources.<\/em><\/p>\n\n\n\n<p><em>The goal of this document is to provide a common framework for approaching machine learning projects that can be referenced by practitioners. If you build ML models, this post is for you. If you collaborate with people who build ML models, I hope that this guide provides you with a good perspective on the common project workflow. Knowledge of machine learning is assumed.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Overview<\/h2>\n\n\n\n<p>This overview intends to serve as a project \u201cchecklist\u201d for machine learning practitioners. Subsequent sections will provide more detail.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Project lifecycle<\/strong><\/h4>\n\n\n\n<p>Machine learning projects are highly iterative; as you progress through the ML lifecycle, you\u2019ll find yourself iterating on a section until reaching a satisfactory level of performance, then proceeding forward to the next task (which may be circling back to an even earlier step). 
Moreover, a project isn\u2019t complete after you ship the first version; you get feedback from real-world interactions and redefine the goals for the next iteration of deployment.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/project-lifecycle-1-1024x1024-1-1024x1024.png\" alt=\"\" class=\"wp-image-1075\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">1. <strong><a href=\"https:\/\/www.jeremyjordan.me\/ml-projects-guide\/#planning\" target=\"_blank\" rel=\"noreferrer noopener\">Planning and project setup<\/a><\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Define the task and scope out requirements<\/li>\n\n\n\n<li>Determine project feasibility<\/li>\n\n\n\n<li>Discuss general model tradeoffs (accuracy vs speed)<\/li>\n\n\n\n<li>Set up project codebase<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">2. <a href=\"https:\/\/www.jeremyjordan.me\/ml-projects-guide\/#data\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Data collection and labeling<\/strong><\/a><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Define ground truth (create labeling documentation)<\/li>\n\n\n\n<li>Build data ingestion pipeline<\/li>\n\n\n\n<li>Validate quality of data<\/li>\n\n\n\n<li>Revisit Step 1 and ensure data is sufficient for the task<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">3. 
<a href=\"https:\/\/www.jeremyjordan.me\/ml-projects-guide\/#exploration\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Model exploration<\/strong><\/a><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Establish baselines for model performance<\/li>\n\n\n\n<li>Start with a simple model using initial data pipeline<\/li>\n\n\n\n<li>Overfit simple model to training data<\/li>\n\n\n\n<li>Stay nimble and try many parallel (isolated) ideas during early stages<\/li>\n\n\n\n<li>Find SoTA model for your problem domain (if available) and reproduce results, then apply to your dataset as a second baseline<\/li>\n\n\n\n<li>Revisit Step 1 and ensure feasibility<\/li>\n\n\n\n<li>Revisit Step 2 and ensure data quality is sufficient<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">4. <a href=\"https:\/\/www.jeremyjordan.me\/ml-projects-guide\/#refinement\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Model refinement<\/strong><\/a><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Perform model-specific optimizations (e.g., hyperparameter tuning)<\/li>\n\n\n\n<li>Iteratively debug model as complexity is added<\/li>\n\n\n\n<li>Perform error analysis to uncover common failure modes<\/li>\n\n\n\n<li>Revisit Step 2 for targeted data collection of observed failures<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">5. 
<a href=\"https:\/\/www.jeremyjordan.me\/ml-projects-guide\/#testing\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Testing and evaluation<\/strong><\/a><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Evaluate model on test distribution; understand differences between train and test set distributions (how is \u201cdata in the wild\u201d different than what you trained on)<\/li>\n\n\n\n<li>Revisit model evaluation metric; ensure that this metric drives desirable downstream user behavior<\/li>\n\n\n\n<li>Write tests for: input data pipeline, model inference functionality, model inference performance on validation data, explicit scenarios expected in production (model is evaluated on a curated set of observations)<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">6. <a href=\"https:\/\/www.jeremyjordan.me\/ml-projects-guide\/#deployment\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Model deployment<\/strong><\/a><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Expose model via a REST API<\/li>\n\n\n\n<li>Deploy new model to small subset of users to ensure everything goes smoothly, then roll out to all users<\/li>\n\n\n\n<li>Maintain the ability to roll back model to previous versions<\/li>\n\n\n\n<li>Monitor live data and model prediction distributions<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">7. 
<a href=\"https:\/\/www.jeremyjordan.me\/ml-projects-guide\/#maintenance\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Ongoing model maintenance<\/strong><\/a><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Understand that changes can affect the system in unexpected ways<\/li>\n\n\n\n<li>Periodically retrain model to prevent model staleness<\/li>\n\n\n\n<li>If there is a transfer in model ownership, educate the new team<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Team roles<\/strong><\/h4>\n\n\n\n<p>A typical team is composed of:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>data engineer<\/strong>\u00a0(builds the data ingestion pipelines)<\/li>\n\n\n\n<li><strong>machine learning engineer<\/strong>\u00a0(trains and iterates on models to perform the task)<\/li>\n\n\n\n<li><strong>software engineer<\/strong>\u00a0(aids with integrating the machine learning model with the rest of the product)<\/li>\n\n\n\n<li><strong>project manager<\/strong>\u00a0(main point of contact with the client)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-dots\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Planning and project setup<\/h2>\n\n\n\n<p>It may be tempting to skip this section and dive right into \u201cjust see what the models can do\u201d. Don\u2019t skip this section. All too often, you\u2019ll end up wasting time by delaying discussions surrounding the project goals and model evaluation criteria. Everyone should be working toward a common goal from the start of the project.<\/p>\n\n\n\n<p>It\u2019s worth noting that defining the model task is not always straightforward. There are often many different approaches you can take towards solving a problem and it\u2019s not always immediately evident which is optimal. 
I\u2019ll write a follow-up blog post with more detailed advice on developing the&nbsp;<em>requirements<\/em>&nbsp;for a machine learning project.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Prioritizing projects<\/h2>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Ideal: project has high impact and high feasibility.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>Mental models for evaluating project impact:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Look for places where cheap prediction drives large value<\/li>\n\n\n\n<li>Look for complicated rule-based software where we can learn rules instead of programming them<\/li>\n<\/ul>\n\n\n\n<p>When evaluating projects, it can be useful to have a common language and understanding of the differences between traditional software and machine learning software. Andrej Karpathy\u2019s&nbsp;<a href=\"https:\/\/medium.com\/@karpathy\/software-2-0-a64152b37c35\" target=\"_blank\" rel=\"noreferrer noopener\">Software 2.0<\/a>&nbsp;is recommended reading for this topic.<\/p>\n\n\n\n<p><strong>Software 1.0<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Explicit instructions for a computer written by a programmer using a\u00a0<em>programming language<\/em>\u00a0such as Python or C++. A human writes the logic such that when the system is provided with data it will output the desired behavior.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/software-1-1024x359-1.jpg\" alt=\"\" class=\"wp-image-1074\"\/><\/figure>\n\n\n\n<p><strong>Software 2.0<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Implicit instructions by providing data, \u201cwritten\u201d by an optimization algorithm using\u00a0<em>parameters<\/em>\u00a0of a specified model architecture. 
The system logic is learned from a provided collection of data examples and their corresponding desired behavior.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/software-2-1024x361-1.jpg\" alt=\"\" class=\"wp-image-1073\"\/><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>See\u00a0<\/em><a href=\"https:\/\/www.youtube.com\/watch?v=zywIvINSlaI\" target=\"_blank\" rel=\"noreferrer noopener\">this talk<\/a><em>\u00a0for more detail.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>A quick note on Software 1.0 and Software 2.0 \u2014 these two paradigms are&nbsp;<strong><em>not<\/em><\/strong>&nbsp;mutually exclusive. Software 2.0 is usually used to scale the&nbsp;<strong>logic<\/strong>&nbsp;component of traditional software systems by leveraging large amounts of data to enable more complex or nuanced decision logic.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/software-1-2-1024x413-1.jpg\" alt=\"\" class=\"wp-image-1072\"\/><\/figure>\n\n\n\n<p>For example,&nbsp;<a href=\"https:\/\/twimlai.com\/twiml-talk-124-systems-software-machine-learning-scale-jeff-dean\/\" target=\"_blank\" rel=\"noreferrer noopener\">Jeff Dean talks<\/a>&nbsp;(at 27:15) about how the code for Google Translate used to be a very complicated system consisting of ~500k lines of code. Google was able to simplify this product by leveraging a machine learning model to perform the core logical task of translating text to a different language, requiring only ~500 lines of code to describe the model. 
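<\/p>\n\n\n\n<p><em>To make this division of labor concrete, here is a minimal sketch of \u201cSoftware 1.0\u201d glue code wrapped around a learned \u201cSoftware 2.0\u201d core. Every name below is hypothetical, and the \u201cmodel\u201d is a toy lookup table standing in for a real learned component.<\/em><\/p>\n\n\n\n

```python
# Hypothetical sketch only: a toy lookup stands in for a learned model.
def load_model():
    # Stand-in for the learned "Software 2.0" component, whose logic
    # would normally be encoded in trained parameters, not written by hand.
    table = {"hello": "bonjour", "world": "monde"}
    return lambda word: table.get(word, word)

def translate_query(query, model):
    # Software 1.0: validate and normalize the user's input.
    tokens = query.strip().lower().split()
    # Software 2.0: the learned component performs the core logic.
    translated = [model(token) for token in tokens]
    # Software 1.0: format the result for the user.
    return " ".join(translated)

model = load_model()
print(translate_query("Hello world", model))  # -> bonjour monde
```

\n\n\n\n<p>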
However, this model still requires some \u201cSoftware 1.0\u201d code to process the user\u2019s query, invoke the machine learning model, and return the desired information to the user.<\/p>\n\n\n\n<p>In summary, machine learning can drive large value in applications where decision logic is difficult or complicated for humans to write, but relatively easy for machines to learn. On that note, we\u2019ll continue to the next section to discuss how to evaluate whether a task is \u201crelatively easy\u201d for machines to learn.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Determining feasibility<\/h2>\n\n\n\n<p>Some useful questions to ask when determining the feasibility of a project:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cost of data acquisition<\/strong><\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8211; How hard is it to acquire data?<\/p>\n\n\n\n<p><br>&#8211; How expensive is data labeling?<\/p>\n\n\n\n<p>&#8211; How much data will be needed?<\/p>\n<\/blockquote>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cost of wrong predictions<\/strong><\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>&#8211; How frequently does the system need to be right to be useful?<\/em><\/p>\n<\/blockquote>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Availability of good published work about similar problems<\/strong><\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8211; Has the problem been reduced to practice?<\/p>\n\n\n\n<p><br>&#8211; Is there sufficient literature on the problem?<\/p>\n<\/blockquote>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Computational resources available both for training and inference<\/strong><\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>&#8211; Will the model be deployed 
in a resource-constrained environment?<\/em><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">Specifying project requirements<\/h2>\n\n\n\n<p>Establish a single optimization metric for the project. You can also include several other&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Satisficing\" target=\"_blank\" rel=\"noreferrer noopener\">satisficing<\/a>&nbsp;metrics (i.e., performance thresholds) to evaluate models, but you can only&nbsp;<strong><em>optimize<\/em><\/strong>&nbsp;a single metric.<\/p>\n\n\n\n<p><em>Examples:<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Optimize for accuracy<\/li>\n\n\n\n<li>Prediction latency under 10 ms<\/li>\n\n\n\n<li>Model requires no more than 1 GB of memory<\/li>\n\n\n\n<li>90% coverage (model confidence exceeds required threshold to consider a prediction as valid)<\/li>\n<\/ul>\n\n\n\n<p>The optimization metric may be a weighted sum of many things that we care about. Revisit this metric as performance improves.<\/p>\n\n\n\n<p>Some teams may choose to ignore a certain requirement at the start of the project, with the goal of revising their solution (to meet the ignored requirements) after they have discovered a promising general approach.<\/p>\n\n\n\n<p>Decide at what point you will ship your first model.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Some teams aim for a \u201cneutral\u201d first launch: a first launch that explicitly deprioritizes machine learning gains, to avoid getting distracted. \u2014\u00a0<\/em><a href=\"http:\/\/martin.zinkevich.org\/rules_of_ml\/rules_of_ml.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Google Rules of Machine Learning<\/a><\/p>\n<\/blockquote>\n\n\n\n<p>The motivation behind this approach is that the first deployment should involve a simple model with focus spent on building the proper machine learning pipeline required for prediction. 
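<\/p>\n\n\n\n<p><em>As a side note, choosing which candidate model to ship under the \u201coptimize one metric, satisfice the rest\u201d scheme described earlier can be sketched as below. All model names and numbers are hypothetical.<\/em><\/p>\n\n\n\n

```python
# Sketch: pick the best model by the optimizing metric (accuracy), subject to
# satisficing thresholds (latency, memory). All names/numbers are hypothetical.
candidates = [
    {"name": "baseline",  "accuracy": 0.82, "latency_ms": 3,  "memory_gb": 0.2},
    {"name": "large_cnn", "accuracy": 0.91, "latency_ms": 18, "memory_gb": 1.4},
    {"name": "small_cnn", "accuracy": 0.88, "latency_ms": 7,  "memory_gb": 0.6},
]

def satisfices(m):
    # Satisficing metrics: thresholds a candidate must meet to be considered.
    return m["latency_ms"] < 10 and m["memory_gb"] <= 1.0

# Optimize a single metric among the models that meet every threshold.
feasible = [m for m in candidates if satisfices(m)]
best = max(feasible, key=lambda m: m["accuracy"])
print(best["name"])  # -> small_cnn (large_cnn is more accurate but too slow)
```

\n\n\n\n<p>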
This allows you to deliver value quickly and avoid the trap of spending too much of your time trying to&nbsp;<a href=\"http:\/\/karpathy.github.io\/2019\/04\/25\/recipe\/#6-squeeze-out-the-juice\" target=\"_blank\" rel=\"noreferrer noopener\">\u201csqueeze the juice.\u201d<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setting up an ML codebase<\/h2>\n\n\n\n<p>A well-organized machine learning codebase should modularize data processing, model definition, model training, and experiment management.<\/p>\n\n\n\n<p>Example codebase organization:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>|\u2014\u2014 data\/              &lt;- raw and processed data for your project\n|     |\u2014\u2014README.md     &lt;- describes the data for the project\n|\n|\n|\u2014\u2014docker\/             &lt;- specify one or many dockerfiles\n|     |\u2014\u2014dockerfile  &lt;- Docker helps ensure consistent behavior\n|                         across multiple machines\/deployments\n|\n|\n|\u2014\u2014api\/\n|     |\u2014\u2014app.py        &lt;- exposes model through REST client for\n|                         predictions\n|\n|\n|\u2014\u2014 project_name\/\n|     |\u2014\u2014 networks\/    &lt;- defines neural network architectures used\n|     |     |\u2014\u2014resnet.py\n|     |     |\u2014\u2014densenet.py\n|     |\u2014\u2014 models\/      &lt;- handles everything else needed w\/ network\n|     |      |\u2014\u2014base.py   including data preprocessing and output\n|     |      |\u2014\u2014simple_baseline.py                  normalization\n|     |      |\u2014\u2014cnn.py\n|     |\u2014\u2014configs\/\n|     |      |\u2014\u2014baseline.yaml\n|     |      |\u2014\u2014latest.yaml\n|     |\u2014\u2014datasets.py   &lt;- manages construction of the dataset\n|     |\u2014\u2014training.py   &lt;- defines actual training loop for the model\n|     |\u2014\u2014experiment.py &lt;- manages experiment process of evaluating\n|                         multiple models\/ideas. 
Constructs the\n|                         dataset\/model\n|\u2014\u2014scripts\/<\/code><\/pre>\n\n\n\n<p><code>networks\/<\/code>&nbsp;defines the neural network architectures used. Only the computational graph is defined; these objects are agnostic to the input and output shapes, model losses, and training methodology.<\/p>\n\n\n\n<p><code>datasets.py<\/code>&nbsp;manages construction of the dataset. Handles data pipelining\/staging areas, shuffling, reading from disk.<\/p>\n\n\n\n<p><code>experiment.py<\/code>&nbsp;manages the experiment process of evaluating multiple models\/ideas. This constructs the dataset and model for a given experiment.<\/p>\n\n\n\n<p><code>training.py<\/code>&nbsp;defines the actual training loop for the model, which is called by an Experiment object. This code interacts with the optimizer and handles logging during training.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>See other examples\u00a0<\/em><a href=\"https:\/\/github.com\/cmawer\/reproducible-model\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a><em>\u00a0and\u00a0<\/em><a href=\"https:\/\/drivendata.github.io\/cookiecutter-data-science\/#directory-structure\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a><em>.<\/em><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">Data collection and labeling<\/h2>\n\n\n\n<p>An ideal machine learning pipeline uses data which labels itself. For example, Tesla Autopilot has a model running that predicts when cars are about to&nbsp;<a href=\"https:\/\/www.youtube.com\/watch?v=Ucp0TTmvqOE&amp;feature=youtu.be&amp;t=7809\" target=\"_blank\" rel=\"noreferrer noopener\">cut into your lane<\/a>. 
In order to acquire labeled data in a systematic manner, you can simply observe when a car changes from a neighboring lane into the Tesla\u2019s lane and then rewind the video feed to label that a car is about to cut into the lane.<\/p>\n\n\n\n<p>As another example, suppose Facebook is building a model to predict user engagement when deciding how to order things on the newsfeed. After serving the user content based on a prediction, they can monitor engagement and turn this interaction into a labeled observation without any human effort. However, just be sure to think through this process and ensure that your \u201cself-labeling\u201d system won\u2019t get stuck in a&nbsp;<em>feedback loop<\/em>&nbsp;with itself.<\/p>\n\n\n\n<p>For many other cases, we must manually label data for the task we wish to automate. The quality of your data labels has a&nbsp;<em>large<\/em>&nbsp;effect on the upper bound of model performance.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\nhttps:\/\/twitter.com\/alex_gude\/status\/1121138827601383426\n<\/div><\/figure>\n\n\n\n<p>Most data labeling projects require multiple people, which necessitates labeling&nbsp;<strong>documentation<\/strong>. 
Even if you\u2019re the only person labeling the data, it makes sense to document your labeling criteria so that you maintain consistency.<\/p>\n\n\n\n<p>One tricky case is where you decide to change your labeling methodology after already having labeled data. For example, in the Software 2.0 talk mentioned previously, Andrej Karpathy&nbsp;<a href=\"https:\/\/www.youtube.com\/watch?v=zywIvINSlaI&amp;feature=youtu.be&amp;t=20m43s\" target=\"_blank\" rel=\"noreferrer noopener\">talks about<\/a>&nbsp;data which has no clear and obvious ground truth.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/data-labeling-1024x573-1.jpg\" alt=\"\" class=\"wp-image-1071\"\/><\/figure>\n\n\n\n<p>If you run into this,&nbsp;<em>tag \u201chard-to-label\u201d examples<\/em>&nbsp;in some manner such that you can easily find all similar examples should you decide to change your labeling methodology down the road. Additionally, you should&nbsp;<strong>version your dataset<\/strong>&nbsp;and associate a given model with a dataset version.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Tip: After labeling data and training an initial model, look at the observations with the largest error. These examples are often poorly labeled.<\/em><\/p>\n<\/blockquote>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Active Learning<\/strong><\/h4>\n\n\n\n<p>Active learning is useful when you have a large amount of unlabeled data and need to decide which data to label. Labeling data can be expensive, so we\u2019d like to limit the time spent on this task.<\/p>\n\n\n\n<p><em>As a counterpoint, if you can afford to label your entire dataset, you probably should. 
Active learning adds another layer of complexity.<\/em><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>\u201cThe main hypothesis in active learning is that if a learning algorithm can choose the data it wants to learn from, it can perform better than traditional methods with substantially less data for training.\u201d \u2014\u00a0<\/em><a href=\"https:\/\/www.datacamp.com\/community\/tutorials\/active-learning\" target=\"_blank\" rel=\"noreferrer noopener\">DataCamp<\/a><\/p>\n<\/blockquote>\n\n\n\n<p>General approach:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Starting with an unlabeled dataset, build a \u201cseed\u201d dataset by acquiring labels for a small subset of instances<\/li>\n\n\n\n<li>Train initial model on the seed dataset<\/li>\n\n\n\n<li>Predict the labels of the remaining unlabeled observations<\/li>\n\n\n\n<li>Use the uncertainty of the model\u2019s predictions to prioritize the labeling of remaining observations<\/li>\n<\/ol>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Leveraging weak labels<\/strong><\/h4>\n\n\n\n<p>Tasking humans with generating ground truth labels is expensive. Oftentimes you\u2019ll have access to large swaths of unlabeled data and a limited labeling budget \u2014 how can you maximize the value from your data? In some cases, your data can have information which provides a noisy estimate of the ground truth. For example,&nbsp;<a href=\"https:\/\/code.fb.com\/ml-applications\/advancing-state-of-the-art-image-recognition-with-deep-learning-on-hashtags\/\" target=\"_blank\" rel=\"noreferrer noopener\">if you\u2019re categorizing Instagram photos, you might have access to the hashtags used in the caption of the image<\/a>. 
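<\/p>\n\n\n\n<p><em>For instance, a simple labeling function that maps hashtags to noisy labels might look like the sketch below. The keyword map and class names are hypothetical illustrations, not a real API.<\/em><\/p>\n\n\n\n

```python
# Sketch of a "labeling function" that converts hashtags into noisy labels.
# The keyword map and class names are hypothetical illustrations.
KEYWORD_TO_LABEL = {
    "#dog": "dog", "#puppy": "dog",
    "#cat": "cat", "#kitten": "cat",
}

def weak_label(hashtags):
    # Return a noisy label if any hashtag matches, else None (abstain).
    for tag in hashtags:
        label = KEYWORD_TO_LABEL.get(tag.lower())
        if label is not None:
            return label
    return None  # abstain: this example stays unlabeled

print(weak_label(["#Puppy", "#cute"]))  # -> dog
print(weak_label(["#sunset"]))          # -> None
```

\n\n\n\n<p><em>Each such function is noisy on its own and may abstain; the value comes from combining many of them.<\/em><\/p>\n\n\n\n<p>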
Other times, you might have subject matter experts who can help you develop heuristics about the data.<\/p>\n\n\n\n<p><a href=\"https:\/\/hazyresearch.github.io\/snorkel\/\" target=\"_blank\" rel=\"noreferrer noopener\">Snorkel<\/a>&nbsp;is an interesting project produced by the Stanford DAWN (Data Analytics for What\u2019s Next) lab which formalizes an approach towards combining many noisy label estimates into a probabilistic ground truth. I\u2019d encourage you to check it out and see if you might be able to leverage the approach for your problem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Model exploration<\/h2>\n\n\n\n<p><strong>Establish performance baselines on your problem.<\/strong>&nbsp;Baselines are useful for both establishing a lower bound of expected performance (simple model baseline) and establishing a target performance level (human baseline).<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Simple baselines include out-of-the-box scikit-learn models (e.g., logistic regression with default parameters) or even simple heuristics (always predict the majority class). Without these baselines, it\u2019s impossible to evaluate the value of added model complexity.<\/li>\n\n\n\n<li>If your problem is well-studied, search the literature to approximate a baseline based on published results for very similar tasks\/datasets.<\/li>\n\n\n\n<li>If possible, try to estimate human-level performance on the given task. 
Don\u2019t naively assume that humans will perform the task perfectly; a lot of simple tasks are\u00a0<a href=\"http:\/\/karpathy.github.io\/2014\/09\/02\/what-i-learned-from-competing-against-a-convnet-on-imagenet\/\" target=\"_blank\" rel=\"noreferrer noopener\">deceptively hard<\/a>!<\/li>\n<\/ul>\n\n\n\n<p><strong>Start simple and gradually ramp up complexity.<\/strong>&nbsp;This typically involves using a simple model, but can also include starting with a simpler version of your task.<\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">Before doing anything intelligent with &quot;AI&quot;, do the unintelligent version fast and at scale.<br>At worst you understand the limits of a simplistic approach and what complexities you need to handle.<br>At best you realize you don&#39;t need the overhead of intelligence.<\/p>&mdash; Smerity (@Smerity) <a href=\"https:\/\/twitter.com\/Smerity\/status\/1095490777860304896?ref_src=twsrc%5Etfw\">February 13, 2019<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/div><\/figure>\n\n\n\n<p><strong>Once a model runs, overfit a single batch of data.<\/strong>&nbsp;Don\u2019t use regularization yet, as we want to see if the unconstrained model has sufficient capacity to learn from the data.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/pcc.cs.byu.edu\/2017\/10\/02\/practical-advice-for-building-deep-neural-networks\/\" target=\"_blank\" rel=\"noreferrer noopener\">Practical Advice for Building Deep Neural Networks<\/a>\u00a0(see case study on overfitting an initial model)<\/li>\n<\/ul>\n\n\n\n<p><strong>Survey the literature.<\/strong>&nbsp;Search for papers on arXiv describing model architectures for similar problems and speak with other 
practitioners to see which approaches have been most successful in practice. Determine a&nbsp;<em>state of the art<\/em>&nbsp;approach and use this as a baseline model (trained on your dataset).<\/p>\n\n\n\n<p><strong>Reproduce a known result.<\/strong>&nbsp;If you\u2019re using a model which has been well-studied, ensure that your model\u2019s performance&nbsp;<em>on a commonly-used dataset<\/em>&nbsp;matches what is reported in the literature.<\/p>\n\n\n\n<p><strong>Understand how model performance scales with more data.<\/strong>&nbsp;Plot the model performance as a function of increasing dataset size for the baseline models that you\u2019ve explored. Observe how each model\u2019s performance scales as you increase the amount of data used for training.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Model refinement<\/h2>\n\n\n\n<p>Once you have a general idea of successful model architectures and approaches for your problem, you should now spend much more focused effort on squeezing out performance gains from the model.<\/p>\n\n\n\n<p><strong>Build a scalable data pipeline.<\/strong>&nbsp;By this point, you\u2019ve determined which types of data are necessary for your model and you can now focus on engineering a performant pipeline.<\/p>\n\n\n\n<p><strong>Apply the bias variance decomposition to determine next steps.<\/strong>&nbsp;Break down error into: irreducible error, avoidable bias (difference between train error and irreducible error), variance (difference between validation error and train error), and validation set overfitting (difference between test error and validation error).<\/p>\n\n\n\n<p>If training on a (known) different distribution than what is available at test time, consider having&nbsp;<em>two validation subsets<\/em>: val-train and val-test. 
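<\/p>\n\n\n\n<p><em>The decomposition above is simple arithmetic over measured error rates. The numbers below are hypothetical.<\/em><\/p>\n\n\n\n

```python
# Sketch: decompose error using the breakdown described above.
# All error values are hypothetical percentages.
irreducible_error = 1.0   # estimate, e.g. from human-level performance
train_error       = 5.0
val_error         = 9.0
test_error        = 10.0

avoidable_bias  = train_error - irreducible_error  # underfitting signal
variance        = val_error - train_error          # overfitting signal
val_overfitting = test_error - val_error           # overfit to validation set

print(avoidable_bias, variance, val_overfitting)  # -> 4.0 4.0 1.0
```

\n\n\n\n<p>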
The difference between val-train error and val-test error is described by distribution shift.<\/p>\n\n\n\n<p><strong><em>Addressing underfitting<\/em>:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Increase model capacity<\/li>\n\n\n\n<li>Reduce regularization<\/li>\n\n\n\n<li>Error analysis<\/li>\n\n\n\n<li>Choose a more advanced architecture (closer to the state of the art)<\/li>\n\n\n\n<li>Tune hyperparameters<\/li>\n\n\n\n<li>Add features<\/li>\n<\/ol>\n\n\n\n<p><strong><em>Addressing overfitting<\/em>:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Add more training data<\/li>\n\n\n\n<li>Add regularization<\/li>\n\n\n\n<li>Add data augmentation<\/li>\n\n\n\n<li>Error analysis<\/li>\n\n\n\n<li>Tune hyperparameters<\/li>\n\n\n\n<li>Reduce model size<\/li>\n<\/ol>\n\n\n\n<p><strong><em>Addressing distribution shift<\/em>:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Perform error analysis to understand the nature of the distribution shift<\/li>\n\n\n\n<li>Synthesize data (by augmentation) to more closely match the test distribution<\/li>\n\n\n\n<li>Apply domain adaptation techniques<\/li>\n<\/ol>\n\n\n\n<p><strong>Use coarse-to-fine random searches for hyperparameters.<\/strong>&nbsp;Start with a wide hyperparameter space and iteratively hone in on the highest-performing region of the hyperparameter space.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>See\u00a0<a href=\"https:\/\/www.jeremyjordan.me\/hyperparameter-tuning\/\" target=\"_blank\" rel=\"noreferrer noopener\">Hyperparameter<\/a>\u00a0tuning for machine learning models.<\/li>\n<\/ul>\n\n\n\n<p><strong>Perform targeted collection of data to address current failure modes.<\/strong>&nbsp;Develop a systematic method for analyzing errors of your current model. 
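<\/p>\n\n\n\n<p><em>One lightweight way to systematize error analysis is to tag each misprediction with a failure category and tally the results. The categories below are hypothetical.<\/em><\/p>\n\n\n\n

```python
# Sketch: tally mispredictions by (hypothetical) failure category to decide
# where targeted data collection would pay off most.
from collections import Counter

errors = [
    {"id": 1, "category": "blurry_image"},
    {"id": 2, "category": "rare_class"},
    {"id": 3, "category": "blurry_image"},
    {"id": 4, "category": "label_noise"},
    {"id": 5, "category": "blurry_image"},
]

failure_modes = Counter(e["category"] for e in errors)
for category, count in failure_modes.most_common():
    print(category, count)
# blurry_image dominates -> prioritize collecting (or augmenting)
# blurry training images.
```

\n\n\n\n<p>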
Categorize these errors, if possible, and collect additional data to better cover these cases.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Debugging ML projects<\/h2>\n\n\n\n<p>Why is your model performing poorly?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Implementation bugs<\/li>\n\n\n\n<li>Hyperparameter choices<\/li>\n\n\n\n<li>Data\/model fit<\/li>\n\n\n\n<li>Dataset construction<\/li>\n<\/ul>\n\n\n\n<p><em>Key mindset for DL troubleshooting: pessimism.<\/em><\/p>\n\n\n\n<p>In order to complete machine learning projects efficiently,&nbsp;<strong><em>start simple<\/em><\/strong>&nbsp;and gradually increase complexity. Start with a solid foundation and build upon it in an incremental fashion.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Tip: Fix a <a href=\"https:\/\/towardsdatascience.com\/properly-setting-the-random-seed-in-machine-learning-experiments-7da298d1320b\">random<\/a> <a href=\"https:\/\/pytorch.org\/docs\/stable\/notes\/randomness.html\">seed<\/a> to ensure your model training is reproducible.<\/p>\n<\/blockquote>\n\n\n\n<p>Common bugs:<\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">most common neural net mistakes: 1) you didn&#39;t try to overfit a single batch first. 2) you forgot to toggle train\/eval mode for the net. 3) you forgot to .zero_grad() (in pytorch) before .backward(). 4) you passed softmaxed outputs to a loss that expects raw logits. ; others? 
\ud83d\ude42<\/p>&mdash; Andrej Karpathy (@karpathy) <a href=\"https:\/\/twitter.com\/karpathy\/status\/1013244313327681536?ref_src=twsrc%5Etfw\">July 1, 2018<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/div><\/figure>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">oh: 5) you didn&#39;t use bias=False for your Linear\/Conv2d layer when using BatchNorm, or conversely forget to include it for the output layer .This one won&#39;t make you silently fail, but they are spurious parameters<\/p>&mdash; Andrej Karpathy (@karpathy) <a href=\"https:\/\/twitter.com\/karpathy\/status\/1013245864570073090?ref_src=twsrc%5Etfw\">July 1, 2018<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/div><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Discovering failure modes<\/h3>\n\n\n\n<p>Use clustering to uncover failure modes and improve error analysis:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Select all incorrect predictions.<\/li>\n\n\n\n<li>Run a clustering algorithm such as DBSCAN across the selected observations.<\/li>\n\n\n\n<li>Manually explore the clusters to look for common attributes which make prediction difficult.<\/li>\n<\/ul>\n\n\n\n<p>Categorize observations with incorrect predictions and determine the best action to take in the model refinement stage to improve performance on these cases.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Testing and evaluation<\/h3>\n\n\n\n<p>Different components of an ML product:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Training system<\/strong>\u00a0processes raw data, runs experiments, manages results, stores weights.<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote 
is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Required tests:<\/p>\n\n\n\n<p><br>&#8211; Test the full training pipeline (from raw data to trained model) to ensure that changes haven\u2019t been made upstream with respect to how data from our application is stored. These tests should be run nightly\/weekly.<\/p>\n<\/blockquote>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Prediction system<\/strong>\u00a0constructs the network, loads the stored weights, and makes predictions.<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Required tests:<\/p>\n\n\n\n<p><br>&#8211; Run inference on the validation data (already processed) and ensure the model score does not degrade with the new model\/weights. This should be triggered on every code push.<\/p>\n\n\n\n<p><br>&#8211; You should also have a quick functionality test that runs on a few important examples so that you can quickly (&lt;5 minutes) ensure that you haven\u2019t broken functionality during development. These tests are used as a sanity check as you are writing new code.<\/p>\n\n\n\n<p><br>&#8211; Also consider scenarios that your model might encounter, and develop tests to ensure new models still perform sufficiently. The \u201ctest case\u201d is a scenario defined by a human and represented by a curated set of observations.<\/p>\n\n\n\n<p><br>(Example: For a self-driving car, you might have a test to ensure that the car doesn\u2019t turn left at a yellow light. For this case, you may run your model on observations where the car is at a yellow light and ensure that the prediction doesn\u2019t tell the car to proceed.)<\/p>\n<\/blockquote>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Serving system<\/strong>\u00a0accepts \u201creal world\u201d input and performs inference on production data. 
This system must be able to scale to demand.<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Required monitoring:<\/p>\n\n\n\n<p><br>&#8211; Alerts for downtime and errors<\/p>\n\n\n\n<p><br>&#8211; Check for distribution shift in data<\/p>\n<\/blockquote>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/system-testing.png\" alt=\"\" class=\"wp-image-1070\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Evaluating production readiness<\/h2>\n\n\n\n<p><a href=\"https:\/\/ai.google\/research\/pubs\/pub46555\" target=\"_blank\" rel=\"noreferrer noopener\">The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction<\/a><\/p>\n\n\n\n<p><em>Data:<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Feature expectations are captured in a schema.<\/li>\n\n\n\n<li>All features are beneficial.<\/li>\n\n\n\n<li>No feature\u2019s cost is too much.<\/li>\n\n\n\n<li>Features adhere to meta-level requirements.<\/li>\n\n\n\n<li>The data pipeline has appropriate privacy controls.<\/li>\n\n\n\n<li>New features can be added quickly.<\/li>\n\n\n\n<li>All input feature code is tested.<\/li>\n<\/ul>\n\n\n\n<p><em>Model:<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Model specs are reviewed and submitted.<\/li>\n\n\n\n<li>Offline and online metrics correlate.<\/li>\n\n\n\n<li>All hyperparameters have been tuned.<\/li>\n\n\n\n<li>The impact of model staleness is known.<\/li>\n\n\n\n<li>A simple model is not better.<\/li>\n\n\n\n<li>Model quality is sufficient on important data slices.<\/li>\n\n\n\n<li>The model is tested for considerations of inclusion.<\/li>\n<\/ul>\n\n\n\n<p><em>Infrastructure:<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Training is reproducible.<\/li>\n\n\n\n<li>Model specs are unit tested.<\/li>\n\n\n\n<li>The ML pipeline is integration tested.<\/li>\n\n\n\n<li>Model quality 
is validated before serving.<\/li>\n\n\n\n<li>The model is debuggable.<\/li>\n\n\n\n<li>Models are canaried before serving.<\/li>\n\n\n\n<li>Serving models can be rolled back.<\/li>\n<\/ul>\n\n\n\n<p><em>Monitoring:<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Dependency changes result in notification.<\/li>\n\n\n\n<li>Data invariants hold for inputs.<\/li>\n\n\n\n<li>Training and serving are not skewed.<\/li>\n\n\n\n<li>Models are not too stale.<\/li>\n\n\n\n<li>Models are numerically stable.<\/li>\n\n\n\n<li>Computing performance has not regressed.<\/li>\n\n\n\n<li>Prediction quality has not regressed.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Model deployment<\/h2>\n\n\n\n<p>Be sure to have a versioning system in place for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Model parameters<\/li>\n\n\n\n<li>Model configuration<\/li>\n\n\n\n<li>Feature pipeline<\/li>\n\n\n\n<li>Training dataset<\/li>\n\n\n\n<li>Validation dataset<\/li>\n<\/ul>\n\n\n\n<p>A common way to deploy a model is to package the system into a Docker container and expose a REST API for inference.<\/p>\n\n\n\n<p><strong>Canarying<\/strong>: Serve the new model to a small subset of users (e.g., 5%) while still serving the existing model to the remainder. Check to make sure the rollout is smooth, then deploy the new model to the rest of your users.<\/p>\n\n\n\n<p><strong>Shadow mode:<\/strong>&nbsp;Ship a new model alongside the existing model, still using the existing model for predictions but storing the output for both models. 
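<\/p>\n\n\n\n<p><em>In code, shadow mode can be as simple as returning only the current model\u2019s prediction while logging both outputs for offline comparison (a minimal sketch; the toy models and in-memory log are hypothetical stand-ins for real serving infrastructure):<\/em><\/p>

```python
import json

def serve_with_shadow(request, current_model, shadow_model, shadow_log):
    """Return the current model's prediction; record both models'
    outputs so their deltas can be compared offline."""
    current_pred = current_model(request)
    try:
        shadow_pred = shadow_model(request)  # a shadow failure must never affect users
    except Exception:
        shadow_pred = None
    shadow_log.append(json.dumps(
        {"request": request, "current": current_pred, "shadow": shadow_pred}))
    return current_pred  # users only ever see the existing model's output

# Hypothetical usage with toy models standing in for real ones.
log = []
result = serve_with_shadow({"x": 2.0}, lambda r: r["x"] * 0.5, lambda r: r["x"] * 0.6, log)
```

<p>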
Measuring the delta between the new and current models\u2019 predictions will give an indication of how drastically things will change when you switch to the new model.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Ongoing model maintenance<\/h2>\n\n\n\n<p><a href=\"https:\/\/papers.nips.cc\/paper\/5656-hidden-technical-debt-in-machine-learning-systems.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Hidden Technical Debt in Machine Learning Systems<\/a>&nbsp;(quoted below, emphasis mine)<\/p>\n\n\n\n<p>A primer on the concept of technical debt:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>As with fiscal debt, there are often sound strategic reasons to take on technical debt.\u00a0<\/em><strong>Not all debt is bad, but all debt needs to be serviced.<\/strong><em>\u00a0Technical debt may be paid down by refactoring code, improving unit tests, deleting dead code, reducing dependencies, tightening APIs, and improving documentation. The goal is not to add new functionality, but to enable future improvements, reduce errors, and improve maintainability.\u00a0<\/em><strong>Deferring such payments results in compounding costs.<\/strong><em>\u00a0Hidden debt is dangerous because it compounds silently.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>Machine learning projects are not complete upon shipping the first version. 
If you are \u201chanding off\u201d a project and transferring model responsibility, it is extremely important to talk through the required model maintenance with the new team.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive.<\/em><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">Maintenance principles<\/h2>\n\n\n\n<p><strong>CACE principle: Changing Anything Changes Everything<\/strong><br>Machine learning systems are tightly coupled. Changes to the feature space, hyperparameters, learning rate, or any other \u201cknob\u201d can affect model performance.<\/p>\n\n\n\n<p><em>Specific mitigation strategies:<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create model validation tests which are run every time new code is pushed.<\/li>\n\n\n\n<li>Decompose problems into\u00a0<em>isolated<\/em>\u00a0components where it makes sense to do so.<\/li>\n<\/ul>\n\n\n\n<p><strong>Undeclared consumers of your model may be inadvertently affected by your changes.<\/strong><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>\u201cWithout access controls, it is possible for some of these consumers to be undeclared consumers, consuming the output of a given prediction model as an input to another component of the system.\u201d<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>If your model and\/or its predictions are widely accessible, other components within your system may grow to depend on your model without your knowledge. 
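<\/p>\n\n\n\n<p><em>As a sketch of the access controls mentioned in the quote above (the registry and API here are purely hypothetical; a real system would lean on authentication infrastructure), a prediction service can refuse callers that haven\u2019t declared themselves as consumers:<\/em><\/p>

```python
class GatedModel:
    """Wrap a model so only declared consumers can read its predictions,
    keeping an auditable record of who depends on the model."""

    def __init__(self, model):
        self._model = model
        self._consumers = set()

    def register(self, consumer_id):
        # Declared usage: we now know this component depends on the model.
        self._consumers.add(consumer_id)

    def predict(self, consumer_id, features):
        if consumer_id not in self._consumers:
            raise PermissionError(f"undeclared consumer: {consumer_id}")
        return self._model(features)

# Hypothetical usage with a toy model:
gated = GatedModel(lambda x: x * 2)
gated.register("ranking-service")
value = gated.predict("ranking-service", 3)
```

<p>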
Changes to the model (such as periodic retraining or redefining the output) may negatively affect those downstream components.<\/p>\n\n\n\n<p><em>Specific mitigation strategies:<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Control access to your model by making outside components request permission and signal their usage of your model.<\/li>\n<\/ul>\n\n\n\n<p><strong>Avoid depending on input signals which may change over time.<\/strong><br>Some features are obtained by a table lookup (e.g., word embeddings) or from an input pipeline that is outside the scope of your codebase. When these external feature representations are changed, the model\u2019s performance can suffer.<\/p>\n\n\n\n<p><em>Specific mitigation strategies:<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create a versioned copy of your input signals to provide stability against changes in external input pipelines. These versioned inputs can be specified in a model\u2019s configuration file.<\/li>\n<\/ul>\n\n\n\n<p><strong>Eliminate unnecessary features.<\/strong><br>Regularly evaluate the effect of removing individual features from a given model. A model\u2019s feature space should only contain relevant and important features for the given task.<\/p>\n\n\n\n<p>There are many strategies to determine feature importance, such as leave-one-out cross-validation and feature permutation tests. Unimportant features add noise to your feature space and should be removed.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Tip: Document deprecated features (deemed unimportant) so that they aren\u2019t accidentally reintroduced later.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p><strong>Model performance will likely decline over time.<\/strong><br>As the input distribution shifts, the model\u2019s performance will suffer. 
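<\/p>\n\n\n\n<p><em>One lightweight way to watch for this decline, offered as a rough sketch rather than a full monitoring system, is to alert when a feature\u2019s recent mean drifts several training-set standard deviations from its training mean (the threshold and values below are illustrative):<\/em><\/p>

```python
from statistics import mean, stdev

def drift_alert(train_values, recent_values, threshold=3.0):
    """Return True when the recent mean of a feature sits more than
    `threshold` training standard deviations from the training mean."""
    mu = mean(train_values)
    sigma = stdev(train_values)
    if sigma == 0:  # constant feature in training: any change counts as drift
        return mean(recent_values) != mu
    return abs(mean(recent_values) - mu) / sigma > threshold

# Illustrative checks on a single feature's values.
stable = drift_alert([1, 2, 3, 4, 5], [2, 3, 4])
shifted = drift_alert([1, 2, 3, 4, 5], [50, 51, 52])
```

<p>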
You should plan to periodically retrain your model such that it has always learned from recent \u201creal world\u201d data.<\/p>\n\n\n\n<p>This guide draws inspiration from the&nbsp;<a href=\"https:\/\/fullstackdeeplearning.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Full Stack Deep Learning Bootcamp<\/a>,&nbsp;<a href=\"http:\/\/martin.zinkevich.org\/rules_of_ml\/rules_of_ml.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">best practices released by Google<\/a>, personal experience, and conversations with fellow practitioners.<\/p>\n\n\n\n<p>Find something that\u2019s missing from this guide? Let us know in the comments below!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">External Resources<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/karpathy.github.io\/2019\/04\/25\/recipe\/\" target=\"_blank\" rel=\"noreferrer noopener\">A Recipe for Training Neural Networks<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=7D8unG3XMzU\" target=\"_blank\" rel=\"noreferrer noopener\">An Only One Step Ahead Guide for Machine Learning Projects \u2014 Chang Lee<\/a>.\u00a0<em>This is an entertaining talk discussing advice for approaching machine learning projects. 
This talk will give you a \u201cflavor\u201d for the details covered in this guide.<\/em><\/li>\n\n\n\n<li><a href=\"https:\/\/medium.com\/@Ben_Reinhardt\/designing-collaborative-ai-5c1e8dbc8810\" target=\"_blank\" rel=\"noreferrer noopener\">Designing collaborative AI<\/a>\u00a0(clever product design can reduce model performance requirements)<\/li>\n\n\n\n<li><a href=\"http:\/\/burrsettles.com\/pub\/settles.activelearning.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Active Learning Literature Survey<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/becominghuman.ai\/accelerate-machine-learning-with-active-learning-96cea4b72fdb\" target=\"_blank\" rel=\"noreferrer noopener\">Accelerate Machine Learning with Active Learning<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=FE1r7_SQq6Y\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft Research: Active Learning and Annotation<\/a><\/li>\n\n\n\n<li><a href=\"http:\/\/ai.stanford.edu\/blog\/weak-supervision\/\" target=\"_blank\" rel=\"noreferrer noopener\">Weak Supervision: A New Programming Paradigm for Machine Learning<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.pyimagesearch.com\/2018\/01\/29\/scalable-keras-deep-learning-rest-api\/\" target=\"_blank\" rel=\"noreferrer noopener\">A scalable Keras + deep learning REST API<\/a><\/li>\n\n\n\n<li><a href=\"http:\/\/josh-tobin.com\/troubleshooting-deep-neural-networks.html\" target=\"_blank\" rel=\"noreferrer noopener\">Troubleshooting Deep Neural Networks<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/towardsdatascience.com\/checklist-for-debugging-neural-networks-d8b2a9434f21\" target=\"_blank\" rel=\"noreferrer noopener\">Checklist for debugging neural networks<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/towardsdatascience.com\/properly-setting-the-random-seed-in-machine-learning-experiments-7da298d1320b\" target=\"_blank\" rel=\"noreferrer noopener\">Properly Setting the Random Seed in Machine Learning 
Experiments<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Author: Jeremy Jordan Originally published at&nbsp;https:\/\/www.jeremyjordan.me&nbsp;on September 1, 2018, and was updated recently to reflect new resources. The goal of this document is to provide a common framework for approaching machine learning projects that can be referenced by practitioners. If you build ML models, this post is for you. If you collaborate with people [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1668,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[8,6,9],"tags":[],"coauthors":[107],"class_list":["post-1661","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-comet-community-hub","category-machine-learning","category-product"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Organizing Machine Learning Projects: Project Management Guidelines - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Organizing Machine Learning Projects: Project Management Guidelines\" \/>\n<meta property=\"og:description\" content=\"&nbsp; Author: Jeremy Jordan Originally published at&nbsp;https:\/\/www.jeremyjordan.me&nbsp;on September 1, 2018, and was updated recently to reflect new resources. 
The goal of this document is to provide a common framework for approaching machine learning projects that can be referenced by practitioners. If you build ML models, this post is for you. If you collaborate with people [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2019-05-25T05:25:53+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-05-08T10:08:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/project-lifecycle.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"1200\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Gideon Mendels\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Gideon Mendels\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"23 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Organizing Machine Learning Projects: Project Management Guidelines - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/","og_locale":"en_US","og_type":"article","og_title":"Organizing Machine Learning Projects: Project Management Guidelines","og_description":"&nbsp; Author: Jeremy Jordan Originally published at&nbsp;https:\/\/www.jeremyjordan.me&nbsp;on September 1, 2018, and was updated recently to reflect new resources. The goal of this document is to provide a common framework for approaching machine learning projects that can be referenced by practitioners. If you build ML models, this post is for you. If you collaborate with people [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2019-05-25T05:25:53+00:00","article_modified_time":"2025-05-08T10:08:13+00:00","og_image":[{"width":1200,"height":1200,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/project-lifecycle.png","type":"image\/png"}],"author":"Gideon Mendels","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Gideon Mendels","Est. 
reading time":"23 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/"},"author":{"name":"engineering@atre.net","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b"},"headline":"Organizing Machine Learning Projects: Project Management Guidelines","datePublished":"2019-05-25T05:25:53+00:00","dateModified":"2025-05-08T10:08:13+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/"},"wordCount":4384,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/project-lifecycle.png","articleSection":["Comet Community Hub","Machine Learning","Product"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/","url":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/","name":"Organizing Machine Learning Projects: Project Management Guidelines - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/project-lifecycle.png","datePublished":"2019-05-25T05:25:53+00:00","dateModified":"2025-05-08T10:08:13+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/project-lifecycle.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/project-lifecycle.png","width":1200,"height":1200,"caption":"Machine learning development lifecycle"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/organizing-machine-learning-projects-project-management-guidelines\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Organizing Machine Learning Projects: Project Management Guidelines"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b","name":"engineering@atre.net","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/027c18177377edf459980f0cfb83706c","url":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","caption":"engineering@atre.net"},"sameAs":["https:\/\/live-cometml.pantheonsite.io"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/engineeringatre-net\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/1661","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/p
osts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=1661"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/1661\/revisions"}],"predecessor-version":[{"id":15868,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/1661\/revisions\/15868"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/1668"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=1661"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=1661"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=1661"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=1661"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}