{"id":2263,"date":"2020-10-20T13:19:29","date_gmt":"2020-10-20T21:19:29","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/blog\/industry-qa-where-most-machine-learning-projects-fail\/"},"modified":"2020-10-20T13:19:29","modified_gmt":"2020-10-20T21:19:29","slug":"industry-qa-where-most-machine-learning-projects-fail","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/","title":{"rendered":"Industry Q&#038;A: Where Most ML Projects Fail"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><em>Comet recently hosted the online panel, <a href=\"https:\/\/info.comet.ml\/panel-addressing-ml-challenges\/\">\u201cHow do top AI researchers from Google, Stanford and Hugging Face approach new ML problems?\u201d<\/a> This is the second post in a series where we recap the questions, answers, and approaches that top AI teams in the world are taking to critical machine learning challenges. You can access the <a href=\"https:\/\/www.comet.com\/site\/how-to-start-the-machine-learning-research-process\/\">first post here.<\/a><\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>We would like to thank <a href=\"https:\/\/twitter.com\/ambarish_jash?lang=en\">Ambarish Jash<\/a>, <a href=\"https:\/\/ai.google\/\">Google<\/a><\/em>; <em><a href=\"https:\/\/twitter.com\/w4nderlus7?lang=en\">Piero Molino<\/a>, <a href=\"https:\/\/ai.stanford.edu\/\">Stanford<\/a><\/em> + <em><a href=\"https:\/\/twitter.com\/ludwig_ai\">Ludwig<\/a><\/em>; <em>and <a href=\"https:\/\/twitter.com\/sanhestpasmoi?lang=en\">Victor Sanh<\/a>, <a href=\"https:\/\/huggingface.co\/\">Hugging Face<\/a><\/em>;<em> for their participation. 
<\/em><\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-video is-provider-vimeo wp-block-embed-vimeo wp-embed-aspect-16-9 wp-has-aspect-ratio\">\n<div class=\"wp-block-embed__wrapper\">https:\/\/vimeo.com\/470175032<\/div>\n<\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Although every machine learning project is different, there are common pitfalls and challenges that machine learning teams face when building and training models, and then taking them into production. Many of these challenges can be addressed when taken into consideration upfront, such as understanding the end goal, as well as the limitations that will be faced in your production environment.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Gideon Mendels, Comet<\/strong><br \/>You all have a lot of experience. You&#8217;ve seen a lot of models in production, and models that didn&#8217;t make it to production. Where do you see most machine learning projects fail? I say projects and not models because we\u2019re looking at how we bring value to the business or to the team. As a follow-up, what would you tell a junior data scientist to be careful about? What is your number one tip for someone coming into the industry?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Piero Molino, Stanford &amp; Ludwig<\/strong><br \/>In terms of failures, I would say a couple of situations usually arise when you don&#8217;t know or understand what you&#8217;re optimizing for beforehand. Then when you try to deploy a model, the model is not really doing what you expect. 
You have to understand, \u201cWhat&#8217;s the final goal?\u201d<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One example I can give: for recommender systems, you can have a model with a higher mean reciprocal rank, or whatever offline metric you care about, but then you put it in the hands of users and find out that what you\u2019re really trying to optimize for is something like the click-through rate, or maybe an even more downstream metric, such as how many items they end up buying or watching.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There&#8217;s not always a one-to-one relationship between the performance that you see offline and the performance that you see online. Things can look promising at the beginning but never end up deployed in production.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The other aspect I want to stress &#8211; when you have a model and put it into production, in many cases, there will be a distribution shift between the training data and the real data. The more time that passes, the more this shifts. If you don\u2019t do a good job of monitoring the models, improving them, and adapting them so they stay as aligned as possible with the real data, you can see performance degrade over time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Ambarish Jash, Google AI<br \/><\/strong>Piero brings up really good points. In big systems, your model is not the only one in the system. Having big offline gains doesn\u2019t always translate to online gains. One of the major reasons is you may not be passing any orthogonal signal to the system that you\u2019re training. So it always makes sense to understand what the final goal is.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Typically that final goal is not just one final objective, like driving CTR or purchases. There are auxiliary goals as well. 
You can\u2019t create a model keeping all of these goals in mind at the same time, so you need to do A\/B testing and look at the data when it comes back. You have to be willing to fail the first few times.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Looking at the distributional shift in your data, having continuous retraining pipelines is easier said than done. You have to understand how many steps you want to fine-tune, how to set the learning rate, and how to accommodate new and sparse objects. There\u2019s a ton of systems work that must go on in the background to put something into production.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Victor Sanh, Hugging Face<\/strong><br \/>One rookie mistake I see &#8211; if you don\u2019t take into account production constraints from the very beginning, you can end up with an overcomplicated model that will never make it into production.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There are stories where you have this great model with really high accuracy, but it takes 24 hours to run. So you\u2019d never take that into production. You have to understand those constraints at the beginning, or you\u2019ll never make it to the end.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Another point is being stuck in \u201cwishful thinking.\u201d It\u2019s when you look at the results and see what you want, not what they say. It can be super challenging at the beginning to see the results for what they are. This is especially hard when you\u2019re on deadline.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Want to watch the full panel? 
It&#8217;s available <a href=\"https:\/\/info.comet.ml\/panel-addressing-ml-challenges\/\">on-demand here.<\/a><\/em><\/p>\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n<h2 class=\"wp-block-heading\"><em>Want to stay in the loop?\u00a0<a href=\"https:\/\/info.comet.ml\/newsletter-signup\/?utm_campaign=tensorboard-integration&amp;utm_source=blog&amp;utm_medium=CTA\">Subscribe to the Comet Newsletter<\/a>\u00a0for weekly insights and perspective on the latest ML news, projects, and more.<\/em><\/h2>\n","protected":false},"excerpt":{"rendered":"<p>Although every machine learning project is different, there are common pitfalls and challenges that machine learning teams face when building and training models, and then taking them into production. Many of these challenges can be addressed when taken into consideration upfront, such as understanding the end goal, as well as the limitations that will be faced in your production environment.<\/p>\n","protected":false},"author":1,"featured_media":2264,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[10],"tags":[],"coauthors":[109],"class_list":["post-2263","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Industry Q&amp;A: Where Most ML Projects Fail - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta 
property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Industry Q&amp;A: Where Most ML Projects Fail\" \/>\n<meta property=\"og:description\" content=\"Although every machine learning project is different, there are common pitfalls and challenges that machine learning teams face when building and training models, and then taking them into production. Many of these challenges can be addressed when taken into consideration upfront, such as understanding the end goal, as well as the limitations that will be faced in your production environment.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2020-10-20T21:19:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-16-at-9.55.08-AM.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1153\" \/>\n\t<meta property=\"og:image:height\" content=\"639\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Ken Hoyle\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ken Hoyle\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Industry Q&A: Where Most ML Projects Fail - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/","og_locale":"en_US","og_type":"article","og_title":"Industry Q&A: Where Most ML Projects Fail","og_description":"Although every machine learning project is different, there are common pitfalls and challenges that machine learning teams face when building and training models, and then taking them into production. Many of these challenges can be addressed when taken into consideration upfront, such as understanding the end goal, as well as the limitations that will be faced in your production environment.","og_url":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2020-10-20T21:19:29+00:00","og_image":[{"width":1153,"height":639,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-16-at-9.55.08-AM.png","type":"image\/png"}],"author":"Ken Hoyle","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Ken Hoyle","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/"},"author":{"name":"engineering@atre.net","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b"},"headline":"Industry Q&#038;A: Where Most ML Projects Fail","datePublished":"2020-10-20T21:19:29+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/"},"wordCount":848,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-16-at-9.55.08-AM.png","articleSection":["Industry"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/","url":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/","name":"Industry Q&A: Where Most ML Projects Fail - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-16-at-9.55.08-AM.png","datePublished":"2020-10-20T21:19:29+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-16-at-9.55.08-AM.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-16-at-9.55.08-AM.png","width":1153,"height":639,"caption":"Where do most machine projects fails"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/industry-qa-where-most-machine-learning-projects-fail\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Industry Q&#038;A: Where Most ML Projects Fail"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b","name":"engineering@atre.net","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/027c18177377edf459980f0cfb83706c","url":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","caption":"engineering@atre.net"},"sameAs":["https:\/\/live-cometml.pantheonsite.io"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/engineeringatre-net\/"}]}},"jetpack_featured_media_url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-16-at-9.55.08-AM.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"ht
tps:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/2263","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=2263"}],"version-history":[{"count":0,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/2263\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/2264"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=2263"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=2263"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=2263"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=2263"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}