{"id":2260,"date":"2020-10-13T16:29:58","date_gmt":"2020-10-14T00:29:58","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/blog\/how-to-start-the-machine-learning-research-process\/"},"modified":"2020-10-13T16:29:58","modified_gmt":"2020-10-14T00:29:58","slug":"how-to-start-the-machine-learning-research-process","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/","title":{"rendered":"Industry Q&#038;A: Starting the ML Process"},"content":{"rendered":"\n<p><em>Comet recently hosted the online panel, <a href=\"https:\/\/info.comet.ml\/panel-addressing-ml-challenges\/\">\u201cHow do top AI researchers from Google, Stanford and Hugging Face approach new ML problems?\u201d<\/a> This post is our first in a series where we recap the questions, answers, and approaches that top AI teams in the world are taking to critical machine learning challenges. <\/em><\/p>\n\n\n\n<p><em>We would like to thank <a href=\"https:\/\/twitter.com\/ambarish_jash?lang=en\">Ambarish Jash<\/a>, <a href=\"https:\/\/ai.google\/\">Google<\/a><\/em>; <em><a href=\"https:\/\/twitter.com\/w4nderlus7?lang=en\">Piero Molino<\/a>, <a href=\"https:\/\/ai.stanford.edu\/\">Stanford<\/a><\/em> + <em><a href=\"https:\/\/twitter.com\/ludwig_ai\">Ludwig<\/a><\/em>; <em>and <a href=\"https:\/\/twitter.com\/sanhestpasmoi?lang=en\">Victor Sanh<\/a>, <a href=\"https:\/\/huggingface.co\/\">Hugging Face<\/a><\/em>;<em> for their participation. <\/em><\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-video is-provider-vimeo wp-block-embed-vimeo wp-embed-aspect-16-9 wp-has-aspect-ratio\">\n<div class=\"wp-block-embed__wrapper\">https:\/\/vimeo.com\/466584783<\/div>\n<\/figure>\n\n\n\n<p>One of the hardest parts of machine learning is simply getting started. Do you have the necessary data? Do you have the systems in place to manage your model? 
If you need to take it into production, do you have a good understanding of what the production environment looks like? All of these considerations need to be addressed to ensure your work succeeds, and to determine whether the problem you\u2019re trying to solve is worth pursuing at all.<\/p>\n\n\n\n<p><strong>Gideon Mendels, Comet<\/strong><br \/>Even approaching a machine learning problem poses a number of challenges, because there are so many moving parts. For those of us in industry, it\u2019s very different from something like a Kaggle competition, where you have a clean dataset and the metrics are defined for you.<\/p>\n\n\n\n<p>So how do you start the research process? What do you do when you have a new problem?<\/p>\n\n\n\n<p><strong>Ambarish Jash, Google AI<\/strong><br \/>There are two parts. One is the research problem: coming up with the problem definition. The other is the challenge of putting it into production. Production places significant constraints on the research you can do. One approach is to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Define the problem<\/li>\n<li>Define the system you\u2019re going to build<\/li>\n<\/ul>\n\n\n\n<p>If you need to go into production, I would strongly recommend building the system first and perhaps starting with simpler models. In the long run, this makes debugging and maintenance simpler.<\/p>\n\n\n\n<p>Keep things simple and build out pipelines. Once these are built, you can start to iterate rapidly on the model. You need a lot of data, and most of the time the loss isn\u2019t exactly what you care about. A strong evaluation framework is important too, because as you add complexity to your model, you will need to check whether it still makes sense for your final task.<\/p>\n\n\n\n<p><strong>Piero Molino, Stanford &amp; Ludwig<br \/><\/strong>It depends on whether it\u2019s an applied project or a research project. 
For applied projects, I\u2019m often the only one with access to the data, and there usually isn\u2019t much prior work on it. On a research problem, by contrast, there may be existing papers I can start from.<\/p>\n\n\n\n<p>If the project is applied, the first thing I do is try to understand the data. That\u2019s the number one thing. Is there signal for the problem I want to solve? In many cases, a machine learning project fails simply because the data doesn\u2019t contain enough signal.<\/p>\n\n\n\n<p>After looking at the data, I use tools that I build for myself, simply because they make it easier to compare different models and give me a standard pipeline I can reuse. Usually I train a simple model, see what\u2019s there, look at the predictions, do some visualizations, and try to understand the predictions and the learning curves.<\/p>\n\n\n\n<p>Once I feel I have a global understanding of the problem, the data, and an initial simple solution, I double down on more complex models or more sophisticated solutions. Starting with something simple and then scaling up is really good advice.<\/p>\n\n\n\n<p>I believe that machine learning projects are much closer to research projects than to software projects. In software, you usually define your constraints, implement, and know ahead of time that it will work. That\u2019s not the case for machine learning.<\/p>\n\n\n\n<p>In machine learning, you don\u2019t know whether the problem can be solved to begin with. By starting simple and checking whether there is signal, you get an idea of whether you have what you need to solve the problem. Otherwise you can spend a lot of time and end up with a model that doesn\u2019t work. 
Fail fast, and try to figure out early whether you can solve the problem.<\/p>\n\n\n\n<p><strong>Victor Sanh, Hugging Face<\/strong><br \/>My approach is to keep things \u201creally loose.\u201d We keep a spreadsheet of ideas as they come along, and we pick the ones that excite us the most.<\/p>\n\n\n\n<p>But I agree with Piero and Ambarish: once you pick a problem, you want to start fast and iterate fast at the beginning. You want to understand the data and the learning process, so you can decide, \u201cIs it worth pursuing this problem for a few weeks?\u201d<\/p>\n\n\n\n<p>The first two weeks are decisive, because that\u2019s when you get a sense of your data and understand whether there\u2019s actually room for improvement.<\/p>\n\n\n\n<p><em>Want to watch the full panel? It&#8217;s available <a href=\"https:\/\/info.comet.ml\/panel-addressing-ml-challenges\/\">on-demand here.<\/a><\/em><\/p>\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n<h2 class=\"wp-block-heading\"><em>Want to stay in the loop?\u00a0<a href=\"https:\/\/info.comet.ml\/newsletter-signup\/?utm_campaign=tensorboard-integration&amp;utm_source=blog&amp;utm_medium=CTA\">Subscribe to the Comet Newsletter<\/a>\u00a0for weekly insights and perspective on the latest ML news, projects, and more.<\/em><\/h2>\n","protected":false},"excerpt":{"rendered":"<p>One of the hardest parts of machine learning is simply getting started. 
See how top AI researchers address this problem.<\/p>\n","protected":false},"author":1,"featured_media":2261,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[10],"tags":[],"coauthors":[109],"class_list":["post-2260","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Industry Q&amp;A: Starting the ML Process - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Industry Q&amp;A: Starting the ML Process\" \/>\n<meta property=\"og:description\" content=\"One of the hardest parts of machine learning is simply getting started. 
See how top AI researchers address this problem.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2020-10-14T00:29:58+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-07-at-6.50.17-PM.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1646\" \/>\n\t<meta property=\"og:image:height\" content=\"888\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Ken Hoyle\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ken Hoyle\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Industry Q&A: Starting the ML Process - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/","og_locale":"en_US","og_type":"article","og_title":"Industry Q&A: Starting the ML Process","og_description":"One of the hardest parts of machine learning is simply getting started. 
See how top AI researchers address this problem.","og_url":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2020-10-14T00:29:58+00:00","og_image":[{"width":1646,"height":888,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-07-at-6.50.17-PM.png","type":"image\/png"}],"author":"Ken Hoyle","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Ken Hoyle","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/"},"author":{"name":"engineering@atre.net","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b"},"headline":"Industry Q&#038;A: Starting the ML Process","datePublished":"2020-10-14T00:29:58+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/"},"wordCount":877,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-07-at-6.50.17-PM.png","articleSection":["Industry"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/","url":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/","name":"Industry Q&A: Starting the ML Process - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-07-at-6.50.17-PM.png","datePublished":"2020-10-14T00:29:58+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-07-at-6.50.17-PM.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-10-07-at-6.50.17-PM.png","width":1646,"height":888,"caption":"How Google, Stanford and Hugging Face approach new ML problems"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-start-the-machine-learning-research-process\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Industry Q&#038;A: Starting the ML Process"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b","name":"engineering@atre.net","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/027c18177377edf459980f0cfb83706c","url":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","caption":"engineering@atre.net"},"sameAs":["https:\/\/live-cometml.pantheonsite.io"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/engineeringatre-net\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/2260","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/p
osts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=2260"}],"version-history":[{"count":0,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/2260\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/2261"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=2260"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=2260"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=2260"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=2260"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}