{"id":4127,"date":"2022-10-20T13:30:12","date_gmt":"2022-10-20T21:30:12","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=4127"},"modified":"2025-04-24T17:17:04","modified_gmt":"2025-04-24T17:17:04","slug":"experiment-management-to-build-better-models","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/","title":{"rendered":"How Experiment Management Makes it Easier to Build Better Models Faster"},"content":{"rendered":"\n<div class=\"\">\n<p id=\"1388\" class=\"pw-subtitle-paragraph jv ix iy bm b jw jx jy jz ka kb kc kd ke kf kg kh ki kj kk kl km cn\">Sharing Best Practices Learned from the Best Machine Learning Teams in the World<\/p>\n<div class=\"ir is it iu iv\">\n<p id=\"3ba0\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Developing machine learning models can quickly become a messy, complicated process. As a hobbyist or learner working on a side passion project you often don\u2019t need to worry about tracking or recording your experiments, reproducing results, looking up previous model runs, or collaborating with others. 
You can get by perfectly fine with scratches of handwritten notes in your Moleskine or notes in the markdown cells of your Jupyter notebooks.<\/p>\n<blockquote class=\"ly lz ma\"><p id=\"afd1\" class=\"ld le mb bm b lf lg jz lh li lj kc lk mc lm ln lo md lq lr ls me lu lv lw lx ir ga\" data-selectable-paragraph=\"\">These practices, however, won\u2019t cut it in an enterprise environment.<\/p><\/blockquote>\n<p id=\"229c\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Whether you want to go pro, level up in your career, or start leading machine learning teams, you\u2019ll need to develop some good habits and understand some best practices.<\/p>\n<p id=\"4d34\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Developing machine learning models at scale for the enterprise is an&nbsp;<strong class=\"bm mf\">iterative, experimental, collaborative process&nbsp;<\/strong>that can become messy and hard to manage.<\/p>\n<p id=\"e4a7\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">As the first data scientist in an enterprise, I\u2019ve experienced first hand the pain of not having guidelines and best practices in place for my model development process. I\u2019ve had to go back to the drawing board after the model I deployed started degrading in production. 
I\u2019ve had to traverse strings of code I wrote, thumb through (physical) notebooks, ctrl-f my way through text files, and go cell surfing through spreadsheets searching for clues and trying to figure out what me from six months ago was thinking when building that model.<\/p>\n<p id=\"228a\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm mf\">I don\u2019t want you to go through that same headache.<\/strong><\/p>\n<\/div>\n<div class=\"ir is it iu iv\">\n<blockquote class=\"mn\"><p id=\"2674\" class=\"mo mp iy bm mq mr ms mt mu mv mw lx cn\" data-selectable-paragraph=\"\">\u201cIt\u2019s said that a wise person learns from his mistakes. A wiser one learns from others\u2019 mistakes. But the wisest person of all learns from others\u2019 successes.\u201d \u2014 <strong>J<\/strong><strong class=\"ba\">ohn C. Maxwell<\/strong><\/p><\/blockquote>\n<\/div>\n<div class=\"ir is it iu iv\" style=\"text-align: center;\">\n<p id=\"a084\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">In this blog post, I\u2019ll share some lessons that will help you standardize your experimental process so you can quickly formulate, test, and evaluate hypotheses for your problem statement and build better models, faster.<\/p>\n<p id=\"4671\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">A preview of what will be discussed in this post:<\/p>\n<ul class=\"\">\n<li id=\"5b0d\" class=\"mx my iy bm b lf lg li lj ll mz lp na lt nb lx nc nd ne nf ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">An overview of the machine learning model development process.<\/li>\n<li id=\"3b63\" class=\"mx my iy bm b 
lf ng li nh ll ni lp nj lt nk lx nc nd ne nf ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Managing your experimental runs like a professional.<\/li>\n<li id=\"8961\" class=\"mx my iy bm b lf ng li nh ll ni lp nj lt nk lx nc nd ne nf ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Iterating and experimenting with algorithms.<\/li>\n<li id=\"4905\" class=\"mx my iy bm b lf ng li nh ll ni lp nj lt nk lx nc nd ne nf ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Hyperparameter search and fine-tuning.<\/li>\n<\/ul>\n<h1 id=\"3ec8\" class=\"nl nm iy bm nn no np nq nr ns nt nu nv ke nw kf nx kh ny ki nz kk oa kl ob oc ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">The machine learning model development process<\/h1>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/1050\/0*yE5I6xtHWpIGyDw9\" alt=\"\" width=\"700\" height=\"467\"><\/figure><div class=\"gl gm od\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*yE5I6xtHWpIGyDw9 640w, https:\/\/miro.medium.com\/max\/720\/0*yE5I6xtHWpIGyDw9 720w, https:\/\/miro.medium.com\/max\/750\/0*yE5I6xtHWpIGyDw9 750w, https:\/\/miro.medium.com\/max\/786\/0*yE5I6xtHWpIGyDw9 786w, https:\/\/miro.medium.com\/max\/828\/0*yE5I6xtHWpIGyDw9 828w, https:\/\/miro.medium.com\/max\/1100\/0*yE5I6xtHWpIGyDw9 1100w, https:\/\/miro.medium.com\/max\/1400\/0*yE5I6xtHWpIGyDw9 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, 
(min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div><figcaption class=\"kz bl gn gl gm la lb bm b bn bo cn\" data-selectable-paragraph=\"\">Photo by&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/@ffstop?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Fotis Fotopoulos<\/a>&nbsp;on&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/figcaption><\/figure>\n<p id=\"0128\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">If you\u2019ve been following along in this series, then you\u2019ve heard me say this at least 42 times: The machine learning lifecycle is iterative and continuous.<\/p>\n<p id=\"a3f9\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Implicit in this continuous process is a feedback loop that will help you improve. That\u2019s because results from later stages in the pipeline are informed by decisions in the earlier stages. 
But what does that feedback loop look like?<\/p>\n<p id=\"d913\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Whether it\u2019s&nbsp;<a class=\"au lc\" href=\"https:\/\/www.ibm.com\/docs\/en\/spss-modeler\/SaaS?topic=dm-crisp-help-overview\" target=\"_blank\" rel=\"noopener ugc nofollow\">CRISP-DM<\/a>,&nbsp;<a class=\"au lc\" href=\"https:\/\/towardsdatascience.com\/5-steps-of-a-data-science-project-lifecycle-26c50372b492\" target=\"_blank\" rel=\"noopener\">OSEMN<\/a>, or&nbsp;<a class=\"au lc\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/architecture\/data-science-process\/overview\" target=\"_blank\" rel=\"noopener ugc nofollow\">Microsoft\u2019s TDSP<\/a>, all machine learning lifecycle frameworks involve<strong class=\"bm mf\">&nbsp;six activities<\/strong>.<\/p>\n<p id=\"2097\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\"><strong class=\"bm mf\">First<\/strong>&nbsp;is a process of understanding the business problem or research question and identifying data that is relevant to making progress against that problem.<\/p>\n<p id=\"241a\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\"><strong class=\"bm mf\">Second<\/strong>&nbsp;is preparing data through some process of ingesting, integrating, and enriching it. 
This is usually some combination of data pipelines for ETL processes and feature engineering.<\/p>\n<p id=\"1dcf\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\"><strong class=\"bm mf\">Third<\/strong>&nbsp;is representing the data generating phenomena using statistical and machine learning algorithms by finding a model of best fit.<\/p>\n<p id=\"8330\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\"><strong class=\"bm mf\">Fourth<\/strong>&nbsp;is evaluating the model and testing it to make sure that it\u2019s able to generalize to unseen data points.<\/p>\n<p id=\"5134\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\"><strong class=\"bm mf\">Fifth<\/strong>&nbsp;is deploying the model as part of some application or larger system.<\/p>\n<p id=\"cbec\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">And&nbsp;<strong class=\"bm mf\">sixth<\/strong>&nbsp;is monitoring the performance of that model by measuring and assessing its effectiveness as well as monitoring the statistical properties of the data that is coming into the model.<\/p>\n<p id=\"8d0a\" class=\"mo mp iy bm mq mr ms mt mu mv mw lx cn\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Then you get to repeat the whole process.<\/p>\n<figure class=\"of og oh oi oj ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c\" role=\"presentation\" 
src=\"https:\/\/miro.medium.com\/max\/1050\/1*yH3--5V0A51OD_vV_YxYJw.png\" alt=\"\" width=\"526\" height=\"605\"><\/figure><div class=\"gl gm oe\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*yH3--5V0A51OD_vV_YxYJw.png 640w, https:\/\/miro.medium.com\/max\/720\/1*yH3--5V0A51OD_vV_YxYJw.png 720w, https:\/\/miro.medium.com\/max\/750\/1*yH3--5V0A51OD_vV_YxYJw.png 750w, https:\/\/miro.medium.com\/max\/786\/1*yH3--5V0A51OD_vV_YxYJw.png 786w, https:\/\/miro.medium.com\/max\/828\/1*yH3--5V0A51OD_vV_YxYJw.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*yH3--5V0A51OD_vV_YxYJw.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*yH3--5V0A51OD_vV_YxYJw.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"kz bl gn gl gm la lb bm b bn bo cn\" data-selectable-paragraph=\"\">Source: Author<\/figcaption>\n<\/figure>\n<blockquote class=\"ly lz ma\"><p id=\"d2ba\" class=\"ld le mb bm b lf lg jz lh li lj kc lk mc lm ln lo md lq lr ls me lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">\u201cBecause machine learning is such an empirical process, being able to go through this loop many times very quickly is key to improving performance.\u201d \u2014 Andrew Ng<\/p><\/blockquote>\n<p id=\"c951\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">I can\u2019t recall&nbsp;<a class=\"au lc\" 
href=\"https:\/\/heartbeat.comet.ml\/why-arent-we-talking-about-experiment-management-as-much-as-we-should-be-75c015bf57ec\" target=\"_blank\" rel=\"noopener ugc nofollow\">exactly where I heard this<\/a>, but\u2026<\/p>\n<p id=\"4196\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">If anything is true about the machine learning process, it\u2019s that it abides by&nbsp;<a class=\"au lc\" href=\"https:\/\/static.googleusercontent.com\/media\/research.google.com\/en\/\/pubs\/archive\/43146.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">the CACE Principle<\/a>.<\/p>\n<p id=\"d877\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">The CACE Principle:&nbsp;<strong class=\"bm mf\">C<\/strong>hanging&nbsp;<strong class=\"bm mf\">A<\/strong>nything&nbsp;<strong class=\"bm mf\">C<\/strong>hanges&nbsp;<strong class=\"bm mf\">E<\/strong>verything. 
This is the phenomenon by which a change anywhere in the machine learning process, especially a change far upstream, can have an unanticipated impact on your experiment and results.<\/p>\n<p id=\"aecf\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Which brings us to our next topic, experiment management.<\/p>\n<h2 id=\"ed07\" class=\"ok nm iy bm nn ol om on nr oo op oq nv ll or os nx lp ot ou nz lt ov ow ob ox ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">How to manage your experimental runs like a professional<\/h2>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/1050\/0*6vsLazD81gqXFzN9\" alt=\"\" width=\"700\" height=\"440\"><\/figure><div class=\"gl gm oy\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*6vsLazD81gqXFzN9 640w, https:\/\/miro.medium.com\/max\/720\/0*6vsLazD81gqXFzN9 720w, https:\/\/miro.medium.com\/max\/750\/0*6vsLazD81gqXFzN9 750w, https:\/\/miro.medium.com\/max\/786\/0*6vsLazD81gqXFzN9 786w, https:\/\/miro.medium.com\/max\/828\/0*6vsLazD81gqXFzN9 828w, https:\/\/miro.medium.com\/max\/1100\/0*6vsLazD81gqXFzN9 1100w, https:\/\/miro.medium.com\/max\/1400\/0*6vsLazD81gqXFzN9 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" 
data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"kz bl gn gl gm la lb bm b bn bo cn\" data-selectable-paragraph=\"\">Photo by&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/@jkoblitz?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Julia Koblitz<\/a>&nbsp;on&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/figcaption>\n<\/figure>\n<blockquote class=\"mn\"><p id=\"be3a\" class=\"mo mp iy bm mq mr oz pa pb pc pd lx cn\" style=\"text-align: left;\" data-selectable-paragraph=\"\">\u201cWhen you\u2019re running dozens, hundreds, or maybe even more experiments, it\u2019s easy to forget what experiments you have already run. Having a system for tracking your experiments can help you be more efficient in making the decisions on the data, or the model, or hyperparameters to systematically improve your algorithm\u2019s performance.\u201d<\/p><p id=\"d179\" class=\"mo mp iy bm mq mr ms mt mu mv mw lx cn\" style=\"text-align: left;\" data-selectable-paragraph=\"\">\u2014&nbsp;<a class=\"au lc\" href=\"https:\/\/www.coursera.org\/learn\/introduction-to-machine-learning-in-production\/lecture\/B9eMQ\/experiment-tracking\" target=\"_blank\" rel=\"noopener ugc nofollow\">Andrew Ng, on experiment tracking<\/a>.<\/p><\/blockquote>\n<p id=\"9ccc\" class=\"pw-post-body-paragraph ld le iy bm b lf pe jz lh li pf kc lk ll pg ln lo lp ph lr ls lt pi lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">A common saying about machine learning is that it\u2019s when code meets data. A common saying about code is that it\u2019s read more than it\u2019s written. 
But in machine learning, reading only the code won\u2019t give you the full picture.<\/p>\n<blockquote class=\"ly lz ma\"><p id=\"e8af\" class=\"ld le mb bm b lf lg jz lh li lj kc lk mc lm ln lo md lq lr ls me lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Isn\u2019t Git good enough for all that?<\/p><\/blockquote>\n<p id=\"f51d\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Yes, versioning the code is absolutely necessary.<\/p>\n<p id=\"90c8\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">But you\u2019d only have access to the information you care most about if you ran the code from start to finish.&nbsp;<strong class=\"bm mf\">That could take hours, days, or weeks.<\/strong><\/p>\n<p id=\"6fa4\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">When running machine learning experiments, you care about much more than just the code. It\u2019s everything beyond the code that\u2019s most interesting: hyperparameters, metrics, predictions, dependencies, system metrics, training artifacts, and more. 
All of this is what allows you to understand differences in model performance and iterate towards a better model.<\/p>\n<blockquote class=\"ly lz ma\"><p id=\"215f\" class=\"ld le mb bm b lf lg jz lh li lj kc lk mc lm ln lo md lq lr ls me lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">What should I be tracking for my machine learning experiments?<\/p><\/blockquote>\n<p id=\"3ed6\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Like anything in machine learning, the answer is usually \u201cit depends.\u201d I\u2019ll share some advice I learned from the<strong class=\"bm mf\">&nbsp;Godfather of Machine Learning himself, Andrew Ng<\/strong>. In his course,&nbsp;<a class=\"au lc\" href=\"https:\/\/www.coursera.org\/learn\/introduction-to-machine-learning-in-production\/lecture\/B9eMQ\/experiment-tracking\" target=\"_blank\" rel=\"noopener ugc nofollow\">Introduction to Machine Learning in Production<\/a>, Andrew recommends tracking&nbsp;<em class=\"mb\">four key pieces of information.<\/em><\/p>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/1050\/0*42Z-S5MnTaSkZ1o6\" alt=\"\" width=\"438\" height=\"657\"><\/figure><div class=\"gl gm pj\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*42Z-S5MnTaSkZ1o6 640w, https:\/\/miro.medium.com\/max\/720\/0*42Z-S5MnTaSkZ1o6 720w, https:\/\/miro.medium.com\/max\/750\/0*42Z-S5MnTaSkZ1o6 750w, https:\/\/miro.medium.com\/max\/786\/0*42Z-S5MnTaSkZ1o6 786w, https:\/\/miro.medium.com\/max\/828\/0*42Z-S5MnTaSkZ1o6 828w, https:\/\/miro.medium.com\/max\/1100\/0*42Z-S5MnTaSkZ1o6 1100w, https:\/\/miro.medium.com\/max\/1400\/0*42Z-S5MnTaSkZ1o6 1400w\" sizes=\"(min-resolution: 
4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"kz bl gn gl gm la lb bm b bn bo cn\" data-selectable-paragraph=\"\">Photo by&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/@sincerelymedia?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Sincerely Media<\/a>&nbsp;on&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/figcaption>\n<\/figure>\n<p id=\"7e8d\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\"><strong class=\"bm mf\">First<\/strong>, keep track of the algorithm you\u2019re using along with the code. Having this information handy will make it easier to replicate the experiments you ran in the past (and whose details you may have forgotten about).&nbsp;<strong class=\"bm mf\">Second<\/strong>, keep track of the artifacts you used during training. Artifacts include: training data, testing data, other models (ex. which version of Word2vec did you use? Did you use a model to generate a feature for your larger model?).&nbsp;<strong class=\"bm mf\">Third<\/strong>, track hyperparameters, random seeds, training duration, number of epochs, activation functions, etc.&nbsp;<strong class=\"bm mf\">Fourth<\/strong>, save the results from your experiments. 
This includes metrics like MAE, MSE, F1, Log Loss, etc. That way you have visibility into how each experiment performed.<\/p>\n<p id=\"1154\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" style=\"text-align: left;\" data-selectable-paragraph=\"\">If you have all of this available along with your code, you don\u2019t need to run the code to find out what happened.&nbsp;<strong class=\"bm mf\">It\u2019s all right there for you.<\/strong><\/p>\n<\/div>\n<div class=\"ir is it iu iv\" style=\"text-align: left;\">\n<p id=\"21aa\" class=\"mo mp iy bm mq mr ms mt mu mv mw lx cn\" data-selectable-paragraph=\"\">Want to see these concepts in action?&nbsp;<a class=\"au lc\" href=\"https:\/\/www.youtube.com\/watch?v=VNH-Lv8Igtc&amp;t=530s\" target=\"_blank\" rel=\"noopener ugc nofollow\">Check out our Office Hours Working Sessions on YouTube.<\/a><\/p>\n<\/div>\n<div class=\"ir is it iu iv\" style=\"text-align: left;\">\n<blockquote class=\"ly lz ma\"><p id=\"15d6\" class=\"ld le mb bm b lf lg jz lh li lj kc lk mc lm ln lo md lq lr ls me lu lv lw lx ir ga\" data-selectable-paragraph=\"\">How should I be tracking all of this?<\/p><\/blockquote>\n<p id=\"e79e\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">You might start off manually writing everything on post-it notes, in your notebooks, to a text file, or logging all console output to a file.<\/p>\n<p id=\"dc50\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">That\u2019s obviously not an efficient practice and doesn\u2019t lend itself well to scalability and reproducibility. If you\u2019re a little more disciplined, you might use a spreadsheet to track everything, like the one below (which honestly, isn\u2019t that impressive or sophisticated). 
Keeping track of and managing all this extra metadata might seem tedious; it might feel like it creates more overhead and cognitive load.&nbsp;<strong class=\"bm mf\">You probably feel like you don\u2019t need many of these bits and pieces of information\u2026until you do.<\/strong><\/p>\n<p id=\"722f\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Because the time will come when you go back to that experiment after a week, or a month, or a year, or you want to bring on collaborators, or share your work with somebody else. And when that happens, how is that person (future you, or a present-day collaborator) supposed to figure out the end-to-end process of how the experiment was run?<\/p>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/1050\/0*qnte0-W76lsmiy82\" alt=\"\" width=\"700\" height=\"216\"><\/figure><div class=\"gl gm pk\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*qnte0-W76lsmiy82 640w, https:\/\/miro.medium.com\/max\/720\/0*qnte0-W76lsmiy82 720w, https:\/\/miro.medium.com\/max\/750\/0*qnte0-W76lsmiy82 750w, https:\/\/miro.medium.com\/max\/786\/0*qnte0-W76lsmiy82 786w, https:\/\/miro.medium.com\/max\/828\/0*qnte0-W76lsmiy82 828w, https:\/\/miro.medium.com\/max\/1100\/0*qnte0-W76lsmiy82 1100w, https:\/\/miro.medium.com\/max\/1400\/0*qnte0-W76lsmiy82 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 
80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<div style=\"text-align: center;\">Source: Author<\/div>\n<div><\/div>\n<\/div>\n<\/figure>\n<\/div>\n<\/div>\n\n\n\n<div>There are a lot of tools out there that can help with this. I recommend <a class=\"au lc\" href=\"https:\/\/www.comet.com\/site\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet<\/a>. I\u2019m obviously a bit biased, but if I didn\u2019t think the product was awesome I wouldn\u2019t be working here. It\u2019s free to try, so why not give it a shot?<\/div>\n\n\n\n<div><\/div>\n\n\n\n<div class=\"\" style=\"text-align: left;\">\n<p id=\"4301\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Check out the&nbsp;<a class=\"au lc\" href=\"https:\/\/www.comet.com\/docs\/python-sdk\/getting-started\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Python quick start guide&nbsp;<\/a>or the&nbsp;<a class=\"au lc\" href=\"https:\/\/www.comet.com\/docs\/r-sdk\/getting-started\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">R quick start guide<\/a>;&nbsp;both will get you set up fast.<\/p>\n<p id=\"e40f\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Once you\u2019ve got tooling to help track all of your experimental runs, you can start iterating and experimenting with algorithms.<\/p>\n<h2 id=\"48b5\" class=\"ok nm iy bm nn ol om on nr oo op oq nv ll or os nx lp ot ou nz lt ov ow ob ox ga\" data-selectable-paragraph=\"\">Iterating and experimenting with algorithms<\/h2>\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" 
src=\"https:\/\/miro.medium.com\/max\/1050\/0*9OEEgxFpm5rllMOv\" alt=\"\" width=\"700\" height=\"467\"><\/figure><div class=\"gl gm pl\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*9OEEgxFpm5rllMOv 640w, https:\/\/miro.medium.com\/max\/720\/0*9OEEgxFpm5rllMOv 720w, https:\/\/miro.medium.com\/max\/750\/0*9OEEgxFpm5rllMOv 750w, https:\/\/miro.medium.com\/max\/786\/0*9OEEgxFpm5rllMOv 786w, https:\/\/miro.medium.com\/max\/828\/0*9OEEgxFpm5rllMOv 828w, https:\/\/miro.medium.com\/max\/1100\/0*9OEEgxFpm5rllMOv 1100w, https:\/\/miro.medium.com\/max\/1400\/0*9OEEgxFpm5rllMOv 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<div style=\"text-align: center;\">\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<figcaption class=\"kz bl gn gl gm la lb bm b bn bo cn\" data-selectable-paragraph=\"\">Photo by&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/@markusspiske?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Markus Spiske<\/a>&nbsp;on&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/figcaption>\n<\/figure>\n<\/div>\n<\/div>\n<p>We talked about the importance of baseline models in <a class=\"au lc\" href=\"https:\/\/heartbeat.comet.ml\/how-to-avoid-building-bad-machine-learning-models-by-validating-your-data-dc008c7a711a\" target=\"_blank\" rel=\"noopener ugc nofollow\">a previous 
article<\/a>.<\/p>\n<p id=\"60df\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">As a quick recap: you establish a baseline by trying a simple model for your task (it could be simple heuristics, a dummy model, or linear or logistic regression) so you can get a sense of whether the problem you\u2019re working on is tractable or not. Once your baseline results are established, it\u2019s time to conduct an initial set of experiments with more complex models.<\/p>\n<p id=\"c8f1\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">How complex you get is going to depend on the problem you\u2019re trying to solve, but you might want to try boosted trees, Gaussian processes, neural nets, or any algorithm you think would be applicable.<em class=\"mb\">&nbsp;Whichever ones you choose, start with off-the-shelf configurations for those algorithms<\/em>. Don\u2019t worry about tuning hyperparameters or applying regularization at this stage.<\/p>\n<p id=\"d30b\" class=\"mo mp iy bm mq mr ms mt mu mv mw lx cn\" data-selectable-paragraph=\"\">It may sound counterintuitive, but see if you can build a model that can overfit your training data.<\/p>\n<p id=\"f708\" class=\"pw-post-body-paragraph ld le iy bm b lf pe jz lh li pf kc lk ll pg ln lo lp ph lr ls lt pi lv lw lx ir ga\" data-selectable-paragraph=\"\">For example, imagine you sampled 1,000 examples from your dataset (which is 10% of the total data). First, run your feature engineering pipeline on those samples and try to overfit your model on them. Then, iterate over the different combinations of features and try to identify which have the most impact on model performance. 
Finally, compare training metrics across model types and features to verify which types of models have the capacity to learn from the data at all.<\/p>\n<p id=\"a851\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">You\u2019re doing this to&nbsp;<em class=\"mb\">verify that your training and evaluation setup is working as expected<\/em>, specifically:<\/p>\n<ol class=\"\">\n<li id=\"8884\" class=\"mx my iy bm b lf lg li lj ll mz lp na lt nb lx pm nd ne nf ga\" data-selectable-paragraph=\"\">Is the data being loaded in the correct way for the model to consume?<\/li>\n<li id=\"16ae\" class=\"mx my iy bm b lf ng li nh ll ni lp nj lt nk lx pm nd ne nf ga\" data-selectable-paragraph=\"\">Are all the relevant metrics for this modeling task being logged? Do new ones have to be defined?<\/li>\n<li id=\"e0a8\" class=\"mx my iy bm b lf ng li nh ll ni lp nj lt nk lx pm nd ne nf ga\" data-selectable-paragraph=\"\">Are the splits in the datasets valid? Is there any leakage between the training and validation sets?<\/li>\n<li id=\"62c5\" class=\"mx my iy bm b lf ng li nh ll ni lp nj lt nk lx pm nd ne nf ga\" data-selectable-paragraph=\"\">Can the results be reproduced when rerunning the same experiment? Are there any discrepancies caused by incorrectly setting the random seed somewhere?<\/li>\n<li id=\"1b83\" class=\"mx my iy bm b lf ng li nh ll ni lp nj lt nk lx pm nd ne nf ga\" data-selectable-paragraph=\"\">Do the label and prediction match up perfectly? 
<\/li>\n<\/ol>\n<p class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Once you\u2019re confident that the training and evaluation framework is trustworthy, you can start slowly ramping up the complexity of the model.<\/p>\n<p id=\"d9ac\" class=\"mo mp iy bm mq mr oz pa pb pc pd lx cn\" data-selectable-paragraph=\"\">Why does all this matter?<\/p>\n<p id=\"17d9\" class=\"pw-post-body-paragraph ld le iy bm b lf pe jz lh li pf kc lk ll pg ln lo lp ph lr ls lt pi lv lw lx ir ga\" data-selectable-paragraph=\"\">Let\u2019s say you\u2019re working with a neural network.<\/p>\n<p id=\"9b6e\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">You may want to try to train on a single datapoint, or a single batch. See if you\u2019re able to get an error as close to 0 as possible, then check if the predictions and labels align perfectly. If this isn\u2019t the case,&nbsp;<strong class=\"bm mf\">that indicates something\u2019s wrong with your model architecture<\/strong>. You would debug this by logging the weights, gradients, and activations in the model and inspecting whether or not these are changing over the training process.<\/p>\n<p id=\"7ec8\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Once you\u2019ve found a set of models that are able to overfit the training data, you can try regularization approaches to improve the model\u2019s ability to generalize.<\/p>\n<p id=\"1786\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">For your regularization experiments,&nbsp;<em class=\"mb\">investigate the effects of scaling the amount of data on your model.<\/em>&nbsp;Track how the training and validation metrics for each model type change as more samples are added to the training set. 
Some examples of regularization include:<\/p>\n<ol class=\"\">\n<li id=\"f0a7\" class=\"mx my iy bm b lf lg li lj ll mz lp na lt nb lx pm nd ne nf ga\" data-selectable-paragraph=\"\">L1 and L2 regularization<\/li>\n<li id=\"7729\" class=\"mx my iy bm b lf ng li nh ll ni lp nj lt nk lx pm nd ne nf ga\" data-selectable-paragraph=\"\">Early Stopping<\/li>\n<li id=\"be65\" class=\"mx my iy bm b lf ng li nh ll ni lp nj lt nk lx pm nd ne nf ga\" data-selectable-paragraph=\"\">Dropout<\/li>\n<\/ol>\n<p id=\"0e4d\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">After this initial set of experiments you should have enough information to identify the most promising algorithms and feature combinations. This initial set of algorithms will now move on to further development. At this stage, you should have an idea of the types of models that are able to fit your data, the features that work best for each model type, and the effects of scaling the data on each candidate model type.<\/p>\n<p id=\"6116\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">You\u2019re now ready to start tuning some hyperparameters.<\/p>\n<h2 id=\"ac4e\" class=\"ok nm iy bm nn ol om on nr oo op oq nv ll or os nx lp ot ou nz lt ov ow ob ox ga\" data-selectable-paragraph=\"\">Hyperparameter search and fine-tuning<\/h2>\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/1050\/0*TrYnvd8VLwk1tgQj\" alt=\"\" width=\"700\" height=\"467\"><\/figure><div class=\"gl gm pn\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*TrYnvd8VLwk1tgQj 640w, https:\/\/miro.medium.com\/max\/720\/0*TrYnvd8VLwk1tgQj 720w, 
https:\/\/miro.medium.com\/max\/750\/0*TrYnvd8VLwk1tgQj 750w, https:\/\/miro.medium.com\/max\/786\/0*TrYnvd8VLwk1tgQj 786w, https:\/\/miro.medium.com\/max\/828\/0*TrYnvd8VLwk1tgQj 828w, https:\/\/miro.medium.com\/max\/1100\/0*TrYnvd8VLwk1tgQj 1100w, https:\/\/miro.medium.com\/max\/1400\/0*TrYnvd8VLwk1tgQj 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<div style=\"text-align: center;\">Photo by&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/@dlerman6?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Daniel Lerman<\/a>&nbsp;on&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/div>\n<div><\/div>\n<\/div>\n<p id=\"d99e\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">You don\u2019t want to spend unnecessary time tuning hyperparameters.<\/p>\n<p id=\"2ee8\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">A good way to start is by using the sampled dataset to quickly iterate over different hyperparameter configurations. Your results here are highly dependent on whether you can get a representative sample of your dataset. 
If the data is complex, this can be much harder.<\/p>\n<p id=\"b015\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">One workaround is to take the top N hyperparameter configurations found on the sampled data and investigate how adding more data affects their performance. You can use your available compute resources as a heuristic for choosing N. To avoid overfitting the sampled dataset, build multiple validation sets by sampling with replacement from the full validation set.<\/p>\n<p id=\"b36b\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">The average of the results from each set is an indicator of the performance of the hyperparameters on the entire validation set.<\/p>\n<p id=\"cdc3\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">You want to be methodical when searching your hyperparameter space.<\/p>\n<p id=\"6013\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Start simple, using a search algorithm like Random Search. Random Search serves as a solid baseline optimization technique. Depending on the problem you\u2019re working on, you may opt to use&nbsp;<a class=\"au lc\" href=\"https:\/\/www.comet.com\/docs\/python-sdk\/introduction-optimizer\/#bayes-algorithm\" target=\"_blank\" rel=\"noopener ugc nofollow\">Bayesian Optimization methods&nbsp;<\/a>to really fine-tune and narrow down your search space. 
The most efficient search strategy is an iterative approach.<\/p>\n<p id=\"af95\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Start with a wide range of values for each hyperparameter, sweeping across a wide search space. You can use discrete values for the parameter options, even if your hyperparameter is a continuous variable (e.g. learning rate or momentum). It\u2019s much more efficient to narrow down the range of values before trying to search a continuous parameter space.<\/p>\n<p id=\"1df2\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Once you have a set of hyperparameters that work well with your data, you can fine-tune your approach by ensembling the top-performing models, letting your model train for an extended period of time, or further optimizing your feature engineering pipeline.<\/p>\n<p id=\"e5a9\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">If you want to see hyperparameter tuning in action using Comet, check out&nbsp;<a class=\"au lc\" href=\"https:\/\/www.comet.com\/team-comet-ml\/parameter-optimizations\/reports\/advanced-ml-parameter-optimization\" target=\"_blank\" rel=\"noopener ugc nofollow\">this post<\/a>:<\/p>\n<div class=\"po pp gt gv pq pr\">\n<div class=\"ps o fr\">\n<figure><a href=\"https:\/\/www.comet.com\/team-comet-ml\/parameter-optimizations\/reports\/advanced-ml-parameter-optimization\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4128\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/comet-1.png\" alt=\"\" width=\"727\" height=\"182\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/comet-1.png 1043w, 
https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/comet-1-300x75.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/comet-1-1024x256.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/comet-1-768x192.png 768w\" sizes=\"auto, (max-width: 727px) 100vw, 727px\" \/><\/a><\/figure><div class=\"pt o da dx en pu\">\n<h2 class=\"bm iz dm bo fs pv fu fv pw fx fz ix ga\"><\/h2>\n<\/div>\n<\/div>\n<\/div>\n<h2 id=\"063c\" class=\"ok nm iy bm nn ol om on nr oo op oq nv ll or os nx lp ot ou nz lt ov ow ob ox ga\" data-selectable-paragraph=\"\">Conclusion<\/h2>\n<p id=\"2237\" class=\"pw-post-body-paragraph ld le iy bm b lf qf jz lh li qg kc lk ll qh ln lo lp qi lr ls lt qj lv lw lx ir ga\" data-selectable-paragraph=\"\">Machine learning is an iterative process with a lot of moving parts.<\/p>\n<p id=\"2eb3\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Manually tracking the entire process is not only unnecessary; it\u2019s an unreasonably error-prone burden.&nbsp;<a class=\"au lc\" href=\"https:\/\/research.google.com\/pubs\/archive\/43146.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">People talk a lot about technical debt&nbsp;<\/a>in machine learning, and a lot of that high interest comes from tracking everything: metrics, dataset distributions, hardware details, and so on. It\u2019s in our best interest to automate this process as much as possible.<\/p>\n<p id=\"851c\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">The machine learning community can benefit from best practices used in traditional software engineering. That starts with leveraging automation tools such as GitHub and CI. 
Making good use of version control, pull requests, CI tools, and containers to run your experiments are all great lessons we can learn from software engineering.<\/p>\n<p id=\"486b\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">But that\u2019s not enough for machine learning.<\/p>\n<p id=\"7485\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Machine learning is not just when code meets data. It\u2019s when code meets data plus hyperparameters, model architectures, random seeds, compute environments, evaluation metrics, and a whole host of other important considerations. But we can still adopt best practices designed to facilitate collaboration and automation of the process.<\/p>\n<p id=\"5ba1\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">And my hope with this article is that we\u2019ve done just that: shared best practices that we\u2019ve learned from some of the best machine learning teams in the world.<\/p>\n<div class=\"o dx mg mh id mi\" style=\"text-align: left;\" role=\"separator\"><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Sharing Best Practices Learned from the Best Machine Learning Teams in the World Developing machine learning models can quickly become a messy, complicated process. 
As a hobbyist or learner working on a side passion project you often don\u2019t need to worry about tracking or recording your experiments, reproducing results, looking up previous model runs, or [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[135],"class_list":["post-4127","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How Experiment Management Makes it Easier to Build Better Models<\/title>\n<meta name=\"description\" content=\"Developing machine learning models can quickly become a messy, complicated process. Use experiment management to make it easier.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How Experiment Management Makes it Easier to Build Better Models Faster\" \/>\n<meta property=\"og:description\" content=\"Developing machine learning models can quickly become a messy, complicated process. 
Use experiment management to make it easier.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2022-10-20T21:30:12+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:17:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/max\/1050\/0*yE5I6xtHWpIGyDw9\" \/>\n<meta name=\"author\" content=\"Harpreet Sahota\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Harpreet Sahota\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"How Experiment Management Makes it Easier to Build Better Models","description":"Developing machine learning models can quickly become a messy, complicated process. Use experiment management to make it easier.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/","og_locale":"en_US","og_type":"article","og_title":"How Experiment Management Makes it Easier to Build Better Models Faster","og_description":"Developing machine learning models can quickly become a messy, complicated process. 
Use experiment management to make it easier.","og_url":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2022-10-20T21:30:12+00:00","article_modified_time":"2025-04-24T17:17:04+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/max\/1050\/0*yE5I6xtHWpIGyDw9","type":"","width":"","height":""}],"author":"Harpreet Sahota","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Harpreet Sahota","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/"},"author":{"name":"Team Comet Digital","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf"},"headline":"How Experiment Management Makes it Easier to Build Better Models Faster","datePublished":"2022-10-20T21:30:12+00:00","dateModified":"2025-04-24T17:17:04+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/"},"wordCount":2637,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/max\/1050\/0*yE5I6xtHWpIGyDw9","articleSection":["Machine Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/","url":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/","name":"How Experiment Management Makes it Easier to Build Better 
Models","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/max\/1050\/0*yE5I6xtHWpIGyDw9","datePublished":"2022-10-20T21:30:12+00:00","dateModified":"2025-04-24T17:17:04+00:00","description":"Developing machine learning models can quickly become a messy, complicated process. Use experiment management to make it easier.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/#primaryimage","url":"https:\/\/miro.medium.com\/max\/1050\/0*yE5I6xtHWpIGyDw9","contentUrl":"https:\/\/miro.medium.com\/max\/1050\/0*yE5I6xtHWpIGyDw9"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/experiment-management-to-build-better-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"How Experiment Management Makes it Easier to Build Better Models Faster"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf","name":"Team Comet Digital","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/4f0c0a8cc7c0e87c636ff6a420a6647c","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","caption":"Team Comet 
Digital"},"sameAs":["https:\/\/www.comet.ml\/"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/teamcometdigital\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4127","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=4127"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4127\/revisions"}],"predecessor-version":[{"id":15672,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4127\/revisions\/15672"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=4127"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=4127"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=4127"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=4127"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}