{"id":5919,"date":"2023-06-14T08:03:26","date_gmt":"2023-06-14T16:03:26","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=5919"},"modified":"2025-04-24T17:15:29","modified_gmt":"2025-04-24T17:15:29","slug":"5-regression-loss-functions-all-machine-learners-should-know","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/","title":{"rendered":"5 Regression Loss Functions All Machine Learners Should Know"},"content":{"rendered":"\n<link rel=\"\u201ccanonical\u201d\" href=\"\u201chttps:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\u201d\">\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<p id=\"fc76\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">All the algorithms in machine learning rely on minimizing or maximizing a function, which we call \u201cobjective function\u201d. The group of functions that are minimized are called \u201closs functions\u201d. A loss function is a measure of how good a prediction model does in terms of being able to predict the expected outcome. A most commonly used method of finding the minimum point of function is \u201cgradient descent\u201d. Think of loss function like undulating mountain and gradient descent is like sliding down the mountain to reach the bottommost point.<\/p>\n<p id=\"79f9\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">There is not a single loss function that works for all kind of data. It depends on a number of factors including the presence of outliers, choice of machine learning algorithm, time efficiency of gradient descent, ease of finding the derivatives and confidence of predictions. 
The purpose of this blog series is to learn about different losses and how each of them can help data scientists.<\/p>\n<p id=\"8021\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Loss functions can be broadly categorized into 2 types: <strong class=\"be na\">Classification and Regression Loss<\/strong>. In this post, I\u2019m focussing on regression loss. In future posts I cover loss functions in other categories. Please let me know in comments if I miss something. Also, all the codes and plots shown in this blog can be found in <a class=\"af nb\" href=\"https:\/\/nbviewer.jupyter.org\/github\/groverpr\/Machine-Learning\/blob\/master\/notebooks\/05_Loss_Functions.ipynb\" target=\"_blank\" rel=\"noopener ugc nofollow\">this notebook.<\/a><\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:486\/1*3MsFzl7zRZE3TihIC9JmaQ.png\" alt=\"\" width=\"486\" height=\"669\"><\/figure><div class=\"nc nd ne\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*3MsFzl7zRZE3TihIC9JmaQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*3MsFzl7zRZE3TihIC9JmaQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*3MsFzl7zRZE3TihIC9JmaQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*3MsFzl7zRZE3TihIC9JmaQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*3MsFzl7zRZE3TihIC9JmaQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*3MsFzl7zRZE3TihIC9JmaQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:972\/format:webp\/1*3MsFzl7zRZE3TihIC9JmaQ.png 972w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 
50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 486px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*3MsFzl7zRZE3TihIC9JmaQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*3MsFzl7zRZE3TihIC9JmaQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*3MsFzl7zRZE3TihIC9JmaQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*3MsFzl7zRZE3TihIC9JmaQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*3MsFzl7zRZE3TihIC9JmaQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*3MsFzl7zRZE3TihIC9JmaQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:972\/1*3MsFzl7zRZE3TihIC9JmaQ.png 972w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 486px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<blockquote class=\"nn\"><p id=\"c176\" class=\"no np fo be nq nr ns nt nu nv nw mz dv\" data-selectable-paragraph=\"\"><em class=\"nx\">Regression functions predict a quantity, and classification functions predict a label.<\/em><\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"ab ca ny nz oa ob\" role=\"separator\"><\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h1 id=\"d104\" class=\"og oh fo be oi oj ok 
go ol om on gr oo op oq or os ot ou ov ow ox oy oz pa pb bj\" data-selectable-paragraph=\"\">Regression loss<\/h1>\n<h2 id=\"7ff8\" class=\"pc oh fo be oi pd pe pf ol pg ph pi oo mn pj pk pl mr pm pn po mv pp pq pr ps bj\" data-selectable-paragraph=\"\">1. <strong class=\"al\">Mean Square Error, Quadratic loss, L2 Loss<\/strong><\/h2>\n<p id=\"1d10\" class=\"pw-post-body-paragraph mf mg fo be b gm pt mi mj gp pu ml mm mn pv mp mq mr pw mt mu mv px mx my mz fh bj\" data-selectable-paragraph=\"\"><a class=\"af nb\" href=\"https:\/\/medium.freecodecamp.org\/machine-learning-mean-squared-error-regression-line-c7dde9a26b93\" target=\"_blank\" rel=\"noopener ugc nofollow\">Mean Square Error (MSE)<\/a> is the most commonly used regression loss function. MSE is the average of the squared distances between our target variable and the predicted values.<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:255\/1*mlXnpXGdhMefPybSQtRmDA.png\" alt=\"\" width=\"255\" height=\"101\"><\/figure><div class=\"nc nd py\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*mlXnpXGdhMefPybSQtRmDA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*mlXnpXGdhMefPybSQtRmDA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*mlXnpXGdhMefPybSQtRmDA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*mlXnpXGdhMefPybSQtRmDA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*mlXnpXGdhMefPybSQtRmDA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*mlXnpXGdhMefPybSQtRmDA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:510\/format:webp\/1*mlXnpXGdhMefPybSQtRmDA.png 510w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, 
(min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 255px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*mlXnpXGdhMefPybSQtRmDA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*mlXnpXGdhMefPybSQtRmDA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*mlXnpXGdhMefPybSQtRmDA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*mlXnpXGdhMefPybSQtRmDA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*mlXnpXGdhMefPybSQtRmDA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*mlXnpXGdhMefPybSQtRmDA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:510\/1*mlXnpXGdhMefPybSQtRmDA.png 510w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 255px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"5dca\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Below is a plot of an MSE function where the true target value is 100, and the predicted values range between -10,000 to 10,000. The MSE loss (Y-axis) reaches its minimum value at prediction (X-axis) = 100. 
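As a quick sketch of the formula above (my own code; sklearn\u2019s built-in metrics would work just as well):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared differences
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Two predictions, each off by 10 -> MSE = (10**2 + 10**2) / 2 = 100
print(mse([100, 100], [90, 110]))
```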
The range is 0 to \u221e.<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:576\/1*EqTaoCB1NmJnsRYEezSACA.png\" alt=\"\" width=\"576\" height=\"360\"><\/figure><div class=\"nc nd pz\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*EqTaoCB1NmJnsRYEezSACA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*EqTaoCB1NmJnsRYEezSACA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*EqTaoCB1NmJnsRYEezSACA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*EqTaoCB1NmJnsRYEezSACA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*EqTaoCB1NmJnsRYEezSACA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*EqTaoCB1NmJnsRYEezSACA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1152\/format:webp\/1*EqTaoCB1NmJnsRYEezSACA.png 1152w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 576px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*EqTaoCB1NmJnsRYEezSACA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*EqTaoCB1NmJnsRYEezSACA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*EqTaoCB1NmJnsRYEezSACA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*EqTaoCB1NmJnsRYEezSACA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*EqTaoCB1NmJnsRYEezSACA.png 828w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*EqTaoCB1NmJnsRYEezSACA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1152\/1*EqTaoCB1NmJnsRYEezSACA.png 1152w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 576px\" data-testid=\"og\"><\/picture><\/div><figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\">Plot of MSE Loss (Y-axis) vs. Predictions (X-axis)<\/figcaption><\/figure>\n<h2 id=\"637a\" class=\"pc oh fo be oi pd pe pf ol pg ph pi oo mn pj pk pl mr pm pn po mv pp pq pr ps bj\" data-selectable-paragraph=\"\">2. <strong class=\"al\">Mean Absolute Error, L1 Loss<\/strong><\/h2>\n<p id=\"c406\" class=\"pw-post-body-paragraph mf mg fo be b gm pt mi mj gp pu ml mm mn pv mp mq mr pw mt mu mv px mx my mz fh bj\" data-selectable-paragraph=\"\"><a class=\"af nb\" href=\"https:\/\/medium.com\/@ewuramaminka\/mean-absolute-error-mae-sample-calculation-6eed6743838a\" rel=\"noopener\">Mean Absolute Error<\/a> (MAE) is another loss function used for regression models. MAE is the average of the absolute differences between our target and predicted variables, so it measures the average magnitude of the errors in a set of predictions without considering their directions. (If we considered the directions as well, we would get the Mean Bias Error (MBE), which is the mean of the signed residuals\/errors.) 
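A small sketch of both quantities (my own code and numbers, for illustration):

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error: average error magnitude, direction ignored
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mbe(y_true, y_pred):
    # Mean Bias Error: average signed error; opposite errors cancel out
    return np.mean(np.asarray(y_true) - np.asarray(y_pred))

# Errors of +10 and -10: MAE is 10, but MBE is 0 because they cancel
print(mae([100, 100], [90, 110]), mbe([100, 100], [90, 110]))
```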
The range is also 0 to \u221e.<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:257\/1*xjarhfIDtRcaNhp7ZEyEdg.png\" alt=\"\" width=\"257\" height=\"95\"><\/figure><div class=\"nc nd qf\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*xjarhfIDtRcaNhp7ZEyEdg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*xjarhfIDtRcaNhp7ZEyEdg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*xjarhfIDtRcaNhp7ZEyEdg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*xjarhfIDtRcaNhp7ZEyEdg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*xjarhfIDtRcaNhp7ZEyEdg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*xjarhfIDtRcaNhp7ZEyEdg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:514\/format:webp\/1*xjarhfIDtRcaNhp7ZEyEdg.png 514w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 257px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*xjarhfIDtRcaNhp7ZEyEdg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*xjarhfIDtRcaNhp7ZEyEdg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*xjarhfIDtRcaNhp7ZEyEdg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*xjarhfIDtRcaNhp7ZEyEdg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*xjarhfIDtRcaNhp7ZEyEdg.png 828w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*xjarhfIDtRcaNhp7ZEyEdg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:514\/1*xjarhfIDtRcaNhp7ZEyEdg.png 514w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 257px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:576\/1*8BQhdKu1nk-tAAbOR17qGg.png\" alt=\"\" width=\"576\" height=\"360\"><\/figure><div class=\"nc nd pz\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*8BQhdKu1nk-tAAbOR17qGg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*8BQhdKu1nk-tAAbOR17qGg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*8BQhdKu1nk-tAAbOR17qGg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*8BQhdKu1nk-tAAbOR17qGg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*8BQhdKu1nk-tAAbOR17qGg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*8BQhdKu1nk-tAAbOR17qGg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1152\/format:webp\/1*8BQhdKu1nk-tAAbOR17qGg.png 1152w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 
2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 576px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*8BQhdKu1nk-tAAbOR17qGg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*8BQhdKu1nk-tAAbOR17qGg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*8BQhdKu1nk-tAAbOR17qGg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*8BQhdKu1nk-tAAbOR17qGg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*8BQhdKu1nk-tAAbOR17qGg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*8BQhdKu1nk-tAAbOR17qGg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1152\/1*8BQhdKu1nk-tAAbOR17qGg.png 1152w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 576px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\">Plot of MAE Loss (Y-axis) vs. Predictions (X-axis)<\/figcaption>\n<\/figure>\n<h2 id=\"dd2a\" class=\"pc oh fo be oi pd pe pf ol pg ph pi oo mn pj pk pl mr pm pn po mv pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">MSE vs. 
MAE (L2 loss vs L1 loss)<\/strong><\/h2>\n<p id=\"4baa\" class=\"pw-post-body-paragraph mf mg fo be b gm pt mi mj gp pu ml mm mn pv mp mq mr pw mt mu mv px mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">In short,<\/strong> <strong class=\"be na\">using the squared error is easier to solve, but using the absolute error is more robust to outliers. But let\u2019s understand why!<\/strong><\/p>\n<p id=\"1057\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Whenever we train a machine learning model, our goal is to find the point that minimizes loss function. Of course, both functions reach the minimum when the prediction is exactly equal to the true value.<\/p>\n<p id=\"ee8b\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Here\u2019s a quick review of python code for both. We can either write our own functions or use sklearn\u2019s built-in metrics functions:<\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-5921\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/mas-\u2013-Medium.jpg\" alt=\"\" width=\"520\" height=\"255\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/mas-\u2013-Medium.jpg 520w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/mas-\u2013-Medium-300x147.jpg 300w\" sizes=\"auto, (max-width: 520px) 100vw, 520px\" \/><\/figure><p data-selectable-paragraph=\"\"><\/p>\n<p id=\"7a89\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Let\u2019s see the values of MAE and Root Mean Square Error (RMSE, which is just the square root of MSE to make it on the same scale as MAE) for 2 cases. 
In the first case, the predictions are close to true values and the error has small variance among observations. In the second, there is one outlier observation, and the error is high.<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<div class=\"qk ql eb qm bg qn\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*KibGRET1M6Bu0-8XmjviMA.png\" alt=\"\" width=\"700\" height=\"231\"><\/figure><div class=\"nc nd qj\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*KibGRET1M6Bu0-8XmjviMA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*KibGRET1M6Bu0-8XmjviMA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*KibGRET1M6Bu0-8XmjviMA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*KibGRET1M6Bu0-8XmjviMA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*KibGRET1M6Bu0-8XmjviMA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*KibGRET1M6Bu0-8XmjviMA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*KibGRET1M6Bu0-8XmjviMA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*KibGRET1M6Bu0-8XmjviMA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*KibGRET1M6Bu0-8XmjviMA.png 720w, 
https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*KibGRET1M6Bu0-8XmjviMA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*KibGRET1M6Bu0-8XmjviMA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*KibGRET1M6Bu0-8XmjviMA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*KibGRET1M6Bu0-8XmjviMA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*KibGRET1M6Bu0-8XmjviMA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\"><strong class=\"be na\">Left:<\/strong> Errors are close to each other. <strong class=\"be na\">Right: <\/strong>One error is far off compared to the others.<\/figcaption>\n<\/figure>\n<p id=\"b533\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">What do we observe from this, and how can it help us to choose which loss function to use?<\/strong><\/p>\n<p id=\"4bc2\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Since MSE squares the error (e = y - y_predicted), the squared term e\u00b2 grows rapidly once |e| &gt; 1. If we have an outlier in our data, e will be large and e\u00b2 will be &gt;&gt; |e|. This makes a model trained with MSE loss give more weight to outliers than a model trained with MAE loss. 
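We can see the effect numerically with a small sketch (my own numbers, not the figure\u2019s): one far-off prediction barely moves MAE but inflates RMSE.

```python
import numpy as np

# Ten observations; every prediction is off by 1, except that in the
# "outlier" case one prediction is off by 10.
y_true = np.zeros(10)
clean = np.full(10, 1.0)
outlier = clean.copy()
outlier[0] = 10.0

def mae(e):
    return np.mean(np.abs(e))

def rmse(e):
    return np.sqrt(np.mean(e ** 2))

for name, y_pred in [("clean", clean), ("with outlier", outlier)]:
    e = y_true - y_pred
    # MAE grows from 1.0 to 1.9, while RMSE jumps from 1.0 to ~3.3
    print(name, "MAE:", round(mae(e), 2), "RMSE:", round(rmse(e), 2))
```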
In the 2nd case above, a model trained with RMSE as its loss will adjust itself to minimize that single outlier case at the expense of the other, common examples, which reduces its overall performance.<\/p>\n<p id=\"267b\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">MAE loss is useful<\/strong> if the training data is corrupted with outliers (i.e. we erroneously receive unrealistically huge negative\/positive values in our training environment, but not our testing environment).<\/p>\n<p id=\"3a64\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Intuitively, we can think about it like this: if we had to give a single prediction for all observations so as to minimize MSE, that prediction should be the <strong class=\"be na\">mean <\/strong>of all target values. But if we wanted to minimize MAE, that prediction should be the <strong class=\"be na\">median<\/strong> of all observations. We know that the median is more <a class=\"af nb\" href=\"https:\/\/medium.com\/r?url=https%3A%2F%2Fheartbeat.fritz.ai%2Fhow-to-make-your-machine-learning-models-robust-to-outliers-44d404067d07\" rel=\"noopener\">robust to outliers<\/a> than the mean, which consequently makes MAE more robust to outliers than MSE.<\/p>\n<p id=\"6973\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">One big problem with MAE loss<\/strong> (especially for neural nets) is that its gradient is the same throughout, which means the gradient stays large even for small loss values. This isn\u2019t good for learning. To fix this, we can use a dynamic learning rate that decreases as we move closer to the minimum. 
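The gradient behavior just described can be sketched directly (my own illustrative code): the MSE gradient is proportional to the error, while the MAE gradient has constant magnitude.

```python
import numpy as np

def grad_mse(e):
    # derivative of e**2 w.r.t. e: shrinks as the error approaches 0
    return 2 * e

def grad_mae(e):
    # derivative of |e| w.r.t. e: magnitude 1 no matter how small the error
    return np.sign(e)

for e in [10.0, 1.0, 0.1]:
    print(f"error={e}: MSE grad={grad_mse(e)}, MAE grad={grad_mae(e)}")
```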
MSE behaves nicely in this case and will converge even with a fixed learning rate. The gradient of MSE loss is high for larger loss values and decreases as loss approaches 0, making it more precise at the end of training (see figure below.)<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<div class=\"qk ql eb qm bg qn\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*JTC4ReFwSeAt3kvTLq1YoA.png\" alt=\"\" width=\"700\" height=\"218\"><\/figure><div class=\"nc nd qo\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*JTC4ReFwSeAt3kvTLq1YoA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*JTC4ReFwSeAt3kvTLq1YoA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*JTC4ReFwSeAt3kvTLq1YoA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*JTC4ReFwSeAt3kvTLq1YoA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*JTC4ReFwSeAt3kvTLq1YoA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*JTC4ReFwSeAt3kvTLq1YoA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*JTC4ReFwSeAt3kvTLq1YoA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*JTC4ReFwSeAt3kvTLq1YoA.png 640w, 
https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*JTC4ReFwSeAt3kvTLq1YoA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*JTC4ReFwSeAt3kvTLq1YoA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*JTC4ReFwSeAt3kvTLq1YoA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*JTC4ReFwSeAt3kvTLq1YoA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*JTC4ReFwSeAt3kvTLq1YoA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*JTC4ReFwSeAt3kvTLq1YoA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"cd43\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">Deciding which loss function to use<br>\n<\/strong>If the outliers represent anomalies that are important for business and should be detected, then we should use MSE. On the other hand, if we believe that the outliers just represent corrupted data, then we should choose MAE as loss.<\/p>\n<p id=\"b2aa\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">I recommend reading this post with a nice study <a class=\"af nb\" href=\"http:\/\/rishy.github.io\/ml\/2015\/07\/28\/l1-vs-l2-loss\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">comparing the performance of a regression model using L1 loss and L2 loss<\/a> in both the presence and absence of outliers. 
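The mean-vs.-median intuition from earlier is easy to check numerically (a sketch with my own toy numbers): for a single constant prediction, the mean gives a lower total squared error and the median a lower total absolute error.

```python
import numpy as np

# Toy targets with one outlier
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

mean_pred = np.mean(y)      # 22.0 -> optimal single prediction under MSE
median_pred = np.median(y)  # 3.0  -> optimal single prediction under MAE

def sse(c):
    # total squared error of always predicting the constant c
    return np.sum((y - c) ** 2)

def sae(c):
    # total absolute error of always predicting the constant c
    return np.sum(np.abs(y - c))

print(sse(mean_pred) < sse(median_pred))   # mean wins under squared error
print(sae(median_pred) < sae(mean_pred))   # median wins under absolute error
```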
Remember, L1 and L2 loss are just other names for MAE and MSE respectively.<\/p>\n<blockquote class=\"nn\"><p id=\"ede8\" class=\"no np fo be nq nr qp qq qr qs qt mz dv\" data-selectable-paragraph=\"\"><mark class=\"wu wv ao\"><em class=\"nx\">L1 loss is more robust to outliers, but its derivative is not continuous, which makes finding the solution less efficient. L2 loss is sensitive to outliers, but gives a more stable, closed-form solution (obtained by setting its derivative to 0).<\/em><\/mark><\/p><\/blockquote>\n<p id=\"8edc\" class=\"pw-post-body-paragraph mf mg fo be b gm qu mi mj gp qv ml mm mn qw mp mq mr qx mt mu mv qy mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">Problems with both: <\/strong>There can be cases where neither loss function gives desirable predictions. For example, suppose 90% of the observations in our data have a true target value of 150, and the remaining 10% have target values between 0 and 30. A model with MAE as its loss might predict 150 for all observations, ignoring the 10% of outlier cases, because it pulls predictions toward the median. In the same case, a model using MSE would give many predictions skewed toward the 0\u201330 range, because it is pulled toward the outliers. Both results are undesirable in many business cases.<\/p>\n<p id=\"a19b\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">What to do in such a case? <\/strong>An easy fix would be to transform the target variable. Another way is to try a different loss function. This is the motivation behind our third loss function, Huber loss.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h2 id=\"a1a5\" class=\"pc oh fo be oi pd pe pf ol pg ph pi oo mn pj pk pl mr pm pn po mv pp pq pr ps bj\" data-selectable-paragraph=\"\">3. 
<strong class=\"al\">Huber Loss, Smooth Mean Absolute Error<\/strong><\/h2>\n<p id=\"235f\" class=\"pw-post-body-paragraph mf mg fo be b gm pt mi mj gp pu ml mm mn pv mp mq mr pw mt mu mv px mx my mz fh bj\" data-selectable-paragraph=\"\"><a class=\"af nb\" href=\"https:\/\/en.wikipedia.org\/wiki\/Huber_loss\" target=\"_blank\" rel=\"noopener ugc nofollow\">Huber loss<\/a> is less sensitive to outliers in data than the squared error loss. It\u2019s also differentiable at 0. It\u2019s basically absolute error, which becomes quadratic when the error is small. How small that error has to be to make it quadratic depends on a hyperparameter, \ud835\udeff (delta), which can be tuned. Huber loss approaches <strong class=\"be na\">MAE when \ud835\udeff ~ 0 and MSE when \ud835\udeff ~ \u221e (large numbers).<\/strong><\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:525\/1*0eoiZGyddDqltzzjoyfRzA.png\" alt=\"\" width=\"525\" height=\"83\"><\/figure><div class=\"nc nd qz\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*0eoiZGyddDqltzzjoyfRzA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*0eoiZGyddDqltzzjoyfRzA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*0eoiZGyddDqltzzjoyfRzA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*0eoiZGyddDqltzzjoyfRzA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*0eoiZGyddDqltzzjoyfRzA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*0eoiZGyddDqltzzjoyfRzA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1050\/format:webp\/1*0eoiZGyddDqltzzjoyfRzA.png 1050w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) 
and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 525px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*0eoiZGyddDqltzzjoyfRzA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*0eoiZGyddDqltzzjoyfRzA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*0eoiZGyddDqltzzjoyfRzA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*0eoiZGyddDqltzzjoyfRzA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*0eoiZGyddDqltzzjoyfRzA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*0eoiZGyddDqltzzjoyfRzA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1050\/1*0eoiZGyddDqltzzjoyfRzA.png 1050w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 525px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:504\/1*jxidxadWSMLvwLDZz2mycg.png\" alt=\"\" width=\"504\" height=\"360\"><\/figure><div class=\"nc nd ra\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*jxidxadWSMLvwLDZz2mycg.png 640w, 
https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*jxidxadWSMLvwLDZz2mycg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*jxidxadWSMLvwLDZz2mycg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*jxidxadWSMLvwLDZz2mycg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*jxidxadWSMLvwLDZz2mycg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*jxidxadWSMLvwLDZz2mycg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1008\/format:webp\/1*jxidxadWSMLvwLDZz2mycg.png 1008w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 504px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*jxidxadWSMLvwLDZz2mycg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*jxidxadWSMLvwLDZz2mycg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*jxidxadWSMLvwLDZz2mycg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*jxidxadWSMLvwLDZz2mycg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*jxidxadWSMLvwLDZz2mycg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*jxidxadWSMLvwLDZz2mycg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1008\/1*jxidxadWSMLvwLDZz2mycg.png 1008w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, 
(-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 504px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\">Plot of Huber Loss (Y-axis) vs. Predictions (X-axis). True value = 0<\/figcaption>\n<\/figure>\n<p id=\"7388\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">The choice of delta is critical because it determines what you\u2019re willing to consider as an outlier. Residuals larger than delta are minimized with L1 (which is less sensitive to large outliers), while residuals smaller than delta are minimized \u201cappropriately\u201d with L2.<\/p>\n<p id=\"e17d\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">Why use Huber Loss?<br>\n<\/strong>One big problem with using MAE to train neural nets is its constant, large gradient, which can cause gradient descent to miss the minimum at the end of training. For MSE, the gradient decreases as the loss approaches its minimum, making optimization more precise.<\/p>\n<p id=\"c3b9\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Huber loss can be really helpful in such cases, as it curves around the minimum, which decreases the gradient. And it\u2019s more robust to outliers than MSE. Therefore, it combines good properties from both MSE and MAE. 
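The piecewise definition above can be sketched in a few lines of NumPy (a common parameterization; the function name and toy arguments here are mine, not from the linked notebook):

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: 0.5 * e**2 for |e| <= delta, delta * (|e| - 0.5 * delta) otherwise."""
    error = np.asarray(y_true) - np.asarray(y_pred)
    small = np.abs(error) <= delta
    quadratic = 0.5 * error ** 2                    # MSE-like region near zero
    linear = delta * (np.abs(error) - 0.5 * delta)  # MAE-like region for large residuals
    return np.mean(np.where(small, quadratic, linear))

print(huber([0.0], [0.5]))  # 0.125 -> quadratic branch
print(huber([0.0], [3.0]))  # 2.5   -> linear branch
```

At |error| = delta the two branches agree (both give 0.5 * delta ** 2), so the loss and its first derivative are continuous.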
However, the <strong class=\"be na\">problem with Huber loss<\/strong> is that we might need to tune the hyperparameter delta, which is an iterative process.<\/p>\n<h2 id=\"95da\" class=\"pc oh fo be oi pd pe pf ol pg ph pi oo mn pj pk pl mr pm pn po mv pp pq pr ps bj\" data-selectable-paragraph=\"\">4. Log-Cosh Loss<\/h2>\n<p id=\"994f\" class=\"pw-post-body-paragraph mf mg fo be b gm pt mi mj gp pu ml mm mn pv mp mq mr pw mt mu mv px mx my mz fh bj\" data-selectable-paragraph=\"\">Log-cosh is another function used in regression tasks that\u2019s smoother than L2. Log-cosh is the logarithm of the hyperbolic cosine of the prediction error.<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:436\/1*hj5n5273jYX7rclO7bnfJg.png\" alt=\"\" width=\"436\" height=\"90\"><\/figure><div class=\"nc nd rb\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*hj5n5273jYX7rclO7bnfJg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*hj5n5273jYX7rclO7bnfJg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*hj5n5273jYX7rclO7bnfJg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*hj5n5273jYX7rclO7bnfJg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*hj5n5273jYX7rclO7bnfJg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*hj5n5273jYX7rclO7bnfJg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:872\/format:webp\/1*hj5n5273jYX7rclO7bnfJg.png 872w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, 
(-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 436px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*hj5n5273jYX7rclO7bnfJg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*hj5n5273jYX7rclO7bnfJg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*hj5n5273jYX7rclO7bnfJg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*hj5n5273jYX7rclO7bnfJg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*hj5n5273jYX7rclO7bnfJg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*hj5n5273jYX7rclO7bnfJg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:872\/1*hj5n5273jYX7rclO7bnfJg.png 872w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 436px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:504\/1*BAbgW_JdwyAWLZR2dE1Ujg.png\" alt=\"\" width=\"504\" height=\"360\"><\/figure><div class=\"nc nd ra\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 750w, 
https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1008\/format:webp\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 1008w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 504px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1008\/1*BAbgW_JdwyAWLZR2dE1Ujg.png 1008w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 504px\" 
data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\">Plot of Log-cosh Loss (Y-axis) vs. Predictions (X-axis). True value = 0<\/figcaption>\n<\/figure>\n<p id=\"0b08\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">Advantage:<\/strong> <code class=\"cw rc rd re rf b\">log(cosh(x))<\/code> is approximately equal to <code class=\"cw rc rd re rf b\">(x ** 2) \/ 2<\/code> for small <code class=\"cw rc rd re rf b\">x<\/code> and to <code class=\"cw rc rd re rf b\">abs(x) - log(2)<\/code> for large <code class=\"cw rc rd re rf b\">x<\/code>. This means that &#8216;logcosh&#8217; works mostly like the mean squared error, but will not be so strongly affected by the occasional wildly incorrect prediction. It has all the advantages of Huber loss, and it\u2019s twice differentiable everywhere, unlike Huber loss.<\/p>\n<p id=\"33e4\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">Why do we need a 2nd derivative? <\/strong>Many ML model implementations like <a class=\"af nb\" href=\"https:\/\/heartbeat.comet.ml\/boosting-your-machine-learning-models-using-xgboost-d2cabb3e948f\" target=\"_blank\" rel=\"noopener ugc nofollow\">XGBoost <\/a>use Newton\u2019s method to find the optimum, which is why the second derivative (Hessian) is needed. 
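As a minimal sketch (the function names and example values are mine), log-cosh and its first two derivatives all have simple closed forms, which is what makes it convenient for Newton-style optimizers:

```python
import numpy as np

def logcosh_loss(y_true, y_pred):
    """Log-cosh loss: mean of log(cosh(prediction error))."""
    error = np.asarray(y_pred) - np.asarray(y_true)
    return np.mean(np.log(np.cosh(error)))

def logcosh_grad(e):
    return np.tanh(e)             # first derivative of log(cosh(e))

def logcosh_hess(e):
    return 1.0 / np.cosh(e) ** 2  # second derivative: sech(e)**2, always positive

# Behaviour matches the approximations above:
print(np.log(np.cosh(0.01)))  # ~ 0.01**2 / 2
print(np.log(np.cosh(10.0)))  # ~ 10 - log(2)
```

In an XGBoost custom objective, the last two functions would supply the grad and hess arrays. Note that np.cosh overflows for very large errors; a numerically stable form is abs(x) + np.log1p(np.exp(-2 * abs(x))) - np.log(2).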
For ML frameworks like XGBoost, twice differentiable functions are more favorable.<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<div class=\"qk ql eb qm bg qn\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*FNxOsZLqXVZNFOxGoG9A1Q.png\" alt=\"\" width=\"700\" height=\"268\"><\/figure><div class=\"nc nd rg\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 750w, 
https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*FNxOsZLqXVZNFOxGoG9A1Q.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\">Objective function used in XGBoost. Notice the dependency on both the 1st- and 2nd-order derivatives<\/figcaption>\n<\/figure>\n<p id=\"72f9\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">But Log-cosh loss isn\u2019t perfect. 
Its gradient and Hessian are constant for very large off-target predictions, which can result in the absence of splits for XGBoost.<\/p>\n<p id=\"713b\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Python code for Huber and Log-cosh loss functions:<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"608\" height=\"133\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/loss_regression-\u2013-Medium-1.jpg\" alt=\"\" class=\"wp-image-5923\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/loss_regression-\u2013-Medium-1.jpg 608w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/loss_regression-\u2013-Medium-1-300x66.jpg 300w\" sizes=\"auto, (max-width: 608px) 100vw, 608px\" \/><\/figure>\n\n\n\n<div class=\"ab ca ny nz oa ob\" role=\"separator\"><\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h2 id=\"5c02\" class=\"pc oh fo be oi pd pe pf ol pg ph pi oo mn pj pk pl mr pm pn po mv pp pq pr ps bj\" data-selectable-paragraph=\"\">5. Quantile Loss<\/h2>\n<p id=\"147d\" class=\"pw-post-body-paragraph mf mg fo be b gm pt mi mj gp pu ml mm mn pv mp mq mr pw mt mu mv px mx my mz fh bj\" data-selectable-paragraph=\"\">In most real-world prediction problems, we are often interested in the uncertainty of our predictions. 
Knowing about the range of predictions as opposed to only point estimates can significantly improve decision-making processes for many business problems.<\/p>\n<p id=\"846e\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><a class=\"af nb\" href=\"https:\/\/towardsdatascience.com\/deep-quantile-regression-c85481548b5a\" target=\"_blank\" rel=\"noopener\">Quantile loss functions<\/a> turn out to be useful when we are interested in predicting an interval instead of only point predictions. The prediction interval from least-squares regression is based on the assumption that residuals (y - y_hat) have constant variance across values of the independent variables. We cannot trust linear regression models that violate this assumption. Nor can we simply throw away the idea of fitting a linear regression model as the baseline by saying that such situations would always be better modeled using non-linear functions or tree-based models. This is where quantile loss and quantile regression come to the rescue, as regression based on quantile loss provides sensible prediction intervals even for residuals with non-constant variance or a non-normal distribution.<\/p>\n<p id=\"5cce\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Let\u2019s see a working example to better understand why regression based on quantile loss performs well with heteroscedastic data.<\/p>\n<p id=\"c355\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">Quantile regression vs. 
Ordinary Least Square regression<\/strong><\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<div class=\"qk ql eb qm bg qn\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*A61Xn0hlPcoMKDns5KFD-A.png\" alt=\"\" width=\"700\" height=\"318\"><\/figure><div class=\"nc nd rh\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*A61Xn0hlPcoMKDns5KFD-A.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*A61Xn0hlPcoMKDns5KFD-A.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*A61Xn0hlPcoMKDns5KFD-A.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*A61Xn0hlPcoMKDns5KFD-A.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*A61Xn0hlPcoMKDns5KFD-A.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*A61Xn0hlPcoMKDns5KFD-A.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*A61Xn0hlPcoMKDns5KFD-A.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*A61Xn0hlPcoMKDns5KFD-A.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*A61Xn0hlPcoMKDns5KFD-A.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*A61Xn0hlPcoMKDns5KFD-A.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*A61Xn0hlPcoMKDns5KFD-A.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*A61Xn0hlPcoMKDns5KFD-A.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*A61Xn0hlPcoMKDns5KFD-A.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*A61Xn0hlPcoMKDns5KFD-A.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\">Left: Linear relationship b\/w X1 and Y. With a constant variance of residuals. Right: Linear relationship b\/w X2 and Y but the variance of Y increases with X2. 
(Heteroscedasticity)<\/figcaption>\n<\/figure>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<div class=\"qk ql eb qm bg qn\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*h_iOn3gSUa2bk6o0foudDA.png\" alt=\"\" width=\"700\" height=\"318\"><\/figure><div class=\"nc nd rh\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*h_iOn3gSUa2bk6o0foudDA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*h_iOn3gSUa2bk6o0foudDA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*h_iOn3gSUa2bk6o0foudDA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*h_iOn3gSUa2bk6o0foudDA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*h_iOn3gSUa2bk6o0foudDA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*h_iOn3gSUa2bk6o0foudDA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*h_iOn3gSUa2bk6o0foudDA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*h_iOn3gSUa2bk6o0foudDA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*h_iOn3gSUa2bk6o0foudDA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*h_iOn3gSUa2bk6o0foudDA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*h_iOn3gSUa2bk6o0foudDA.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*h_iOn3gSUa2bk6o0foudDA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*h_iOn3gSUa2bk6o0foudDA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*h_iOn3gSUa2bk6o0foudDA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\">The orange line represents OLS estimates for both cases<\/figcaption>\n<\/figure>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:568\/1*hdqrLhTXity54wmfXAtBGw.png\" alt=\"\" width=\"568\" height=\"424\"><\/figure><div class=\"nc nd ri\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*hdqrLhTXity54wmfXAtBGw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*hdqrLhTXity54wmfXAtBGw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*hdqrLhTXity54wmfXAtBGw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*hdqrLhTXity54wmfXAtBGw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*hdqrLhTXity54wmfXAtBGw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*hdqrLhTXity54wmfXAtBGw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1136\/format:webp\/1*hdqrLhTXity54wmfXAtBGw.png 1136w\" type=\"image\/webp\" 
sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 568px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*hdqrLhTXity54wmfXAtBGw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*hdqrLhTXity54wmfXAtBGw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*hdqrLhTXity54wmfXAtBGw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*hdqrLhTXity54wmfXAtBGw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*hdqrLhTXity54wmfXAtBGw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*hdqrLhTXity54wmfXAtBGw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1136\/1*hdqrLhTXity54wmfXAtBGw.png 1136w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 568px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\">Quantile Regression. 
Dotted lines represent regression-based 0.05 and 0.95 quantile loss functions<\/figcaption>\n<\/figure>\n<p id=\"91c4\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Notebook <a class=\"af nb\" href=\"https:\/\/github.com\/groverpr\/Machine-Learning\/blob\/master\/notebooks\/09_Quantile_Regression.ipynb\" target=\"_blank\" rel=\"noopener ugc nofollow\">link<\/a> with the code for the quantile regression shown in the above plots.<\/p>\n<p id=\"4b46\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">Understanding the quantile loss function<\/strong><\/p>\n<p id=\"9be6\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">Quantile-based regression aims to estimate the conditional \u201cquantile\u201d of a response variable given certain values of predictor variables. Quantile loss is actually just an extension of MAE (when the quantile is the 50th percentile, it is MAE).<\/p>\n<p id=\"7224\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">The idea is to choose the quantile value based on whether we want to penalize positive errors or negative errors more heavily. The loss function gives different penalties to overestimation and underestimation based on the value of the chosen quantile (\u03b3). 
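In code, the quantile (pinball) loss can be sketched as follows (a minimal NumPy version; the function name and toy values are mine):

```python
import numpy as np

def quantile_loss(y_true, y_pred, gamma=0.5):
    """Pinball loss: underestimation weighted by gamma, overestimation by (1 - gamma)."""
    error = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(np.maximum(gamma * error, (gamma - 1) * error))

# With gamma = 0.25, overestimating by 1 costs 0.75 while underestimating by 1
# costs only 0.25, so the fitted prediction is pushed towards the 25th percentile:
print(quantile_loss([0.0], [1.0], gamma=0.25))  # 0.75 (overestimation)
print(quantile_loss([1.0], [0.0], gamma=0.25))  # 0.25 (underestimation)
```

With gamma = 0.5 this reduces to half of MAE, which is why minimizing it recovers the median.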
For example, a quantile loss function of \u03b3 = 0.25 gives more penalty to overestimation and tries to keep prediction values a little below median<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<div class=\"qk ql eb qm bg qn\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*ePh5hyrWS5f591nUORz8_A.png\" alt=\"\" width=\"700\" height=\"97\"><\/figure><div class=\"nc nd rj\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*ePh5hyrWS5f591nUORz8_A.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*ePh5hyrWS5f591nUORz8_A.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*ePh5hyrWS5f591nUORz8_A.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*ePh5hyrWS5f591nUORz8_A.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*ePh5hyrWS5f591nUORz8_A.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*ePh5hyrWS5f591nUORz8_A.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*ePh5hyrWS5f591nUORz8_A.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*ePh5hyrWS5f591nUORz8_A.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*ePh5hyrWS5f591nUORz8_A.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*ePh5hyrWS5f591nUORz8_A.png 
750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*ePh5hyrWS5f591nUORz8_A.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*ePh5hyrWS5f591nUORz8_A.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*ePh5hyrWS5f591nUORz8_A.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*ePh5hyrWS5f591nUORz8_A.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"e20c\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">\u03b3 is the required quantile and has value between 0 and 1.<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:504\/1*_Msrko0NVv1d43MaVfsZkA.png\" alt=\"\" width=\"504\" height=\"360\"><\/figure><div class=\"nc nd ra\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*_Msrko0NVv1d43MaVfsZkA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*_Msrko0NVv1d43MaVfsZkA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*_Msrko0NVv1d43MaVfsZkA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*_Msrko0NVv1d43MaVfsZkA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*_Msrko0NVv1d43MaVfsZkA.png 828w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*_Msrko0NVv1d43MaVfsZkA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1008\/format:webp\/1*_Msrko0NVv1d43MaVfsZkA.png 1008w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 504px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*_Msrko0NVv1d43MaVfsZkA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*_Msrko0NVv1d43MaVfsZkA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*_Msrko0NVv1d43MaVfsZkA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*_Msrko0NVv1d43MaVfsZkA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*_Msrko0NVv1d43MaVfsZkA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*_Msrko0NVv1d43MaVfsZkA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1008\/1*_Msrko0NVv1d43MaVfsZkA.png 1008w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 504px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\">Plot of Quantile Loss (Y-axis) vs. Predictions (X-axis). 
True value of Y = 0<\/figcaption>\n<\/figure>\n<p id=\"2f4b\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">We can also use this loss function to calculate prediction intervals in neural nets or tree based models. Below is an example of Sklearn implementation for gradient boosted tree regressors.<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/0*DQ0t4YXq-xLFsWi1.png\" alt=\"\" width=\"640\" height=\"480\"><\/figure><div class=\"nc nd rk\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/0*DQ0t4YXq-xLFsWi1.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/0*DQ0t4YXq-xLFsWi1.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/0*DQ0t4YXq-xLFsWi1.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/0*DQ0t4YXq-xLFsWi1.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/0*DQ0t4YXq-xLFsWi1.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/0*DQ0t4YXq-xLFsWi1.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1280\/format:webp\/0*DQ0t4YXq-xLFsWi1.png 1280w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 640px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/0*DQ0t4YXq-xLFsWi1.png 640w, 
https:\/\/miro.medium.com\/v2\/resize:fit:720\/0*DQ0t4YXq-xLFsWi1.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/0*DQ0t4YXq-xLFsWi1.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/0*DQ0t4YXq-xLFsWi1.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/0*DQ0t4YXq-xLFsWi1.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/0*DQ0t4YXq-xLFsWi1.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1280\/0*DQ0t4YXq-xLFsWi1.png 1280w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 640px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\">Prediction Intervals using Quantile loss (Gradient Boosting Regressor) <a class=\"af nb\" href=\"http:\/\/scikit-learn.org\/stable\/auto_examples\/ensemble\/plot_gradient_boosting_quantile.html\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"nx\">http:\/\/scikit-learn.org\/stable\/auto_examples\/ensemble\/plot_gradient_boosting_quantile.html<\/em><\/a><\/figcaption>\n<\/figure>\n<p id=\"b67a\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">The above figure shows a 90% prediction interval calculated using the quantile loss function available in GradientBoostingRegression of sklearn library. 
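As a hedged sketch of how such an interval can be produced (synthetic data here, not the dataset in the figure; scikit-learn exposes the quantile through the `alpha` parameter of `GradientBoostingRegressor`):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0.0, 10.0, 200))[:, None]   # toy 1-D feature
y = np.sin(X).ravel() + rng.normal(0.0, 0.3, 200)   # noisy target

# One model per quantile: loss="quantile" with alpha playing the role of gamma
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)

lo, hi = lower.predict(X), upper.predict(X)
coverage = np.mean((y >= lo) & (y <= hi))  # close to 0.90 for a 90% interval
```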
The upper bound is constructed using \u03b3 = 0.95 and the lower bound using \u03b3 = 0.05.<\/p>\n<p id=\"5908\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">___________________________________________________________________<\/p>\n<h2 id=\"f90a\" class=\"pc oh fo be oi pd pe pf ol pg ph pi oo mn pj pk pl mr pm pn po mv pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Comparison Study:<\/strong><\/h2>\n<p id=\"acac\" class=\"pw-post-body-paragraph mf mg fo be b gm pt mi mj gp pu ml mm mn pv mp mq mr pw mt mu mv px mx my mz fh bj\" data-selectable-paragraph=\"\">A nice comparison simulation is provided in \u201c<a class=\"af nb\" href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3885826\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Gradient boosting machines, a tutorial<\/a>\u201d. To demonstrate the properties of all the above loss functions, they\u2019ve simulated a dataset sampled from a <a class=\"af nb\" href=\"https:\/\/en.wikipedia.org\/wiki\/Sinc_function\" target=\"_blank\" rel=\"noopener ugc nofollow\">sinc(<em class=\"rl\">x<\/em>)<\/a> function with two sources of artificially simulated noise: the Gaussian noise component \u03b5 ~ <em class=\"rl\">N<\/em>(0, \u03c3\u00b2) and the impulsive noise component \u03be ~ Bern(<em class=\"rl\">p<\/em>). The impulsive noise term is added to illustrate the robustness effects. 
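A simulation in that spirit can be set up as follows (a sketch with illustrative values for n, sigma, and p, not the tutorial's exact settings; NumPy's np.sinc is the normalized sin(\u03c0x)\/(\u03c0x)):

```python
import numpy as np

rng = np.random.RandomState(42)
n, sigma, p = 300, 0.1, 0.05  # illustrative values, not the tutorial's settings

x = rng.uniform(-10.0, 10.0, n)
y_clean = np.sinc(x)                        # underlying sinc(x) curve
gauss = rng.normal(0.0, sigma, n)           # Gaussian component, N(0, sigma^2)
spikes = rng.binomial(1, p, n) * rng.normal(0.0, 10 * sigma, n)  # Bern(p) impulses
y = y_clean + gauss + spikes                # noisy training target
```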
Below are the results of fitting a GBM regressor using different loss functions.<\/p>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<div class=\"qk ql eb qm bg qn\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*46WnlaWhfPZaVWzSZPviIg.png\" alt=\"\" width=\"700\" height=\"298\"><\/figure><div class=\"nc nd rm\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*46WnlaWhfPZaVWzSZPviIg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*46WnlaWhfPZaVWzSZPviIg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*46WnlaWhfPZaVWzSZPviIg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*46WnlaWhfPZaVWzSZPviIg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*46WnlaWhfPZaVWzSZPviIg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*46WnlaWhfPZaVWzSZPviIg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*46WnlaWhfPZaVWzSZPviIg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*46WnlaWhfPZaVWzSZPviIg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*46WnlaWhfPZaVWzSZPviIg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*46WnlaWhfPZaVWzSZPviIg.png 750w, 
https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*46WnlaWhfPZaVWzSZPviIg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*46WnlaWhfPZaVWzSZPviIg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*46WnlaWhfPZaVWzSZPviIg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*46WnlaWhfPZaVWzSZPviIg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"qa qb qc nc nd qd qe be b bf z dv\" data-selectable-paragraph=\"\"><strong class=\"be na\">Continuous loss functions: (A) MSE loss function; (B) MAE loss function; (C) Huber loss function; (D) Quantile loss function<\/strong>. 
Demonstration of fitting a smooth GBM to noisy sinc(<em class=\"nx\">x<\/em>) data: <strong class=\"be na\">(E) <\/strong>original sinc(<em class=\"nx\">x<\/em>) function; <strong class=\"be na\">(F)<\/strong> smooth GBM fitted with MSE and MAE loss; <strong class=\"be na\">(G)<\/strong> smooth GBM fitted with Huber loss with \u03b4 = {4, 2, 1}; <strong class=\"be na\">(H)<\/strong> smooth GBM fitted with Quantile loss with \u03b1 = {0.5, 0.1, 0.9}.<\/figcaption>\n<\/figure>\n<p id=\"9697\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">Some observations from the simulations:<\/strong><\/p>\n<ul class=\"\">\n<li id=\"38eb\" class=\"mf mg fo be b gm mh mi mj gp mk ml mm rn mo mp mq ro ms mt mu rp mw mx my mz rq rr rs bj\" data-selectable-paragraph=\"\">The predictions from the model with MAE loss are less affected by the impulsive noise, whereas the predictions with the MSE loss function are slightly biased by the deviations that noise causes.<\/li>\n<li id=\"684d\" class=\"mf mg fo be b gm rt mi mj gp ru ml mm rn rv mp mq ro rw mt mu rp rx mx my mz rq rr rs bj\" data-selectable-paragraph=\"\">The predictions are only slightly sensitive to the value of the hyperparameter chosen in the case of the model with Huber loss.<\/li>\n<li id=\"3e92\" class=\"mf mg fo be b gm rt mi mj gp ru ml mm rn rv mp mq ro rw mt mu rp rx mx my mz rq rr rs bj\" data-selectable-paragraph=\"\">The quantile losses give a good estimate of the corresponding confidence levels.<\/li>\n<\/ul>\n<h2 id=\"234f\" class=\"pc oh fo be oi pd pe pf ol pg ph pi oo mn pj pk pl mr pm pn po mv pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">All the loss functions in a single plot.<\/strong><\/h2>\n<figure class=\"nf ng nh ni nj nk nc nd paragraph-image\">\n<div class=\"qk ql eb qm bg qn\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg nl nm 
c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*BploIBOUrhbgdoB1BK_sOg.png\" alt=\"\" width=\"700\" height=\"455\"><\/figure><div class=\"nc nd ry\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*BploIBOUrhbgdoB1BK_sOg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*BploIBOUrhbgdoB1BK_sOg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*BploIBOUrhbgdoB1BK_sOg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*BploIBOUrhbgdoB1BK_sOg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*BploIBOUrhbgdoB1BK_sOg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*BploIBOUrhbgdoB1BK_sOg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*BploIBOUrhbgdoB1BK_sOg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*BploIBOUrhbgdoB1BK_sOg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*BploIBOUrhbgdoB1BK_sOg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*BploIBOUrhbgdoB1BK_sOg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*BploIBOUrhbgdoB1BK_sOg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*BploIBOUrhbgdoB1BK_sOg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*BploIBOUrhbgdoB1BK_sOg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*BploIBOUrhbgdoB1BK_sOg.png 1400w\" 
sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"ab ca ny nz oa ob\" role=\"separator\"><\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<p id=\"951f\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\">If I have missed any important loss functions, I would love to hear about them in the comments. Thank you for reading.<\/p>\n<p id=\"b8fb\" class=\"pw-post-body-paragraph mf mg fo be b gm mh mi mj gp mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz fh bj\" data-selectable-paragraph=\"\"><strong class=\"be na\">LinkedIn: <\/strong><a class=\"af nb\" href=\"https:\/\/www.linkedin.com\/in\/groverpr\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">https:\/\/www.linkedin.com\/in\/groverpr\/<\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>All the algorithms in machine learning rely on minimizing or maximizing a function, which we call \u201cobjective function\u201d. The group of functions that are minimized are called \u201closs functions\u201d. A loss function is a measure of how good a prediction model does in terms of being able to predict the expected outcome. 
A most commonly [&hellip;]<\/p>\n","protected":false},"author":35,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6],"tags":[],"coauthors":[146],"class_list":["post-5919","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>5 Regression Loss Functions All Machine Learners Should Know - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"5 Regression Loss Functions All Machine Learners Should Know\" \/>\n<meta property=\"og:description\" content=\"All the algorithms in machine learning rely on minimizing or maximizing a function, which we call \u201cobjective function\u201d. The group of functions that are minimized are called \u201closs functions\u201d. A loss function is a measure of how good a prediction model does in terms of being able to predict the expected outcome. 
A most commonly [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-06-14T16:03:26+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:15:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:486\/1*3MsFzl7zRZE3TihIC9JmaQ.png\" \/>\n<meta name=\"author\" content=\"Prince Grover\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Prince Grover\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"15 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"5 Regression Loss Functions All Machine Learners Should Know - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/","og_locale":"en_US","og_type":"article","og_title":"5 Regression Loss Functions All Machine Learners Should Know","og_description":"All the algorithms in machine learning rely on minimizing or maximizing a function, which we call \u201cobjective function\u201d. The group of functions that are minimized are called \u201closs functions\u201d. A loss function is a measure of how good a prediction model does in terms of being able to predict the expected outcome. 
A most commonly [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-06-14T16:03:26+00:00","article_modified_time":"2025-04-24T17:15:29+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:486\/1*3MsFzl7zRZE3TihIC9JmaQ.png","type":"","width":"","height":""}],"author":"Prince Grover","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Prince Grover","Est. reading time":"15 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/"},"author":{"name":"Prince Grover","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/4ae98a6ee1e200a08b70496eedbf588e"},"headline":"5 Regression Loss Functions All Machine Learners Should Know","datePublished":"2023-06-14T16:03:26+00:00","dateModified":"2025-04-24T17:15:29+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/"},"wordCount":2247,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:486\/1*3MsFzl7zRZE3TihIC9JmaQ.png","articleSection":["Machine Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/","url":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/","name":"5 
Regression Loss Functions All Machine Learners Should Know - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:486\/1*3MsFzl7zRZE3TihIC9JmaQ.png","datePublished":"2023-06-14T16:03:26+00:00","dateModified":"2025-04-24T17:15:29+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:486\/1*3MsFzl7zRZE3TihIC9JmaQ.png","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:486\/1*3MsFzl7zRZE3TihIC9JmaQ.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/5-regression-loss-functions-all-machine-learners-should-know\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"5 Regression Loss Functions All Machine Learners Should Know"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/4ae98a6ee1e200a08b70496eedbf588e","name":"Prince Grover","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/f71805e860c90311602e04a6682da4d4","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/j_dmfcP__400x400-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/j_dmfcP__400x400-96x96.jpg","caption":"Prince 
Grover"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/prince-grover\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/5919","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=5919"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/5919\/revisions"}],"predecessor-version":[{"id":15618,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/5919\/revisions\/15618"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=5919"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=5919"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=5919"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=5919"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}