{"id":8419,"date":"2023-12-12T15:02:07","date_gmt":"2023-12-12T23:02:07","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=8419"},"modified":"2025-04-24T17:03:53","modified_gmt":"2025-04-24T17:03:53","slug":"evaluation-metrics-for-classification-models-in-machine-learning-part-1","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1\/","title":{"rendered":"Evaluation Metrics for Classification Models in Machine Learning (Part 1)"},"content":{"rendered":"\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<figure class=\"mr ms mt mu mv mw mo mp paragraph-image\">\n<div class=\"mx my ee mz bg na\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg lw nb c alignnone\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*c6FT2rVO_bwSy1Zp\" alt=\"black boots on gray cement with arrows going left and right diagonal\" width=\"700\" height=\"934\"><\/figure><div class=\"mo mp mq\"><picture><\/picture><\/div>\n<\/div><figcaption class=\"nc nd ne mo mp nf ng be b bf z dw\" data-selectable-paragraph=\"\">Photo by <a href=\"https:\/\/unsplash.com\/@jontyson?utm_source=medium&amp;utm_medium=referral\">Jon Tyson<\/a> on <a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"9244\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">Suppose you are working on a machine learning classification problem in which you have to predict whether a person is Covid positive or negative. 
You have a good dataset and you have applied classification algorithms and successfully built your classification model.<\/p>\n<p id=\"8924\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">Now what should you do? How do you evaluate the performance of your classification model? How do you know how good the model is and whether the predictions are correct or not? The simple answer to this question is <strong class=\"be od\">evaluation metrics<\/strong>.<\/p>\n<p id=\"2f93\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">Evaluation metrics are used to measure the performance of the machine learning model. Evaluating a machine learning model is an essential part of the data science pipeline. There are different types of evaluation metrics available to test the efficiency of the model such as confusion matrix, classification accuracy, loss, and others.<\/p>\n<p id=\"cc2a\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">These evaluation metrics play an important role in ensuring that the machine learning model is working optimally and correctly. 
This series will discuss various evaluation metrics and how and when we should use them.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<h2 id=\"f812\" class=\"om on fr be oo op oq gr or os ot gu ou ov ow ox oy oz pa pb pc pd pe pf pg ph bj\">Confusion Matrix<\/h2>\n<p id=\"d2a4\" class=\"pw-post-body-paragraph ni nj fr be b gp pi nl nm gs pj no np nq pk ns nt nu pl nw nx ny pm oa ob oc fk bj\" data-selectable-paragraph=\"\">In machine learning, a confusion matrix is a technique to visualize the performance of a classification model on the set of our test data. Calculating a confusion matrix gives us a proper idea of how many values are being predicted correctly and how many are being predicted incorrectly by our classification model. This matrix also lets us know what types of errors the classification model makes.<\/p>\n<p id=\"1e53\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">This is a confusion matrix:<\/p>\n<figure class=\"mr ms mt mu mv mw mo mp paragraph-image\">\n<div class=\"mx my ee mz bg na\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg lw nb c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*pPdjpaOQFcALkf_T.png\" alt=\"\" width=\"700\" height=\"496\"><\/figure><div class=\"mo mp pn\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/0*pPdjpaOQFcALkf_T.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/0*pPdjpaOQFcALkf_T.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/0*pPdjpaOQFcALkf_T.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/0*pPdjpaOQFcALkf_T.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/0*pPdjpaOQFcALkf_T.png 828w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/0*pPdjpaOQFcALkf_T.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/0*pPdjpaOQFcALkf_T.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/0*pPdjpaOQFcALkf_T.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/0*pPdjpaOQFcALkf_T.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/0*pPdjpaOQFcALkf_T.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/0*pPdjpaOQFcALkf_T.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/0*pPdjpaOQFcALkf_T.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/0*pPdjpaOQFcALkf_T.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*pPdjpaOQFcALkf_T.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"nc nd ne mo mp nf ng be b bf z dw\" data-selectable-paragraph=\"\">Binary Classification Confusion Matrix<\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p 
id=\"7bcd\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">The confusion matrix contains four types of outcomes. These outcomes are:<\/p>\n<h2 id=\"0acd\" class=\"po on fr be oo pp pq pr or ps pt pu ou nq pv pw px nu py pz qa ny qb qc qd qe bj\" data-selectable-paragraph=\"\">1. True Positive (TP):<\/h2>\n<p id=\"58e8\" class=\"pw-post-body-paragraph ni nj fr be b gp pi nl nm gs pj no np nq pk ns nt nu pl nw nx ny pm oa ob oc fk bj\" data-selectable-paragraph=\"\">A true positive is an outcome where the model correctly predicts the positive class. In our Covid example, a person who is classified as Covid-positive by the model and is actually Covid-positive is a true positive outcome.<\/p>\n<h2 id=\"dab9\" class=\"po on fr be oo pp pq pr or ps pt pu ou nq pv pw px nu py pz qa ny qb qc qd qe bj\" data-selectable-paragraph=\"\">2. True Negative (TN):<\/h2>\n<p id=\"5a5e\" class=\"pw-post-body-paragraph ni nj fr be b gp pi nl nm gs pj no np nq pk ns nt nu pl nw nx ny pm oa ob oc fk bj\" data-selectable-paragraph=\"\">A true negative is an outcome where the model correctly predicts the negative class. In our Covid example, a person who is classified as Covid-negative by the model and is actually Covid-negative is a true negative outcome.<\/p>\n<h2 id=\"a986\" class=\"po on fr be oo pp pq pr or ps pt pu ou nq pv pw px nu py pz qa ny qb qc qd qe bj\" data-selectable-paragraph=\"\">3. False Positive (FP):<\/h2>\n<p id=\"5337\" class=\"pw-post-body-paragraph ni nj fr be b gp pi nl nm gs pj no np nq pk ns nt nu pl nw nx ny pm oa ob oc fk bj\" data-selectable-paragraph=\"\">A false positive is an outcome where the model incorrectly predicts the positive class. 
Here, the model classifies an observation as positive, but it actually belongs to the negative class.<\/p>\n<p id=\"c497\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">In our Covid example, a person who is classified as Covid-positive (positive class) by the model but is actually Covid-negative is a false positive outcome.<\/p>\n<h2 id=\"5af8\" class=\"po on fr be oo pp pq pr or ps pt pu ou nq pv pw px nu py pz qa ny qb qc qd qe bj\" data-selectable-paragraph=\"\">4. False Negative (FN):<\/h2>\n<p id=\"9bca\" class=\"pw-post-body-paragraph ni nj fr be b gp pi nl nm gs pj no np nq pk ns nt nu pl nw nx ny pm oa ob oc fk bj\" data-selectable-paragraph=\"\">A false negative is an outcome where the model incorrectly predicts the negative class. Here, the model classifies an observation as negative, but it actually belongs to the positive class.<\/p>\n<p id=\"31b8\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">In our Covid example, a person who is classified as Covid-negative (negative class) by the model but is actually Covid-positive is a false negative outcome.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<h2 id=\"64c2\" class=\"om on fr be oo op oq gr or os ot gu ou ov ow ox oy oz pa pb pc pd pe pf pg ph bj\">Accuracy<\/h2>\n<p 
id=\"c1f8\" class=\"pw-post-body-paragraph ni nj fr be b gp pi nl nm gs pj no np nq pk ns nt nu pl nw nx ny pm oa ob oc fk bj\" data-selectable-paragraph=\"\">In machine learning, accuracy is one of the metrics for evaluating classification models. It is the fraction of predictions that our model classified correctly. Accuracy is calculated as the number of correct predictions divided by the total number of observations in the dataset. It has the following definition:<\/p>\n<figure class=\"mr ms mt mu mv mw mo mp paragraph-image\">\n<div class=\"mx my ee mz bg na\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg lw nb c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Xqjm0shXdBqt1jWDO1Hszg.png\" alt=\"\" width=\"700\" height=\"189\"><\/figure><div class=\"mo mp qp\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*Xqjm0shXdBqt1jWDO1Hszg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*Xqjm0shXdBqt1jWDO1Hszg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*Xqjm0shXdBqt1jWDO1Hszg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*Xqjm0shXdBqt1jWDO1Hszg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*Xqjm0shXdBqt1jWDO1Hszg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*Xqjm0shXdBqt1jWDO1Hszg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*Xqjm0shXdBqt1jWDO1Hszg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, 
(-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*Xqjm0shXdBqt1jWDO1Hszg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*Xqjm0shXdBqt1jWDO1Hszg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*Xqjm0shXdBqt1jWDO1Hszg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*Xqjm0shXdBqt1jWDO1Hszg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*Xqjm0shXdBqt1jWDO1Hszg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*Xqjm0shXdBqt1jWDO1Hszg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*Xqjm0shXdBqt1jWDO1Hszg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"nc nd ne mo mp nf ng be b bf z dw\" data-selectable-paragraph=\"\">Image by author<\/figcaption>\n<\/figure>\n<p id=\"caf8\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">For binary classification we can calculate accuracy in terms of positive and negative outcomes in the following way:<\/p>\n<figure class=\"mr ms mt mu mv mw mo mp paragraph-image\">\n<div class=\"mx my ee mz bg na\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg lw nb c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*YZwqYSyDVUJTAwSiOGSqnQ.png\" alt=\"\" width=\"700\" height=\"189\"><\/figure><div class=\"mo mp 
qp\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*YZwqYSyDVUJTAwSiOGSqnQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, 
(-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"nc nd ne mo mp nf ng be b bf z dw\" data-selectable-paragraph=\"\">Image by author<\/figcaption>\n<\/figure>\n<p id=\"0024\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">Accuracy is a good choice for evaluating classification models when the dataset is well balanced. On an imbalanced or skewed dataset it can be misleading, because a model that always predicts the majority class still scores high accuracy.<\/p>\n<h2 id=\"288b\" class=\"om on fr be oo op qq gr or os qr gu ou ov qs ox oy oz qt pb pc pd qu pf pg ph bj\">Precision<\/h2>\n<p id=\"6905\" class=\"pw-post-body-paragraph ni nj fr be b gp pi nl nm gs pj no np nq pk ns nt nu pl nw nx ny pm oa ob oc fk bj\" data-selectable-paragraph=\"\">Precision tells us what fraction of positive predictions were actually positive. 
In simple terms, precision is the ratio of true positives (TP) to all positive predictions (TP + FP).<\/p>\n<p id=\"36fc\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">It attempts to answer the following question:<\/p>\n<pre class=\"mr ms mt mu mv qv qw qx qy ax qz bj\"><span id=\"985c\" class=\"po on fr qw b ic ra rb l is rc\" data-selectable-paragraph=\"\">What proportion of positive identifications was actually correct?<\/span><\/pre>\n<p id=\"4146\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">It is calculated as the number of correct positive predictions (TP) divided by the total number of positive predictions (TP + FP). It has the following definition:<\/p>\n<figure class=\"mr ms mt mu mv mw mo mp paragraph-image\">\n<div class=\"mx my ee mz bg na\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg lw nb c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*QK94JX8oQFxfi9bLWp2NFA.png\" alt=\"\" width=\"700\" height=\"189\"><\/figure><div class=\"mo mp qp\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*QK94JX8oQFxfi9bLWp2NFA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*QK94JX8oQFxfi9bLWp2NFA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*QK94JX8oQFxfi9bLWp2NFA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*QK94JX8oQFxfi9bLWp2NFA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*QK94JX8oQFxfi9bLWp2NFA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*QK94JX8oQFxfi9bLWp2NFA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*QK94JX8oQFxfi9bLWp2NFA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 
700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*QK94JX8oQFxfi9bLWp2NFA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*QK94JX8oQFxfi9bLWp2NFA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*QK94JX8oQFxfi9bLWp2NFA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*QK94JX8oQFxfi9bLWp2NFA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*QK94JX8oQFxfi9bLWp2NFA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*QK94JX8oQFxfi9bLWp2NFA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*QK94JX8oQFxfi9bLWp2NFA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"nc nd ne mo mp nf ng be b bf z dw\" data-selectable-paragraph=\"\">Image by author<\/figcaption>\n<\/figure>\n<p id=\"f870\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">Precision is a good choice for the evaluation of classification models when we want to be very sure of our prediction. 
Use precision when false negatives are tolerable and the main concern is keeping false positives low.<\/p>\n<p id=\"15ab\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">For example, precision is the recommended metric for spam email detection. It is acceptable if a spam email is classified as not spam (a false negative) and lands in the inbox, but an important email should not be sent to the spam folder (a false positive).<\/p>\n<h2 id=\"d7db\" class=\"om on fr be oo op qq gr or os qr gu ou ov qs ox oy oz qt pb pc pd qu pf pg ph bj\">Recall<\/h2>\n<p id=\"d9f0\" class=\"pw-post-body-paragraph ni nj fr be b gp pi nl nm gs pj no np nq pk ns nt nu pl nw nx ny pm oa ob oc fk bj\" data-selectable-paragraph=\"\">Recall, also known as <strong class=\"be od\">Sensitivity<\/strong> or <strong class=\"be od\">True Positive Rate<\/strong>, tells us what fraction of all actual positive observations the classifier correctly predicted as positive.<\/p>\n<p id=\"d41c\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">It attempts to answer the following question:<\/p>\n<pre class=\"mr ms mt mu mv qv qw qx qy ax qz bj\"><span id=\"110d\" class=\"po on fr qw b ic ra rb l is rc\" data-selectable-paragraph=\"\">What proportion of actual positives was identified correctly?<\/span><\/pre>\n<p id=\"1ce3\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">It is calculated as the number of correct positive predictions (TP) divided by the total number of actual positives (TP + FN). 
It has the following definition:<\/p>\n<figure class=\"mr ms mt mu mv mw mo mp paragraph-image\">\n<div class=\"mx my ee mz bg na\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg lw nb c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png\" alt=\"\" width=\"700\" height=\"189\"><\/figure><div class=\"mo mp qp\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*LzYIe9aJZ3oZ-DfJdCyR1w.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"87f6\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">Recall is a good choice for evaluating classification models when false negatives are costly and we want to catch as many actual positives as possible. For example, when predicting whether a person has cancer, we need to account for both true positives and false negatives. If a person with cancer (Actual Positive) goes for the tests and the result is predicted as no cancer (Predicted Negative), this is a <strong class=\"be od\">false negative<\/strong> outcome. The cost of a false negative is high because a missed diagnosis can endanger the patient\u2019s life.<\/p>\n<h2 id=\"e626\" class=\"om on fr be oo op qq gr or os qr gu ou ov qs ox oy oz qt pb pc pd qu pf pg ph bj\">Conclusion<\/h2>\n<p id=\"4d5d\" class=\"pw-post-body-paragraph ni nj fr be b gp pi nl nm gs pj no np nq pk ns nt nu pl nw nx ny pm oa ob oc fk bj\" data-selectable-paragraph=\"\">Now we know what evaluation metrics are and which metric suits which kind of scenario. 
In the next part we will discuss what kind of evaluation metrics are used for multi-class classification problems.<\/p>\n<p id=\"8197\" class=\"pw-post-body-paragraph ni nj fr be b gp nk nl nm gs nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc fk bj\" data-selectable-paragraph=\"\">Thanks for reading!<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Photo by Jon Tyson on Unsplash Suppose you are working on a machine learning classification problem in which you have to predict whether a person is Covid positive or negative. You have a good dataset and you have applied classification algorithms and successfully built your classification model. Now what should you do? How do you [&hellip;]<\/p>\n","protected":false},"author":88,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[185],"class_list":["post-8419","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Evaluation Metrics for Classification Models (Part 1)<\/title>\n<meta name=\"description\" content=\"In part one of this series, learn about various evaluation metrics for a classification model and how and when we should use them.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Evaluation 
Metrics for Classification Models in Machine Learning (Part 1)\" \/>\n<meta property=\"og:description\" content=\"In part one of this series, learn about various evaluation metrics for a classification model and how and when we should use them.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-12-12T23:02:07+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:03:53+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*c6FT2rVO_bwSy1Zp\" \/>\n<meta name=\"author\" content=\"Pralabh Saxena\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Pralabh Saxena\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Evaluation Metrics for Classification Models (Part 1)","description":"In part one of this series, learn about various evaluation metrics for a classification model and how and when we should use them.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1","og_locale":"en_US","og_type":"article","og_title":"Evaluation Metrics for Classification Models in Machine Learning (Part 1)","og_description":"In part one of this series, learn about various evaluation metrics for a classification model and how and when we should use them.","og_url":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-12-12T23:02:07+00:00","article_modified_time":"2025-04-24T17:03:53+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*c6FT2rVO_bwSy1Zp","type":"","width":"","height":""}],"author":"Pralabh Saxena","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Pralabh Saxena","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1\/"},"author":{"name":"Pralabh Saxena","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/661df331deec9788343ef011c9467cc8"},"headline":"Evaluation Metrics for Classification Models in Machine Learning (Part 1)","datePublished":"2023-12-12T23:02:07+00:00","dateModified":"2025-04-24T17:03:53+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1\/"},"wordCount":939,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*c6FT2rVO_bwSy1Zp","articleSection":["Machine Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1\/","url":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1","name":"Evaluation Metrics for Classification Models (Part 
1)","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*c6FT2rVO_bwSy1Zp","datePublished":"2023-12-12T23:02:07+00:00","dateModified":"2025-04-24T17:03:53+00:00","description":"In part one of this series, learn about various evaluation metrics for a classification model and how and when we should use them.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*c6FT2rVO_bwSy1Zp","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*c6FT2rVO_bwSy1Zp"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-1#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Evaluation Metrics for Classification Models in Machine Learning (Part 1)"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/661df331deec9788343ef011c9467cc8","name":"Pralabh Saxena","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/af2f89cb395a3afe9b42605f70d9c6a7","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/09\/1689749938719-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/09\/1689749938719-96x96.jpg","caption":"Pralabh 
Saxena"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/pralabh-saxena2014gmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8419","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/88"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=8419"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8419\/revisions"}],"predecessor-version":[{"id":15422,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8419\/revisions\/15422"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=8419"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=8419"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=8419"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=8419"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}