{"id":8422,"date":"2023-12-12T15:03:52","date_gmt":"2023-12-12T23:03:52","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=8422"},"modified":"2025-04-24T17:03:52","modified_gmt":"2025-04-24T17:03:52","slug":"evaluation-metrics-for-classification-models-in-machine-learning-part-2","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2\/","title":{"rendered":"Evaluation Metrics for Classification Models in Machine Learning (Part 2)"},"content":{"rendered":"\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<figure class=\"mb mc md me mf mg ly lz paragraph-image\">\n<div class=\"mh mi ee mj bg mk\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg lf ml c alignnone\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*yfzWRZc59Aj1cG84\" alt=\"black boots on gray cement, with arrows pointing diagonally left and right\" width=\"700\" height=\"934\"><\/figure><div class=\"ly lz ma\"><picture><\/picture><\/div>\n<\/div><figcaption class=\"mm mn mo ly lz mp mq be b bf z dw\" data-selectable-paragraph=\"\">Photo by <a class=\"af mr\" href=\"https:\/\/unsplash.com\/@jontyson?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Jon Tyson<\/a> on <a class=\"af mr\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/figcaption><\/figure>\n<p id=\"0b9c\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">In machine learning, data scientists use evaluation metrics to assess how accurately a model classifies data points into their respective classes.<\/p>\n<p id=\"a1f8\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">As a data scientist, it is essential to select the right evaluation metrics based on the problem&#8217;s use case and the dataset&#8217;s characteristics.<\/p>\n<p id=\"99b0\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">These metrics may differ depending on the requirements of the problem. For example, recall may be more important than precision in a medical diagnosis scenario, as it is more important to avoid false negatives (missed diagnoses) than false positives (unnecessary treatments).<\/p>\n<p id=\"e1a0\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">In the <a class=\"af mr\" href=\"https:\/\/heartbeat.comet.ml\/evaluation-metrics-for-classification-models-in-machine-learning-part-1-24e1ed84670\" target=\"_blank\" rel=\"noopener ugc nofollow\">previous part<\/a> of this series, we learned about some of the evaluation metrics used for classification models and the scenarios in which to use them.<\/p>\n<p id=\"769f\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">This article will review other useful evaluation metrics for classification models. 
Let&#8217;s get started!<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<h2 id=\"ee07\" class=\"nw nx fr be ny nz oa gr ob oc od gu oe of og oh oi oj ok ol om on oo op oq or bj\">F1 Score<\/h2>\n<p id=\"e1d8\" class=\"pw-post-body-paragraph ms mt fr mu b gp os mw mx gs ot mz na nb ou nd ne nf ov nh ni nj ow nl nm nn fk bj\" data-selectable-paragraph=\"\">The F1 score is one of the most popular metrics for classification models. It is the harmonic mean of the model&#8217;s precision and recall and is a number that ranges between 0 and 1.<\/p>\n<p id=\"828b\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">F1 score can be calculated in the following way:<\/p>\n<figure class=\"mb mc md me mf mg ly lz paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg lf ml c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:376\/1*cdmfghgZw3d7-rm0YYZPAQ.png\" alt=\"\" width=\"376\" height=\"127\"><\/figure><div class=\"ly lz ox\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*cdmfghgZw3d7-rm0YYZPAQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*cdmfghgZw3d7-rm0YYZPAQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*cdmfghgZw3d7-rm0YYZPAQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*cdmfghgZw3d7-rm0YYZPAQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*cdmfghgZw3d7-rm0YYZPAQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*cdmfghgZw3d7-rm0YYZPAQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:752\/format:webp\/1*cdmfghgZw3d7-rm0YYZPAQ.png 752w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, 
(min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 376px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*cdmfghgZw3d7-rm0YYZPAQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*cdmfghgZw3d7-rm0YYZPAQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*cdmfghgZw3d7-rm0YYZPAQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*cdmfghgZw3d7-rm0YYZPAQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*cdmfghgZw3d7-rm0YYZPAQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*cdmfghgZw3d7-rm0YYZPAQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:752\/1*cdmfghgZw3d7-rm0YYZPAQ.png 752w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 376px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mm mn mo ly lz mp mq be b bf z dw\" data-selectable-paragraph=\"\">Image by Author<\/figcaption>\n<\/figure>\n<p id=\"bf63\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">The F1 score is helpful when precision and recall are essential, and the data is relatively balanced between the two classes. 
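<\/p>
<p class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">As a quick sanity check of the harmonic-mean formula, F1 = 2 * (precision * recall) \/ (precision + recall) can be computed by hand and compared with scikit-learn&#8217;s f1_score. This is a minimal sketch; the label arrays are illustrative, not from a real dataset:<\/p>

```python
# Manual F1 from precision and recall, checked against scikit-learn
# (illustrative labels; any pair of binary arrays would do)
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

p = precision_score(y_true, y_pred)   # no false positives -> 1.0
r = recall_score(y_true, y_pred)      # one positive missed -> 2/3
manual_f1 = 2 * p * r / (p + r)       # harmonic mean of precision and recall

print(manual_f1)                      # ~0.8, matches f1_score(y_true, y_pred)
```

<p class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">The harmonic mean punishes imbalance between the two components: a model with precision 1.0 and recall 0.0 gets an F1 of 0, not 0.5. 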
For example, it can be used to evaluate the performance of a fraud detection model, where both false positives and false negatives have serious consequences.<\/p>\n<p id=\"5c75\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\"><strong class=\"mu fs\">Example:<\/strong><\/p>\n<pre class=\"mb mc md me mf oy oz pa bo pb ba bj\"><span id=\"d7a0\" class=\"pc nx fr oz b bf pd pe l pf pg\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\">#F1 score <\/span>\n<span class=\"hljs-keyword\">from<\/span> sklearn.metrics <span class=\"hljs-keyword\">import<\/span> f1_score\n\ny_true = [<span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>]\ny_pred = [<span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>]\n\nf1 = f1_score(y_true, y_pred)\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">\"F1 Score:\"<\/span>, f1)<\/span><\/pre>\n<h2 id=\"6c03\" class=\"nw nx fr be ny nz ph gr ob oc pi gu oe of pj oh oi oj pk ol om on pl op oq or bj\">Log Loss<\/h2>\n<p id=\"4c5c\" class=\"pw-post-body-paragraph ms mt fr mu b gp os mw mx gs ot mz na nb ou nd ne nf ov nh ni nj ow nl nm nn fk bj\" data-selectable-paragraph=\"\">Log loss (also called logarithmic loss or cross-entropy loss) measures the performance of a classification model where the prediction output is a probability value between 0 and 1. It compares the predicted probability distribution with the actual probability distribution of the test data. 
It&#8217;s defined as follows:<\/p>\n<pre class=\"mb mc md me mf oy oz pa bo pb ba bj\"><span id=\"56de\" class=\"pc nx fr oz b bf pd pe l pf pg\" data-selectable-paragraph=\"\">log_loss = -1\/n * \u2211(y * <span class=\"hljs-built_in\">log<\/span>(y_hat) + (1-y) * <span class=\"hljs-built_in\">log<\/span>(1-y_hat))<\/span><\/pre>\n<p id=\"6a56\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">Where n is the number of samples, y is the true label, and y_hat is the predicted probability.<\/p>\n<p id=\"b301\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">Log loss is helpful when penalizing the model for being confidently wrong. It is commonly used in multi-class classification problems, where the output is a probability distribution over multiple classes.<\/p>\n<p id=\"27a8\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\"><strong class=\"mu fs\">Example:<\/strong><\/p>\n<pre class=\"mb mc md me mf oy oz pa bo pb ba bj\"><span id=\"b3af\" class=\"pc nx fr oz b bf pd pe l pf pg\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\">#Log loss evaluation metric<\/span>\n<span class=\"hljs-keyword\">from<\/span> sklearn.metrics <span class=\"hljs-keyword\">import<\/span> log_loss\n\ny_true = [<span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>]\ny_pred = [[<span class=\"hljs-number\">0.89<\/span>, <span class=\"hljs-number\">0.11<\/span>], [<span class=\"hljs-number\">0.3<\/span>, <span class=\"hljs-number\">0.7<\/span>], [<span class=\"hljs-number\">0.81<\/span>, <span class=\"hljs-number\">0.19<\/span>], [<span 
class=\"hljs-number\">0.6<\/span>, <span class=\"hljs-number\">0.4<\/span>], [<span class=\"hljs-number\">0.1<\/span>, <span class=\"hljs-number\">0.9<\/span>]]\n\nlogloss = log_loss(y_true, y_pred)\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">\"Log Loss:\"<\/span>, logloss)<\/span><\/pre>\n<h2 id=\"a270\" class=\"nw nx fr be ny nz ph gr ob oc pi gu oe of pj oh oi oj pk ol om on pl op oq or bj\">Cohen&#8217;s Kappa<\/h2>\n<p id=\"5028\" class=\"pw-post-body-paragraph ms mt fr mu b gp os mw mx gs ot mz na nb ou nd ne nf ov nh ni nj ow nl nm nn fk bj\" data-selectable-paragraph=\"\">Cohen&#8217;s Kappa is a statistical measure of inter-rater agreement between two raters for categorical items.<\/p>\n<p id=\"329e\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">In the context of classification models, it measures the agreement between predicted and true labels and considers the possibility of the agreement by chance. 
It is defined as follows:<\/p>\n<pre class=\"mb mc md me mf oy oz pa bo pb ba bj\"><span id=\"eca5\" class=\"pc nx fr oz b bf pd pe l pf pg\" data-selectable-paragraph=\"\"><span class=\"hljs-attr\">kappa<\/span> = (observed agreement - expected agreement) \/ (<span class=\"hljs-number\">1<\/span> - expected agreement)<\/span><\/pre>\n<p id=\"dd9d\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">Where observed agreement is the proportion of times the raters agreed, and expected agreement is the proportion of times they would be expected to agree by chance.<\/p>\n<p id=\"55ac\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">Cohen&#8217;s Kappa is useful when the classes are imbalanced, and the overall accuracy is not a good indicator of model performance. It is commonly used to evaluate NLP tasks, such as text classification.<\/p>\n<p id=\"39b1\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\"><strong class=\"mu fs\">Example:<\/strong><\/p>\n<pre class=\"mb mc md me mf oy oz pa bo pb ba bj\"><span id=\"a9c8\" class=\"pc nx fr oz b bf pd pe l pf pg\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\">#Cohen's Kappa evaluation metric<\/span>\n<span class=\"hljs-keyword\">from<\/span> sklearn.metrics <span class=\"hljs-keyword\">import<\/span> cohen_kappa_score\n\ny_true = [<span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">1<\/span>]\ny_pred = [<span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>, <span 
class=\"hljs-number\">1<\/span>]\n\nkappa = cohen_kappa_score(y_true, y_pred)\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">\"Cohen's Kappa:\"<\/span>, kappa)<\/span><\/pre>\n<h2 id=\"1444\" class=\"nw nx fr be ny nz ph gr ob oc pi gu oe of pj oh oi oj pk ol om on pl op oq or bj\">Matthews Correlation Coefficient (MCC)<\/h2>\n<p id=\"e69a\" class=\"pw-post-body-paragraph ms mt fr mu b gp os mw mx gs ot mz na nb ou nd ne nf ov nh ni nj ow nl nm nn fk bj\" data-selectable-paragraph=\"\">The Matthews correlation coefficient (MCC) measures the correlation between the observed and predicted binary classifications, taking into account true and false positives and negatives. It is calculated as follows:<\/p>\n<pre class=\"mb mc md me mf oy oz pa bo pb ba bj\"><span id=\"6a88\" class=\"pc nx fr oz b bf pd pe l pf pg\" data-selectable-paragraph=\"\"><span class=\"hljs-attr\">MCC<\/span> = (TP * TN - FP * FN) \/ sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))<\/span><\/pre>\n<p id=\"1a6e\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">The MCC is applicable when the classes are imbalanced and overall accuracy is not a good indicator of model performance. 
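<\/p>
<p class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">To make the formula concrete, here is a minimal sketch (with illustrative labels) that counts TP, TN, FP, and FN directly, plugs them into the formula, and checks the result against scikit-learn&#8217;s matthews_corrcoef:<\/p>

```python
# MCC computed from raw confusion counts, checked against scikit-learn
# (illustrative labels)
from math import sqrt
from sklearn.metrics import matthews_corrcoef

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 2
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 2
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 0
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1

manual_mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(manual_mcc)  # ~0.667, matches matthews_corrcoef(y_true, y_pred)
```

<p class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">Like any correlation coefficient, MCC ranges from -1 to +1, with 0 meaning the predictions are no better than chance. 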
For example, it can be used to evaluate the performance of a cancer diagnosis model where the number of positive samples is much smaller than the number of negative samples.<\/p>\n<p id=\"b186\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\"><strong class=\"mu fs\">Example:<\/strong><\/p>\n<pre class=\"mb mc md me mf oy oz pa bo pb ba bj\"><span id=\"ff24\" class=\"pc nx fr oz b bf pd pe l pf pg\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\">#MCC evaluation metric<\/span>\n<span class=\"hljs-keyword\">from<\/span> sklearn.metrics <span class=\"hljs-keyword\">import<\/span> matthews_corrcoef\n\ny_true = [<span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>]\ny_pred = [<span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>]\n\nmcc = matthews_corrcoef(y_true, y_pred)\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">\"MCC:\"<\/span>, mcc)<\/span><\/pre>\n<h2 id=\"3956\" class=\"nw nx fr be ny nz ph gr ob oc pi gu oe of pj oh oi oj pk ol om on pl op oq or bj\">Receiver Operating Characteristic (ROC) Curve<\/h2>\n<p id=\"c358\" class=\"pw-post-body-paragraph ms mt fr mu b gp os mw mx gs ot mz na nb ou nd ne nf ov nh ni nj ow nl nm nn fk bj\" data-selectable-paragraph=\"\">The ROC curve plots the true positive rate (TPR) versus the false positive rate (FPR) at different classification thresholds. 
It provides a way to balance the trade-off between sensitivity (TPR) and specificity (1 - FPR) as the threshold varies.<\/p>\n<p id=\"0b7f\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">The AUC (area under the curve) summarizes the model&#8217;s overall performance across all possible classification thresholds.<\/p>\n<p id=\"faf6\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\"><strong class=\"mu fs\">Example:<\/strong><\/p>\n<pre class=\"mb mc md me mf oy oz pa bo pb ba bj\"><span id=\"cb17\" class=\"pc nx fr oz b bf pd pe l pf pg\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\">#ROC Curve<\/span>\n<span class=\"hljs-keyword\">from<\/span> sklearn.metrics <span class=\"hljs-keyword\">import<\/span> roc_curve, roc_auc_score\n<span class=\"hljs-keyword\">import<\/span> matplotlib.pyplot <span class=\"hljs-keyword\">as<\/span> plt\n\ny_true = [<span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>]\ny_score = [<span class=\"hljs-number\">0.2<\/span>, <span class=\"hljs-number\">0.7<\/span>, <span class=\"hljs-number\">0.9<\/span>, <span class=\"hljs-number\">0.4<\/span>, <span class=\"hljs-number\">0.6<\/span>]\n\nfpr, tpr, thresholds = roc_curve(y_true, y_score)\n\nplt.plot(fpr, tpr)\nplt.xlabel(<span class=\"hljs-string\">'False Positive Rate'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'True Positive Rate'<\/span>)\nplt.title(<span class=\"hljs-string\">'ROC Curve'<\/span>)\nplt.show()\n\nauc = roc_auc_score(y_true, y_score)\n\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">\"AUC:\"<\/span>, auc)<\/span><\/pre>\n<p id=\"2d78\" class=\"pw-post-body-paragraph ms mt fr 
mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">The ROC curve is useful when the classes are imbalanced and the cost of false positives and false negatives is not the same.<\/p>\n<p id=\"fa17\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">For example, it can be used to evaluate the performance of a credit risk model, where the cost of false positives (granting credit to a risky borrower) is higher than the cost of false negatives (rejecting a good borrower).<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<h2 id=\"7589\" class=\"pm nx fr be ny pn po pp ob pq pr ps oe nb pt pu pv nf pw px py nj pz qa qb qc bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Conclusion<\/strong><\/h2>\n<p id=\"1639\" class=\"pw-post-body-paragraph ms mt fr mu b gp os mw mx gs ot mz na nb ou nd ne nf ov nh ni nj ow nl nm nn fk bj\" data-selectable-paragraph=\"\">These are some of the additional evaluation metrics for classification models in machine learning. As a data scientist, it is essential to choose the right evaluation metrics based on the problem&#8217;s use case and the given dataset&#8217;s characteristics.<\/p>\n<p id=\"6bb1\" class=\"pw-post-body-paragraph ms mt fr mu b gp mv mw mx gs my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn fk bj\" data-selectable-paragraph=\"\">Thanks for reading!<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Photo by Jon Tyson on Unsplash In machine learning, data scientists use evaluation metrics to assess the model&#8217;s performance in terms of the ability of the various machine learning models to classify the data points into their 
As a data scientist, selecting the right evaluation metrics is essential based on the problem&#8217;s [&hellip;]<\/p>\n","protected":false},"author":88,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[185],"class_list":["post-8422","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Evaluation Metrics for Classification Models (Part 2)<\/title>\n<meta name=\"description\" content=\"In part 2 of this series, learn about 5 additional evaluation metrics for classification models and example code.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Evaluation Metrics for Classification Models in Machine Learning (Part 2)\" \/>\n<meta property=\"og:description\" content=\"In part 2 of this series, learn about 5 additional evaluation metrics for classification models and example code.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-12-12T23:03:52+00:00\" \/>\n<meta 
property=\"article:modified_time\" content=\"2025-04-24T17:03:52+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*yfzWRZc59Aj1cG84\" \/>\n<meta name=\"author\" content=\"Pralabh Saxena\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Pralabh Saxena\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Evaluation Metrics for Classification Models (Part 2)","description":"In part 2 of this series, learn about 5 additional evaluation metrics for classification models and example code.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2","og_locale":"en_US","og_type":"article","og_title":"Evaluation Metrics for Classification Models in Machine Learning (Part 2)","og_description":"In part 2 of this series, learn about 5 additional evaluation metrics for classification models and example code.","og_url":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-12-12T23:03:52+00:00","article_modified_time":"2025-04-24T17:03:52+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*yfzWRZc59Aj1cG84","type":"","width":"","height":""}],"author":"Pralabh 
Saxena","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Pralabh Saxena","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2\/"},"author":{"name":"Pralabh Saxena","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/661df331deec9788343ef011c9467cc8"},"headline":"Evaluation Metrics for Classification Models in Machine Learning (Part 2)","datePublished":"2023-12-12T23:03:52+00:00","dateModified":"2025-04-24T17:03:52+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2\/"},"wordCount":706,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*yfzWRZc59Aj1cG84","articleSection":["Machine Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2\/","url":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2","name":"Evaluation Metrics for Classification Models (Part 
2)","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*yfzWRZc59Aj1cG84","datePublished":"2023-12-12T23:03:52+00:00","dateModified":"2025-04-24T17:03:52+00:00","description":"In part 2 of this series, learn about 5 additional evaluation metrics for classification models and example code.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*yfzWRZc59Aj1cG84","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*yfzWRZc59Aj1cG84"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/evaluation-metrics-for-classification-models-in-machine-learning-part-2#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Evaluation Metrics for Classification Models in Machine Learning (Part 2)"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/661df331deec9788343ef011c9467cc8","name":"Pralabh Saxena","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/af2f89cb395a3afe9b42605f70d9c6a7","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/09\/1689749938719-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/09\/1689749938719-96x96.jpg","caption":"Pralabh 
Saxena"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/pralabh-saxena2014gmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8422","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/88"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=8422"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8422\/revisions"}],"predecessor-version":[{"id":15421,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8422\/revisions\/15421"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=8422"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=8422"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=8422"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=8422"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}