{"id":4108,"date":"2022-10-13T14:26:09","date_gmt":"2022-10-13T22:26:09","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=4108"},"modified":"2025-04-24T17:17:07","modified_gmt":"2025-04-24T17:17:07","slug":"fraud-detection-imbalanced-classification","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/","title":{"rendered":"Fraud Detection, Imbalanced Classification, and Managing Your Machine Learning Experiments Using Comet"},"content":{"rendered":"\n<p>An end-to-end guide for building and managing an ML-powered fraud detection system<\/p>\n\n\n\n<h2 class=\"wp-block-heading kz la iy bm lb lc ld le lf lg lh li lj ke lk kf ll kh lm ki ln kk lo kl lp lq ga\" id=\"0015\"><strong class=\"ba\">A brief history of fraud<\/strong><\/h2>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" id=\"cdb8\">The earliest recorded attempt of fraud can be found as far back as the year 300 BC in Greece.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"9238\">A Greek sea merchant named Hegestratos sought to insure his ship and cargo, so took out an insurance policy against them. At the time, the policy was known as a `bottomry` and worked on the basis that a merchant borrowed money to the value of his ship and cargo.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"59c2\">As long as the ship arrived safe and sound at its destination with its cargo intact, then the loan was paid back with interest. 
If on safe delivery the loan was not repaid, the boat and its cargo were repossessed.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"0d9d\">Hegestratos planned to sink his empty boat, keep the loan, and sell the corn. The plan failed, and he drowned trying to escape his crew and passengers when they caught him in the act.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"241\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_3fRCDEh-qB2g4rM9-300x241.jpg\" alt=\"\" class=\"wp-image-4110\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_3fRCDEh-qB2g4rM9-300x241.jpg 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_3fRCDEh-qB2g4rM9-1024x823.jpg 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_3fRCDEh-qB2g4rM9-768x617.jpg 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_3fRCDEh-qB2g4rM9.jpg 1100w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"has-text-align-center\"><a href=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/3\/3b\/Parigi_griffe.jpg\/1200px-Parigi_griffe.jpg\">Source<\/a><\/p>\n\n\n\n<blockquote class=\"wp-block-quote mw is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"mx my iy bm mz na nb nc nd ne nf ml cn\" id=\"f3be\"><strong>Some 2,000+ years later financial institutions are still fighting the same battle.<\/strong><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading kz la iy bm lb lc ld le lf lg lh li lj ke ng kf ll kh nh ki ln kk ni kl lp lq ga\" id=\"eb1b\"><strong class=\"ba\">Machine learning and fraud<\/strong><\/h2>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" id=\"25db\">Traditionally, financial institutions automatically 
flagged transactions as \u201chigh risk\u201d based on a set of clearly defined rules, and then either denied or manually reviewed flagged transactions.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"3f42\">While these rules-based systems are still an important part of the anti-fraud toolkit, they were never designed for modern internet businesses.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"6527\">This new era of finance and banking has seen tremendous innovation from companies such as PayPal, Stripe, Square, and Venmo, which have made it easy for anyone to send and receive money right from their phones.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"f532\">This has led to many financial institutions being flooded by a tsunami of data.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"633\" height=\"507\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_WnFBGNIGniOt_0gy.png\" alt=\"\" class=\"wp-image-4111\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_WnFBGNIGniOt_0gy.png 633w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_WnFBGNIGniOt_0gy-300x240.png 300w\" sizes=\"auto, (max-width: 633px) 100vw, 633px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"has-text-align-center\"><a href=\"https:\/\/cylynx.imgix.net\/uploads\/fraudml-rule-based-vs-ml.png\">Source<\/a><\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"9241\">Machine learning algorithms capable of processing these ever-larger datasets have been indispensable in helping financial institutions identify correlations 
between user behavior and the likelihood of fraudulent actions.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"0921\">Data scientists have been successful in authenticating transactions using machine learning and predictive analytics.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"29dc\">Automated fraud screening systems powered by machine learning can swiftly and accurately detect a fraudulent transaction, mitigating the risk of that transaction going through and saving financial institutions and their customers millions of dollars.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong class=\"ba\">Problem statement<\/strong><\/h2>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" id=\"1b27\">Imagine yourself as a data scientist at a company that builds products which enable millions of small e-commerce merchants and small, local business owners to conduct business online.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1100\" height=\"1721\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wo8s6rIRV4AsM8yS.jpg\" alt=\"\" class=\"wp-image-4112\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wo8s6rIRV4AsM8yS.jpg 1100w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wo8s6rIRV4AsM8yS-192x300.jpg 192w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wo8s6rIRV4AsM8yS-655x1024.jpg 655w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wo8s6rIRV4AsM8yS-768x1202.jpg 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wo8s6rIRV4AsM8yS-982x1536.jpg 982w\" sizes=\"auto, (max-width: 1100px) 100vw, 1100px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p 
class=\"has-text-align-center\"><a href=\"http:\/\/thumbnails-visually.netdna-ssl.com\/how-online-credit-card-processing-really-works--in-9-steps_503bd543555be_w1500.jpg\">Source<\/a><\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"df16\">Unlike their brick-and-mortar counterparts, online merchants are held liable for fraudulent purchases.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"b1c7\">When a cardholder\u2019s bank declares a transaction as fraudulent, the cardholder is fully reimbursed while the merchant is left footing the cost of the fraud.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"a4ed\">This cost is always more than just the dollar amount of the transaction, because not only do they lose the value of the item sold \u2014 whatever it cost them to make it or purchase it themselves \u2014 they\u2019re also responsible for any fees resulting from the dispute.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"ee2f\">These costs can add up and have a significant negative impact on these merchants\u2019 livelihoods.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"0516\">Your company wants to help merchants focus on their product and customer experiences \u2014 and not on fraud \u2014 so they\u2019re developing a suite of modern tools for fraud detection and prevention.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"035d\">Central to this suite is a machine learning model that evaluates transactions for fraud risk and takes action 
appropriately.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"730f\">Using the dataset provided, your task is to build a machine learning model which provides the greatest protection against fraudulent transactions by correctly classifying and blocking them.<\/p>\n\n\n\n<h2 class=\"wp-block-heading kz la iy bm lb lc ld le lf lg lh li lj ke lk kf ll kh lm ki ln kk lo kl lp lq ga\" id=\"a7fb\"><strong class=\"ba\">Data<\/strong><\/h2>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" id=\"7697\">Here are some high-level details about the dataset we\u2019re working with:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We have a highly imbalanced dataset, with more than 99% of transactions being legitimate and less than 1% (n = 8,213) being fraudulent.<\/li>\n\n\n\n<li>All fraudulent transactions come from\u00a0<code class=\"fp on oo op oq b\">CASH_OUT<\/code>\u00a0(n = 4,116) and\u00a0<code class=\"fp on oo op oq b\">TRANSFER<\/code>\u00a0(n = 4,097) type transactions.\u00a0<code class=\"fp on oo op oq b\">TRANSFER<\/code>\u00a0is where money is sent to a customer \/ fraudster and\u00a0<code class=\"fp on oo op oq b\">CASH_OUT<\/code>\u00a0is where money is sent to a merchant who pays the customer \/ fraudster in cash. Speaking to stakeholders, we identify the modus operandi for fraudulent transactions: fraud is committed by first transferring funds out to another account, which subsequently cashes them out.<\/li>\n\n\n\n<li>There are 6.3 million transactions, with 2.7 million unique values for the feature\u00a0<code class=\"fp on oo op oq b\">nameDest<\/code>, which indicates the recipient ID of the transaction. 
Speaking to the stakeholders, we learn that the naming scheme for both\u00a0<code class=\"fp on oo op oq b\">nameDest<\/code>\u00a0and\u00a0<code class=\"fp on oo op oq b\">nameOrig<\/code>\u00a0columns is as follows: IDs beginning with\u00a0<code class=\"fp on oo op oq b\">C<\/code>\u00a0indicate a customer account, and IDs beginning with\u00a0<code class=\"fp on oo op oq b\">M<\/code>\u00a0indicate a merchant account.<\/li>\n<\/ul>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"2df5\">And here are the features present in our dataset:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"744\" height=\"441\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/1_s19xbE0IeJt3yO_mvtnLNQ.png\" alt=\"\" class=\"wp-image-4113\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/1_s19xbE0IeJt3yO_mvtnLNQ.png 744w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/1_s19xbE0IeJt3yO_mvtnLNQ-300x178.png 300w\" sizes=\"auto, (max-width: 744px) 100vw, 744px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" id=\"5b11\">Note that this is actually simulated data from the PaySim dataset on Kaggle. 
You can visit the source by following this link:<\/p>\n\n\n\n<div class=\"os ot gt gv ou ov\">\n<div class=\"ow o fr\">\n<div class=\"ox o da dx en oy\">\n<h2 id=\"5462\" class=\"kz la iy bm lb lc ld le lf lg lh li lj ke lk kf ll kh lm ki ln kk lo kl lp lq ga\"><strong class=\"ba\">The challenge in building a model<\/strong><\/h2>\n<p id=\"f496\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">In any binary classification problem \u2014 such as the one we\u2019ve got here \u2014 there are two types of misclassifications we can make: false positives and false negatives.<\/p>\n<p id=\"23bf\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">And the main challenge here is:&nbsp;<strong class=\"bm pj\">how do we tell if our model is any good?<\/strong><\/p>\n<p id=\"33a1\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Let\u2019s explore this a bit deeper.<\/p>\n<h3 id=\"5e68\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\"><strong class=\"ba\">False negatives<\/strong><\/h3>\n<p id=\"7a51\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">These are instances of fraud which are not identified or prevented before a dispute occurs, when your model says a transaction is not fraudulent but it really is.<\/p>\n<h3 id=\"3420\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\"><strong class=\"ba\">False positives<\/strong><\/h3>\n<p id=\"b209\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">These are 
legitimate transactions which are blocked by a fraud detection system, when your model says a transaction is fraudulent but it really isn\u2019t.<\/p>\n<h3 id=\"c1e9\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\"><strong class=\"ba\">The tradeoff<\/strong><\/h3>\n<p id=\"2b9f\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">A tradeoff exists between false negatives and false positives.<\/p>\n<p id=\"2506\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Optimize for fewer false negatives, and you must be willing to tolerate a greater occurrence of false positives (and vice versa).<\/p>\n<p id=\"1a22\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">A false negative incurs the cost of goods sold and the fee for disputes. A false positive incurs the margin on the goods sold.<\/p>\n<p id=\"ab9f\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Businesses need to decide how to trade off the two.<\/p>\n<h3 id=\"fa0d\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\"><strong class=\"bm pj\">So, which do you choose to optimize for?<\/strong><\/h3>\n<p id=\"22d4\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">When a business has small margins, the false negative is a more costly mistake to make than a false positive. 
In this case, we would want to be more robust and try to stop as many potentially fraudulent cases as possible.&nbsp;<strong class=\"bm pj\">Even if it means more false positives.<\/strong><\/p>\n<p id=\"1f6c\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">The opposite is true when margins are larger: false positives become more costly than false negatives.<\/p>\n<p id=\"6d38\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">The data scientist therefore faces two problems: the modeling problem of building a good machine learning model by engineering the right features and finding the best algorithm for their business, and the business problem of picking a policy for acting on that model\u2019s outputs.<\/p>\n<h2 id=\"959c\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\" data-selectable-paragraph=\"\"><strong class=\"ba\">The business policy<\/strong><\/h2>\n<p id=\"744f\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">Suppose that our business prescribes a policy of blocking a payment if the machine learning model assigns the transaction a probability of being fraudulent of at least 0.8 \u2014 in math terms P(fraud) &gt;= 0.8.<\/p>\n<p id=\"dc0f\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">This is because, from the business\u2019s point of view, we want to be fairly confident before we block a transaction, because it can be a nuisance to merchants if they have too many non-fraudulent transactions flagged as fraudulent. 
This situation would cause a poor user experience and result in unhappy customers.<\/p>\n<h2 id=\"f1f5\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\" data-selectable-paragraph=\"\"><strong class=\"ba\">Evaluating the machine learning model<\/strong><\/h2>\n<p id=\"7be6\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">There are A LOT of metrics we can choose to evaluate our machine learning model.<\/p>\n<p id=\"f7d8\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">A simple confusion matrix yields 14 metrics.<\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-4114 aligncenter\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wTaKPkp1PuJFCLfp.jpg\" alt=\"\" width=\"802\" height=\"310\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wTaKPkp1PuJFCLfp.jpg 1100w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wTaKPkp1PuJFCLfp-300x116.jpg 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wTaKPkp1PuJFCLfp-1024x397.jpg 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_wTaKPkp1PuJFCLfp-768x297.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/figure><p data-selectable-paragraph=\"\"><\/p>\n<p style=\"text-align: center;\" data-selectable-paragraph=\"\"><a href=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/5ccb715016b640627a1c2782\/1586959630891-S859UBX3R47X8OK2J2S0\/generalized-confusion-matrix-wikipedia.jpg\">Source<\/a><\/p>\n<p id=\"ecfa\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Let\u2019s examine three of these in closer detail: precision, recall, and false positive 
rate.<\/p>\n<h2 id=\"7882\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\" data-selectable-paragraph=\"\"><strong class=\"ba\">Precision<\/strong><\/h2>\n<p id=\"b3ea\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">The precision of our model is the proportion of transactions our model flagged as fraudulent which are truly fraudulent.<\/p>\n<p id=\"2b17\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Example: Suppose we had 100 transactions, and our model flags 60 of these as fraudulent.<\/p>\n<p id=\"5c14\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">And of those 60 transactions that were flagged as fraudulent, 40 are actually fraudulent.<\/p>\n<p id=\"cb1b\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Then the precision would be 40 \/ 60 \u2248 0.67.<\/p>\n<p id=\"504d\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm pj\">A higher precision implies fewer false positives.<\/strong><\/p>\n<h2 id=\"6e5f\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\" data-selectable-paragraph=\"\"><strong class=\"ba\">Recall<\/strong><\/h2>\n<p id=\"e575\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">The recall of our model (aka true positive rate) is the proportion of all truly fraudulent transactions which were flagged as fraudulent by our model.<\/p>\n<p id=\"a8bb\" 
class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Example: Suppose we had 100 transactions, and 50 of these are truly fraudulent.<\/p>\n<p id=\"d78f\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">And of these 50 truly fraudulent transactions, our model flags 40 of them as fraudulent.<\/p>\n<p id=\"7d35\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Then the recall is 40 \/ 50 = 0.8.<\/p>\n<p id=\"6584\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm pj\">A higher recall implies fewer false negatives.<\/strong><\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-4115 aligncenter\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_pz_ANbR1mV-Qt31g.png\" alt=\"\" width=\"401\" height=\"729\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_pz_ANbR1mV-Qt31g.png 1100w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_pz_ANbR1mV-Qt31g-165x300.png 165w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_pz_ANbR1mV-Qt31g-563x1024.png 563w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_pz_ANbR1mV-Qt31g-768x1396.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_pz_ANbR1mV-Qt31g-845x1536.png 845w\" sizes=\"auto, (max-width: 401px) 100vw, 401px\" \/><\/figure><p data-selectable-paragraph=\"\"><\/p>\n<p style=\"text-align: center;\" data-selectable-paragraph=\"\"><a href=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/2\/26\/Precisionrecall.svg\/1200px-Precisionrecall.svg.png\">Source<\/a><\/p>\n<h2 
id=\"8257\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\" data-selectable-paragraph=\"\"><strong class=\"ba\">False positive rate<\/strong><\/h2>\n<p id=\"a6af\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">The false positive rate of our model is the proportion of all legit (non-fraudulent) transactions which the model incorrectly flagged as fraudulent.<\/p>\n<p id=\"b5c8\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Example: Suppose we had 100 transactions, and 50 of these transactions are legit.<\/p>\n<p id=\"1649\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">And of these 50 legit transactions, our model flagged 20 of them as fraudulent.<\/p>\n<p id=\"d76e\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Then the false positive rate is 20 \/ 50 = 0.4.<\/p>\n<h3 id=\"bfe8\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\"><strong class=\"ba\">So you\u2019re probably wondering\u2026 what are \u201cgood values\u201d for precision, recall, and false positive rate?<\/strong><\/h3>\n<p id=\"44d0\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">If we had a perfectly clairvoyant model, then 100% of the transactions it classifies as fraud would actually be fraud, and it would catch every fraudulent transaction.<\/p>\n<p id=\"6196\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">This would imply a few things about the values of 
precision, recall, and false positive rate:<\/p>\n<ul class=\"\">\n<li id=\"5b70\" class=\"nz oa iy bm b lt mm lw mn lz ob md oc mh od ml oe of og oh ga\" data-selectable-paragraph=\"\">Precision would equal 1 (100% of transactions that the model classifies as fraud are actually fraud).<\/li>\n<li id=\"a34e\" class=\"nz oa iy bm b lt oi lw oj lz ok md ol mh om ml oe of og oh ga\" data-selectable-paragraph=\"\">Recall would equal 1 (100% of actual fraud cases are identified).<\/li>\n<li id=\"9aec\" class=\"nz oa iy bm b lt oi lw oj lz ok md ol mh om ml oe of og oh ga\" data-selectable-paragraph=\"\">False positive rate would equal 0 (no legitimate transactions were incorrectly classified as fraudulent).<\/li>\n<\/ul>\n<p id=\"6775\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">We don\u2019t often build clairvoyant models, so there is a tradeoff between precision and recall.<\/p>\n<h2 id=\"8542\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\" data-selectable-paragraph=\"\"><strong class=\"ba\">The impact of the probability threshold<\/strong><\/h2>\n<p id=\"d736\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">As we&nbsp;<strong class=\"bm pj\">increase the probability threshold<\/strong>&nbsp;for blocking (the business policy we discussed earlier),&nbsp;<strong class=\"bm pj\">precision will increase<\/strong>.<\/p>\n<p id=\"327b\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">That\u2019s because the criterion for blocking a transaction is more strict, which implies a high level of \u201cconfidence\u201d for classifying the transaction as fraudulent. 
This, in turn, results in fewer false positives.<\/p>\n<p id=\"a69c\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">By the same logic, it follows that increasing the probability threshold will cause recall to decrease, implying more false negatives. That\u2019s because fewer transactions now meet the higher probability threshold, so more fraudulent transactions slip through.<\/p>\n<p id=\"e427\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Conversely, when we decrease the probability threshold the reverse happens: precision will decrease and recall will increase.<\/p>\n<h2 id=\"c00f\" class=\"kz la iy bm lb lc ld le lf lg lh li lj ke lk kf ll kh lm ki ln kk lo kl lp lq ga\"><strong class=\"ba\">ROC and precision-recall curves<\/strong><\/h2>\n<h3 id=\"b2c0\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\"><strong class=\"ba\">ROC Curves<\/strong><\/h3>\n<p id=\"9474\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">One way to understand and visualize this trade-off is through the ROC curve, which plots recall (aka true positive rate) against the false positive rate.<\/p>\n<p id=\"a6c5\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">A perfect model will have an ROC curve that hugs the top left of the graph \u2014 where recall is 1.0 and the false positive rate is 0.0.<\/p>\n<p id=\"b44c\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">A random model would have the true positive rate equal to the false positive rate, and we\u2019d end up with a straight diagonal line.<\/p>\n<p 
id=\"cbcd\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">We hope to build a model that\u2019s somewhere between a perfect model and a random guesser.<\/p>\n<p id=\"6891\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">One way to capture the overall quality of the model is by computing the area under the ROC curve (ROC AUC).<\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-4116 aligncenter\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_yPzIMX1aly1BEedL.png\" alt=\"\" width=\"647\" height=\"485\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_yPzIMX1aly1BEedL.png 864w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_yPzIMX1aly1BEedL-300x225.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_yPzIMX1aly1BEedL-768x576.png 768w\" sizes=\"auto, (max-width: 647px) 100vw, 647px\" \/><\/figure><p data-selectable-paragraph=\"\"><\/p>\n<p style=\"text-align: center;\" data-selectable-paragraph=\"\"><a href=\"https:\/\/miro.medium.com\/max\/864\/1*PU3_4LheadpGcpl6daO1mA.png\">Source<\/a><\/p>\n<h2 id=\"395b\" class=\"kz la iy bm lb lc ld le lf lg lh li lj ke lk kf ll kh lm ki ln kk lo kl lp lq ga\"><strong class=\"ba\">Class imbalance changes everything<\/strong><\/h2>\n<p id=\"5571\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">When the classification problem is imbalanced, the ROC AUC can become misleading.<\/p>\n<p id=\"1aa2\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">In imbalanced situations, a small number of correct or 
incorrect predictions can result in a large change in the ROC curve or ROC AUC score.<\/p>\n<p id=\"c9d1\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">A common alternative is the precision-recall curve and the area under it.<\/p>\n<h2 id=\"13b7\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\" data-selectable-paragraph=\"\"><strong class=\"ba\">Precision-recall curves<\/strong><\/h2>\n<p id=\"0b4f\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">A precision-recall curve (or PR curve) is a plot of the precision (y-axis) against the recall (x-axis) for different probability thresholds.<\/p>\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-4117 aligncenter\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_XtneREetBK7J0ITp.png\" alt=\"\" width=\"512\" height=\"420\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_XtneREetBK7J0ITp.png 512w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_XtneREetBK7J0ITp-300x246.png 300w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\" \/><\/figure><p data-selectable-paragraph=\"\"><\/p>\n<p style=\"text-align: center;\" data-selectable-paragraph=\"\"><a href=\"https:\/\/miro.medium.com\/max\/1024\/1*KZu3UEBx3UIgOvdS6V_h_A.png\">Source<\/a><\/p>\n<p id=\"36c1\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">A perfectly skilled, clairvoyant model would be depicted as a point at the coordinate (1, 1).<\/p>\n<p id=\"9543\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">A random, no-skill classifier will be a 
horizontal line on the plot with a precision equal to the proportion of fraudulent transactions in the dataset.<\/p>\n<p id=\"cbe2\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">As a model gets better, its curve will start to hug the top-right of the graph, where precision and recall are both maximized at 1.0.<\/p>\n<p id=\"909c\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Because the precision-recall curve puts more emphasis on the minority class, it proves to be an effective diagnostic for imbalanced binary classification models.<\/p>\n<p id=\"7b47\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Just like with the ROC curve, we can capture the overall quality of the model by computing the area under the precision-recall curve (PR AUC).<\/p>\n<blockquote class=\"pn po pp\"><p id=\"1491\" class=\"lr ls pq bm b lt mm jz lv lw mn kc ly pr mo mb mc ps mp mf mg pt mq mj mk ml ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm pj\">In our example, we\u2019ll use the area under the precision-recall curve as an evaluation metric.<\/strong><\/p><\/blockquote>\n<h2 id=\"96a3\" class=\"kz la iy bm lb lc ld le lf lg lh li lj ke lk kf ll kh lm ki ln kk lo kl lp lq ga\"><strong class=\"ba\">Methodology<\/strong><\/h2>\n<p id=\"1419\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">Using the data provided by our stakeholders, we\u2019ll build a basic understanding of the data, carry out exploratory data analysis, engineer features, balance our dataset, spot-check candidate models, choose the best candidates for further hyper-parameter tuning, and finally 
evaluate models to identify which one we\u2019ll add to our model registry.<\/p>\n<p id=\"2f78\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Each of the notebooks linked below has all the details for each step in the methodology.<\/p>\n<p id=\"401a\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">We\u2019ll be tracking our experiments using Comet throughout!<\/p>\n<h2 id=\"4075\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\" data-selectable-paragraph=\"\"><strong class=\"ba\">Dependencies<\/strong><\/h2>\n<p id=\"63be\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">We\u2019ll use the following Python libraries to assist us with this project:<\/p>\n<ul class=\"\">\n<li id=\"ae1f\" class=\"nz oa iy bm b lt mm lw mn lz ob md oc mh od ml oe of og oh ga\" data-selectable-paragraph=\"\">pandas<\/li>\n<li id=\"9233\" class=\"nz oa iy bm b lt oi lw oj lz ok md ol mh om ml oe of og oh ga\" data-selectable-paragraph=\"\">numpy<\/li>\n<li id=\"dbfe\" class=\"nz oa iy bm b lt oi lw oj lz ok md ol mh om ml oe of og oh ga\" data-selectable-paragraph=\"\">scikit-learn<\/li>\n<li id=\"c681\" class=\"nz oa iy bm b lt oi lw oj lz ok md ol mh om ml oe of og oh ga\" data-selectable-paragraph=\"\">imblearn<\/li>\n<li id=\"ef44\" class=\"nz oa iy bm b lt oi lw oj lz ok md ol mh om ml oe of og oh ga\" data-selectable-paragraph=\"\">Comet ML<\/li>\n<li id=\"873e\" class=\"nz oa iy bm b lt oi lw oj lz ok md ol mh om ml oe of og oh ga\" data-selectable-paragraph=\"\">pycaret<\/li>\n<li id=\"1aad\" class=\"nz oa iy bm b lt oi lw oj lz ok md ol mh om ml oe of og oh ga\" data-selectable-paragraph=\"\">sweetviz<\/li>\n<\/ul>\n<h2 id=\"3d46\" 
class=\"kz la iy bm lb lc ld le lf lg lh li lj ke lk kf ll kh lm ki ln kk lo kl lp lq ga\"><strong class=\"ba\">Notebooks<\/strong><\/h2>\n<p id=\"47af\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">I firmly believe the best way to learn something is by trying it for yourself. Instead of just sharing code snippets, I thought it would be more fun for you to get hands-on, so I\u2019ve shared runnable Colab notebooks that let you follow along more easily.<\/p>\n<h3 id=\"dcf0\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\"><strong class=\"ba\">Fetch raw data<\/strong><\/h3>\n<figure><a href=\"https:\/\/colab.research.google.com\/drive\/1pZwUQpx0yJiCDKVsU6ZTyZHPStQ8rVG3?usp=sharing\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-4118 alignnone\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-1.png\" alt=\"\" width=\"732\" height=\"179\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-1.png 1044w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-1-300x73.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-1-1024x250.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-1-768x188.png 768w\" sizes=\"auto, (max-width: 732px) 100vw, 732px\" \/><\/a><\/figure><p><\/p>\n<p id=\"50f8\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">In this notebook, we access the data provided to us by our stakeholders and log the raw data to&nbsp;<code class=\"fp on oo op oq b\">Comet<\/code>&nbsp;as an Artifact.<\/p>\n<h3 id=\"0631\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\"><strong class=\"ba\">Data understanding<\/strong><\/h3>\n<figure><a 
href=\"https:\/\/colab.research.google.com\/drive\/1syf2WFFs6op-f7gvqvWfxlgLAmDKn0pe?usp=sharing\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4119\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-2.png\" alt=\"\" width=\"729\" height=\"179\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-2.png 1043w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-2-300x74.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-2-1024x251.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-2-768x189.png 768w\" sizes=\"auto, (max-width: 729px) 100vw, 729px\" \/><\/a><\/figure><p data-selectable-paragraph=\"\"><\/p>\n<p id=\"7625\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Here we profile the data using the&nbsp;<code class=\"fp on oo op oq b\">sweetviz<\/code>&nbsp;library to get a feel for it and see what\u2019s going on in there. If you haven\u2019t seen&nbsp;<code class=\"fp on oo op oq b\">sweetviz<\/code>&nbsp;in action, prepare to be impressed! 
It\u2019s an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code.<\/p>\n<p id=\"74eb\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Learn more about how&nbsp;<code class=\"fp on oo op oq b\">sweetviz<\/code> integrates with Comet here:<\/p>\n<figure><a href=\"https:\/\/towardsdatascience.com\/automatically-track-all-your-eda-using-sweetviz-and-comet-ml-9cb7545b0fab\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4120\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-3.png\" alt=\"\" width=\"723\" height=\"178\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-3.png 1044w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-3-300x74.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-3-1024x252.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-3-768x189.png 768w\" sizes=\"auto, (max-width: 723px) 100vw, 723px\" \/><\/a><\/figure><p><\/p>\n<h3 id=\"33a7\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\"><strong class=\"ba\">Exploratory data analysis and feature engineering<\/strong><\/h3>\n<div class=\"os ot gt gv ou ov\">\n<div class=\"ow o fr\">\n<figure><a href=\"https:\/\/colab.research.google.com\/drive\/13RPJIljeYsRsHi_oBPzoDfZZXjO3S2xA?usp=sharing\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4121\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-4.png\" alt=\"\" width=\"728\" height=\"177\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-4.png 1044w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-4-300x73.png 300w, 
https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-4-1024x249.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-4-768x187.png 768w\" sizes=\"auto, (max-width: 728px) 100vw, 728px\" \/><\/a><\/figure><div class=\"ox o da dx en oy\"><\/div>\n<div>\n<p id=\"533e\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">More in-depth analysis of the data and creating new features from raw data. We\u2019ll also log versions of our data as an Artifact to&nbsp;<code class=\"fp on oo op oq b\">Comet<\/code>&nbsp;to keep track of it.<\/p>\n<h3 id=\"f2f5\" class=\"nk la iy bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\"><strong class=\"ba\">Spot-checking algorithms<\/strong><\/h3>\n<div class=\"os ot gt gv ou ov\">\n<div class=\"ow o fr\">\n<figure><a href=\"https:\/\/colab.research.google.com\/drive\/1AABRpUsqRyJh7X7scQgUYOZ0GuiAuP8d?usp=sharing\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4122\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-5.png\" alt=\"\" width=\"726\" height=\"180\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-5.png 1044w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-5-300x74.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-5-1024x254.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-5-768x191.png 768w\" sizes=\"auto, (max-width: 726px) 100vw, 726px\" \/><\/a><\/figure><div class=\"ox o da dx en oy\"><\/div>\n<div>\n<p id=\"696e\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">We\u2019ll spot-check a suite of classification algorithms and choose the best one for further hyper-parameter tuning.<\/p>\n<h3 id=\"43f5\" class=\"nk la iy 
bm lb nl nm nn lf no np nq lj lz nr ns ll md nt nu ln mh nv nw lp nx ga\"><strong class=\"ba\">Experimenting with hyper-parameters<\/strong><\/h3>\n<figure><a href=\"https:\/\/colab.research.google.com\/drive\/17wSrt1KB44jXtgH0cVqTEiJGLDB_F99S?usp=sharing\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4123\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-6.png\" alt=\"\" width=\"721\" height=\"176\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-6.png 1044w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-6-300x73.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-6-1024x250.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/colab-6-768x188.png 768w\" sizes=\"auto, (max-width: 721px) 100vw, 721px\" \/><\/a><\/figure><p><\/p>\n<div class=\"ir is it iu iv\">\n<p id=\"d1dd\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">We\u2019ll use Comet to run experiments and select the best combination of hyper-parameters for these algorithms.<\/p>\n<p id=\"6199\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">We\u2019ll then evaluate the performance on the test set and register it on the&nbsp;<a class=\"au ky\" href=\"https:\/\/www.comet.com\/site\/using-comet-model-registry\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Model Registry in Comet<\/a>. This is an incredibly useful tool which allows us to store trained machine learning models, metadata about the data and training jobs used to create the model, and keep track of which versions are in production. 
This is a critical piece of the puzzle for&nbsp;<strong class=\"bm pj\">establishing lineage<\/strong>&nbsp;for ML models.<\/p>\n<\/div>\n<div class=\"o dx qa qb id qc\" role=\"separator\">\n<div class=\"ir is it iu iv\">\n<h2 id=\"c70d\" class=\"kz la iy bm lb lc qh le lf lg qi li lj ke qj kf ll kh qk ki ln kk ql kl lp lq ga\">Conclusion<\/h2>\n<p id=\"bcc0\" class=\"pw-post-body-paragraph lr ls iy bm b lt lu jz lv lw lx kc ly lz ma mb mc md me mf mg mh mi mj mk ml ir ga\" data-selectable-paragraph=\"\">That\u2019s it! I hope you\u2019re ready to explore the notebooks, run them for yourself, and see what the end-to-end process we\u2019ve outlined in this post looks like.<\/p>\n<p id=\"29bf\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">I\u2019ve also included some homework for you within the notebooks! Machine learning is an art as much as a science, and the art comes from the various ways you can experiment with the process.<\/p>\n<p id=\"885a\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">Each notebook has suggestions for things you can try on your own, and my hope is that you come up with some interesting ideas and share them with me in our&nbsp;<a class=\"au ky\" href=\"http:\/\/bit.ly\/comet-community\" target=\"_blank\" rel=\"noopener ugc nofollow\">open Slack community<\/a>.<\/p>\n<p id=\"debe\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">That\u2019s it for this write-up. 
I\u2019ll see you in the notebooks.<\/p>\n<p id=\"e0be\" class=\"pw-post-body-paragraph lr ls iy bm b lt mm jz lv lw mn kc ly lz mo mb mc md mp mf mg mh mq mj mk ml ir ga\" data-selectable-paragraph=\"\">And remember, my friends: You\u2019ve got one life 
on this planet, why not try to do something big?<\/p>\n<\/div>\n<\/div>\n<div class=\"os ot gt gv ou ov\">\n<div class=\"ow o fr\">\n<div class=\"ox o da dx en oy\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"ms mt do mu ce mv\" tabindex=\"0\" role=\"button\"><\/div>\n<\/figure>\n","protected":false},"excerpt":{"rendered":"<p>An end-to-end guide for building and managing an ML-powered fraud detection system A brief history of fraud The earliest recorded attempt of fraud can be found as far back as the year 300 BC in Greece. A Greek sea merchant named Hegestratos sought to insure his ship and cargo, so took out an insurance policy [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":4124,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[8,6,9],"tags":[],"coauthors":[135],"class_list":["post-4108","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-comet-community-hub","category-machine-learning","category-product"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Fraud Detection and Imbalanced Classification - Comet<\/title>\n<meta name=\"description\" content=\"Learn about fraud detection, imbalanced classification, and managing your machine learning experiments with Comet.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" 
\/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Fraud Detection, Imbalanced Classification, and Managing Your Machine Learning Experiments Using Comet\" \/>\n<meta property=\"og:description\" content=\"Learn about fraud detection, imbalanced classification, and managing your machine learning experiments with Comet.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2022-10-13T22:26:09+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:17:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_HvGD2kFnzOsvYNr4.png\" \/>\n\t<meta property=\"og:image:width\" content=\"682\" \/>\n\t<meta property=\"og:image:height\" content=\"325\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Harpreet Sahota\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Harpreet Sahota\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Fraud Detection and Imbalanced Classification - Comet","description":"Learn about fraud detection, imbalanced classification, and managing your machine learning experiments with Comet.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/","og_locale":"en_US","og_type":"article","og_title":"Fraud Detection, Imbalanced Classification, and Managing Your Machine Learning Experiments Using Comet","og_description":"Learn about fraud detection, imbalanced classification, and managing your machine learning experiments with Comet.","og_url":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2022-10-13T22:26:09+00:00","article_modified_time":"2025-04-24T17:17:07+00:00","og_image":[{"width":682,"height":325,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_HvGD2kFnzOsvYNr4.png","type":"image\/png"}],"author":"Harpreet Sahota","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Harpreet Sahota","Est. 
reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/"},"author":{"name":"Team Comet Digital","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf"},"headline":"Fraud Detection, Imbalanced Classification, and Managing Your Machine Learning Experiments Using Comet","datePublished":"2022-10-13T22:26:09+00:00","dateModified":"2025-04-24T17:17:07+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/"},"wordCount":2429,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_HvGD2kFnzOsvYNr4.png","articleSection":["Comet Community Hub","Machine Learning","Product"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/","url":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/","name":"Fraud Detection and Imbalanced Classification - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_HvGD2kFnzOsvYNr4.png","datePublished":"2022-10-13T22:26:09+00:00","dateModified":"2025-04-24T17:17:07+00:00","description":"Learn about fraud detection, imbalanced classification, and managing your machine learning experiments 
with Comet.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_HvGD2kFnzOsvYNr4.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/10\/0_HvGD2kFnzOsvYNr4.png","width":682,"height":325,"caption":"graphic of hooded person holding up their hand next to a dollar sign"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/fraud-detection-imbalanced-classification\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Fraud Detection, Imbalanced Classification, and Managing Your Machine Learning Experiments Using Comet"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, 
Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf","name":"Team Comet Digital","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/4f0c0a8cc7c0e87c636ff6a420a6647c","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","caption":"Team Comet 
Digital"},"sameAs":["https:\/\/www.comet.ml\/"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/teamcometdigital\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4108","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=4108"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4108\/revisions"}],"predecessor-version":[{"id":15674,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4108\/revisions\/15674"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/4124"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=4108"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=4108"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=4108"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=4108"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}