{"id":4591,"date":"2022-11-10T17:48:04","date_gmt":"2022-11-11T01:48:04","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=4591"},"modified":"2025-04-24T17:16:38","modified_gmt":"2025-04-24T17:16:38","slug":"model-interpretability-part-2-global-model-agnostic-methods","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/","title":{"rendered":"Model Interpretability Part 2: Global Model Agnostic Methods"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/700\/1*gTKpW04SfIdZp-I_rmGzag.jpeg\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-center\">Photo by <a class=\"au kj\" href=\"https:\/\/unsplash.com\/@nasa?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener ugc nofollow\">NASA<\/a>&nbsp;on&nbsp;<a class=\"au kj\" href=\"https:\/\/unsplash.com\/s\/photos\/globe?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/p>\n\n\n\n<div class=\"ir is it iu iv\">\n<p id=\"133e\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">As mentioned in&nbsp;<a class=\"au kj\" href=\"https:\/\/heartbeat.comet.ml\/model-interpretability-part-1-the-importance-and-approaches-f93239edcd21\" target=\"_blank\" rel=\"noopener ugc nofollow\">Part 1 of Model Interpretability<\/a>, the flexibility of model-agnostics is the greatest advantage, being the reason why they are so popular. Data Scientists and Machine Learning Engineers can use any machine learning model they wish as the interpretation method can be applied to it. 
This makes evaluating the task and comparing model interpretability much simpler.<\/p>\n<p id=\"fdf6\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">Part 2 of this series about Model Interpretability is about Global Model Agnostic Methods. To recap:<\/p>\n<p id=\"1633\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Global Interpretability<\/strong>&nbsp;aims to capture the entire model. It focuses on the explanation and understanding of why the model makes particular decisions, based on the dependent and independent variables.<\/p>\n<h1 id=\"fbad\" class=\"li lj iy bm lk ll lm ln lo lp lq lr ls lt lu lv lw lx ly lz ma mb mc md me mf ga\" data-selectable-paragraph=\"\">Global Methods<\/h1>\n<p id=\"ce35\" class=\"pw-post-body-paragraph kk kl iy bm b km mg ko kp kq mh ks kt ku mi kw kx ky mj la lb lc mk le lf lg ir ga\" data-selectable-paragraph=\"\">Global methods describe the average behavior of a machine learning model, making them valuable when the engineer of the model wants a better understanding of the model\u2019s general behavior, its data, and how to debug it.<\/p>\n<p id=\"2eec\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">I will be going through three different types of Global Model Agnostic Methods.<\/p>\n<h1 id=\"d9d7\" class=\"li lj iy bm lk ll lm ln lo lp lq lr ls lt lu lv lw lx ly lz ma mb mc md me mf ga\" data-selectable-paragraph=\"\">Partial Dependence Plot (PDP)<\/h1>\n<p id=\"d396\" class=\"pw-post-body-paragraph kk kl iy bm b km mg ko kp kq mh ks kt ku mi kw kx ky mj la lb lc mk le lf lg ir ga\" data-selectable-paragraph=\"\">The<strong class=\"bm lh\">&nbsp;Partial 
Dependence Plot<\/strong>&nbsp;shows the functional relationship between a set of input features and the prediction\/target response. It explores how strongly the predictions depend on specific values of the input variable of interest.<\/p>\n<p id=\"64fe\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">It can show whether the relationship between the target response and a feature is linear, monotonic, or more complex. It helps researchers and data scientists\/engineers understand and determine what happens to model predictions as various features are adjusted.<\/p>\n<p id=\"f620\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">According to Greenwell et al\u2019s paper:&nbsp;<a class=\"au kj\" href=\"https:\/\/arxiv.org\/pdf\/1805.04755.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">A Simple and Effective Model-Based Variable Importance Measure<\/a>, a flat partial dependence plot indicates that the feature is not important and has no effect on the target response. 
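As a concrete sketch of this idea, a partial dependence curve can be computed with scikit-learn\u2019s sklearn.inspection module. The synthetic dataset and random forest below are illustrative assumptions, not the article\u2019s data:

```python
# Illustrative sketch: computing a partial dependence curve with scikit-learn.
# The dataset and model here are placeholders, not the article's example.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Average predicted probability as feature 0 is swept over a grid
result = partial_dependence(model, X, features=[0], kind="average")
avg = result["average"][0]

# A nearly flat curve (small spread) suggests the feature barely matters;
# large variation suggests it is important to the prediction.
print(f"PD curve spread (std): {avg.std():.3f}")
```

Greenwell et al.\u2019s importance measure is essentially this spread, computed per feature.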
The more the Partial Dependence Plot varies, the more important the feature is to the prediction.<\/p>\n<p id=\"675f\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">For numerical features, the importance of a feature can be defined as the deviation of each unique feature value\u2019s partial dependence from the average curve, using this formula:<\/p>\n<figure class=\"mm mn mo mp gx jz gl gm paragraph-image\">\n<div class=\"ka kb do kc ce kd\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce ke kf c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/0*ivp1xJDfGDLq5s93\" alt=\"\" width=\"700\" height=\"159\"><\/figure><div class=\"gl gm ml\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*ivp1xJDfGDLq5s93 640w, https:\/\/miro.medium.com\/max\/720\/0*ivp1xJDfGDLq5s93 720w, https:\/\/miro.medium.com\/max\/750\/0*ivp1xJDfGDLq5s93 750w, https:\/\/miro.medium.com\/max\/786\/0*ivp1xJDfGDLq5s93 786w, https:\/\/miro.medium.com\/max\/828\/0*ivp1xJDfGDLq5s93 828w, https:\/\/miro.medium.com\/max\/1100\/0*ivp1xJDfGDLq5s93 1100w, https:\/\/miro.medium.com\/max\/1400\/0*ivp1xJDfGDLq5s93 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<div><\/div>\n<\/div>\n<\/figure>\n<h2 id=\"3c70\" class=\"mq lj iy bm lk mr ms mt lo mu mv mw ls ku mx my lw ky mz na ma lc nb nc me nd ga\" data-selectable-paragraph=\"\">An example:<\/h2>\n<p 
id=\"0b87\" class=\"pw-post-body-paragraph kk kl iy bm b km mg ko kp kq mh ks kt ku mi kw kx ky mj la lb lc mk le lf lg ir ga\" data-selectable-paragraph=\"\">Let\u2019s say we are using the&nbsp;<a class=\"au kj\" href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/Cervical+cancer+%28Risk+Factors%29\" target=\"_blank\" rel=\"noopener ugc nofollow\">cervical cancer dataset<\/a>&nbsp;which explores and indicates the risk factors of whether a woman will get cervical cancer.<\/p>\n<p id=\"f218\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">In this example, we fit a random forest to predict whether a woman might get cervical cancer based on risk factors such as the number of pregnancies, use of hormonal contraceptives, and more. We use a Partial Dependence Plot to compute and visualize the probability of getting cancer based on the different features.<\/p>\n<figure class=\"mm mn mo mp gx jz gl gm paragraph-image\">\n<div class=\"ka kb do kc ce kd\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce ke kf c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/0*x872LHmbPZrK6aZU\" alt=\"\" width=\"700\" height=\"400\"><\/figure><div class=\"gl gm ne\" style=\"text-align: center;\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*x872LHmbPZrK6aZU 640w, https:\/\/miro.medium.com\/max\/720\/0*x872LHmbPZrK6aZU 720w, https:\/\/miro.medium.com\/max\/750\/0*x872LHmbPZrK6aZU 750w, https:\/\/miro.medium.com\/max\/786\/0*x872LHmbPZrK6aZU 786w, https:\/\/miro.medium.com\/max\/828\/0*x872LHmbPZrK6aZU 828w, https:\/\/miro.medium.com\/max\/1100\/0*x872LHmbPZrK6aZU 1100w, https:\/\/miro.medium.com\/max\/1400\/0*x872LHmbPZrK6aZU 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 
700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\">Source: <\/picture><a class=\"au kj\" href=\"https:\/\/christophm.github.io\/interpretable-ml-book\/interaction.html#examples-2\" target=\"_blank\" rel=\"noopener ugc nofollow\">christophm<\/a><\/div>\n<\/div>\n<\/figure>\n<p id=\"6baf\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">Above are two visualizations that show the Partial Dependence Plots of cancer probability based on the features: age and years of hormonal contraceptive use.<\/p>\n<p id=\"f0e6\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">For the age feature, we can see that the PDP remains low until the age of 40 is reached, after which the probability of cancer increases. The contraceptive feature behaves similarly: after 10 years of using hormonal contraceptives, there is an increase in the probability of cancer.<\/p>\n<p id=\"15f3\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Advantages:<\/strong><\/p>\n<ul class=\"\">\n<li id=\"2c76\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\">Partial Dependence Plots are&nbsp;<strong class=\"bm lh\">easy to implement and interpret<\/strong>. 
Changing a feature and measuring the impact on the prediction is a simple way of analyzing the relationship between that feature and the prediction, even for complex models or tasks.<\/li>\n<li id=\"caa2\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Interpretations are clear.<\/strong>&nbsp;Some explanation methods take effort to understand; however, with PDP, if the feature used to compute the PDP is not correlated with other features, the plot directly shows how the average prediction changes as that feature changes. With this, you can make simple and clear interpretations.<\/li>\n<\/ul>\n<p id=\"34cd\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Disadvantages:<\/strong><\/p>\n<ul class=\"\">\n<li id=\"3bfc\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">The maximum number of features is 2<\/strong>. This is due to the 2-D representation that PDP is limited to. Using PDP to plot and interpret more than two features is difficult.<\/li>\n<li id=\"1f76\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Lack of Data.<\/strong>&nbsp;This is an issue for many methods and models; however, PDP may not be accurate for feature values that have little data. Interpreting regions with almost no data can be very misleading.<\/li>\n<li id=\"4c21\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">The assumption of Independence.&nbsp;<\/strong>Some features are not truly independent; other features influence them. 
For example, imagine you are predicting the time it takes someone to run 100m, taking into account their height and weight. The PDP for height is computed as if height were independent of weight. This is not true: the two features are correlated, and both directly affect the time it takes someone to run 100m. PDP is easy to interpret because it assumes that the feature or features used to compute the partial dependence are not correlated with any other feature; when that assumption fails, the plot can be misleading, making this assumption also its biggest weakness.<\/li>\n<\/ul>\n<h1 id=\"0d98\" class=\"li lj iy bm lk ll lm ln lo lp lq lr ls lt lu lv lw lx ly lz ma mb mc md me mf ga\" data-selectable-paragraph=\"\">Implementing PDP in your projects<\/h1>\n<ul class=\"\">\n<li id=\"85ba\" class=\"nf ng iy bm b km mg kq mh ku nt ky nu lc nv lg nk nl nm nn ga\" data-selectable-paragraph=\"\">If you are using R, there are packages such as: iml, pdp, and DALEX.<\/li>\n<li id=\"3a50\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">If you are using Python, there are packages such as PDPBox and the PartialDependenceDisplay class in the sklearn.inspection module. For more information on sklearn.inspection, refer to this&nbsp;<a class=\"au kj\" href=\"https:\/\/scikit-learn.org\/stable\/modules\/partial_dependence.html\" target=\"_blank\" rel=\"noopener ugc nofollow\">link<\/a>.<\/li>\n<\/ul>\n<h1 id=\"f217\" class=\"li lj iy bm lk ll lm ln lo lp lq lr ls lt lu lv lw lx ly lz ma mb mc md me mf ga\" data-selectable-paragraph=\"\">2. 
Feature Interaction<\/h1>\n<p id=\"1d17\" class=\"pw-post-body-paragraph kk kl iy bm b km mg ko kp kq mh ks kt ku mi kw kx ky mj la lb lc mk le lf lg ir ga\" data-selectable-paragraph=\"\">So how do we address PDP\u2019s assumption that features are not influenced by one another?&nbsp;<strong class=\"bm lh\">Feature Interaction.<\/strong>&nbsp;The effect of one feature depends on the values of the other features.<\/p>\n<p id=\"9f00\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">When two features interact, the prediction cannot be expressed as the sum of the two individual feature effects, because the effect of one feature depends on the value of the other.<\/p>\n<p id=\"e244\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">To better understand this concept, we can break down the prediction of a machine learning model that uses two features into four terms:<\/p>\n<ol class=\"\">\n<li id=\"4f05\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nw nl nm nn ga\" data-selectable-paragraph=\"\">Constant term<\/li>\n<li id=\"8b49\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nw nl nm nn ga\" data-selectable-paragraph=\"\">Term for the first feature<\/li>\n<li id=\"5872\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nw nl nm nn ga\" data-selectable-paragraph=\"\">Term for the second feature<\/li>\n<li id=\"74cc\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nw nl nm nn ga\" data-selectable-paragraph=\"\">Term for the interaction between the two features<\/li>\n<\/ol>\n<\/div>\n\n\n\n<div class=\"o dx nx ny id nz\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<blockquote class=\"oe\"><p id=\"6dfb\" class=\"of og iy bm oh oi oj ok ol om on lg cn\" data-selectable-paragraph=\"\">The 
most important thing to keep in mind when building and deploying your model? Understanding your end-goal.&nbsp;<a class=\"au kj\" href=\"https:\/\/www.comet.com\/site\/industry-qa-where-most-machine-learning-projects-fail\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Read our interview with ML experts from Stanford, Google, and HuggingFace to learn more.<\/a><\/p><\/blockquote>\n<\/div>\n\n\n\n<div class=\"o dx nx ny id nz\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<h2 id=\"6f71\" class=\"mq lj iy bm lk mr ms mt lo mu mv mw ls ku mx my lw ky mz na ma lc nb nc me nd ga\" data-selectable-paragraph=\"\">Friedman\u2019s H-statistic<\/h2>\n<p id=\"6c8a\" class=\"pw-post-body-paragraph kk kl iy bm b km mg ko kp kq mh ks kt ku mi kw kx ky mj la lb lc mk le lf lg ir ga\" data-selectable-paragraph=\"\">If two features do not interact, the 2-way partial dependence function can be decomposed into the sum of the two single-feature partial dependence functions (assuming the partial dependence functions are centered at zero). We can state the formula as:<\/p>\n<ul class=\"\">\n<li id=\"6165\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\">PDjk(xj, xk) is the 2-way partial dependence function of both features<\/li>\n<li id=\"234d\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">PDj(xj) + PDk(xk) are the two partial dependence functions of the single features<\/li>\n<\/ul>\n<figure class=\"mm mn mo mp gx jz gl gm paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce ke kf c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/604\/0*Tec6DGQhv63CAXp2\" alt=\"\" width=\"604\" height=\"114\"><\/figure><div class=\"gl gm oo\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*Tec6DGQhv63CAXp2 640w, https:\/\/miro.medium.com\/max\/720\/0*Tec6DGQhv63CAXp2 720w, https:\/\/miro.medium.com\/max\/750\/0*Tec6DGQhv63CAXp2 750w, https:\/\/miro.medium.com\/max\/786\/0*Tec6DGQhv63CAXp2 786w, 
https:\/\/miro.medium.com\/max\/828\/0*Tec6DGQhv63CAXp2 828w, https:\/\/miro.medium.com\/max\/1100\/0*Tec6DGQhv63CAXp2 1100w, https:\/\/miro.medium.com\/max\/1208\/0*Tec6DGQhv63CAXp2 1208w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 604px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"f0f4\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">Similarly, if a feature has no interaction with any of the other features, the prediction function can be stated as:<\/p>\n<ul class=\"\">\n<li id=\"7938\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\">f^(x) is the sum of partial dependence functions<\/li>\n<li id=\"52e0\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">PDj(xj) is the partial dependence that depends on the feature j<\/li>\n<li id=\"cf50\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">PD\u2212j(x\u2212j) is the partial dependence that depends on all other features except the j-th feature.<\/li>\n<\/ul>\n<figure class=\"mm mn mo mp gx jz gl gm paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce ke kf c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/532\/0*jMEYdIonDAH2x-u7\" alt=\"\" width=\"532\" height=\"106\"><\/figure><div class=\"gl gm op\"><picture><source 
srcset=\"https:\/\/miro.medium.com\/max\/640\/0*jMEYdIonDAH2x-u7 640w, https:\/\/miro.medium.com\/max\/720\/0*jMEYdIonDAH2x-u7 720w, https:\/\/miro.medium.com\/max\/750\/0*jMEYdIonDAH2x-u7 750w, https:\/\/miro.medium.com\/max\/786\/0*jMEYdIonDAH2x-u7 786w, https:\/\/miro.medium.com\/max\/828\/0*jMEYdIonDAH2x-u7 828w, https:\/\/miro.medium.com\/max\/1100\/0*jMEYdIonDAH2x-u7 1100w, https:\/\/miro.medium.com\/max\/1064\/0*jMEYdIonDAH2x-u7 1064w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 532px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"bf95\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">The next step involves measuring the interactions between the features:<\/p>\n<ul class=\"\">\n<li id=\"5b84\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\">The interaction between feature j and k:<\/li>\n<\/ul>\n<figure class=\"mm mn mo mp gx jz gl gm paragraph-image\">\n<div class=\"ka kb do kc ce kd\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce ke kf c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/0*isRhbcGf4BjoAnxv\" alt=\"\" width=\"700\" height=\"146\"><\/figure><div class=\"gl gm oq\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*isRhbcGf4BjoAnxv 640w, https:\/\/miro.medium.com\/max\/720\/0*isRhbcGf4BjoAnxv 720w, https:\/\/miro.medium.com\/max\/750\/0*isRhbcGf4BjoAnxv 
750w, https:\/\/miro.medium.com\/max\/786\/0*isRhbcGf4BjoAnxv 786w, https:\/\/miro.medium.com\/max\/828\/0*isRhbcGf4BjoAnxv 828w, https:\/\/miro.medium.com\/max\/1100\/0*isRhbcGf4BjoAnxv 1100w, https:\/\/miro.medium.com\/max\/1400\/0*isRhbcGf4BjoAnxv 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<ul class=\"\">\n<li id=\"8a11\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\">The interaction between feature j and any other features:<\/li>\n<\/ul>\n<figure class=\"mm mn mo mp gx jz gl gm paragraph-image\">\n<div class=\"ka kb do kc ce kd\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce ke kf c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/0*2pRNflhj95kyGXl9\" alt=\"\" width=\"700\" height=\"189\"><\/figure><div class=\"gl gm or\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*2pRNflhj95kyGXl9 640w, https:\/\/miro.medium.com\/max\/720\/0*2pRNflhj95kyGXl9 720w, https:\/\/miro.medium.com\/max\/750\/0*2pRNflhj95kyGXl9 750w, https:\/\/miro.medium.com\/max\/786\/0*2pRNflhj95kyGXl9 786w, https:\/\/miro.medium.com\/max\/828\/0*2pRNflhj95kyGXl9 828w, https:\/\/miro.medium.com\/max\/1100\/0*2pRNflhj95kyGXl9 1100w, https:\/\/miro.medium.com\/max\/1400\/0*2pRNflhj95kyGXl9 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 
3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<h2 id=\"ed83\" class=\"mq lj iy bm lk mr ms mt lo mu mv mw ls ku mx my lw ky mz na ma lc nb nc me nd ga\" data-selectable-paragraph=\"\">An example:<\/h2>\n<p id=\"fe2c\" class=\"pw-post-body-paragraph kk kl iy bm b km mg ko kp kq mh ks kt ku mi kw kx ky mj la lb lc mk le lf lg ir ga\" data-selectable-paragraph=\"\">Now let\u2019s use the same&nbsp;<a class=\"au kj\" href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/Cervical+cancer+%28Risk+Factors%29\" target=\"_blank\" rel=\"noopener ugc nofollow\">cervical cancer dataset<\/a>&nbsp;and apply Friedman\u2019s H-statistic on each feature.<\/p>\n<figure class=\"mm mn mo mp gx jz gl gm paragraph-image\">\n<div class=\"ka kb do kc ce kd\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce ke kf c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/0*1rZL7Kt-6dXwgWWU\" alt=\"\" width=\"700\" height=\"512\"><\/figure><div class=\"gl gm os\" style=\"text-align: center;\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*1rZL7Kt-6dXwgWWU 640w, https:\/\/miro.medium.com\/max\/720\/0*1rZL7Kt-6dXwgWWU 720w, https:\/\/miro.medium.com\/max\/750\/0*1rZL7Kt-6dXwgWWU 750w, https:\/\/miro.medium.com\/max\/786\/0*1rZL7Kt-6dXwgWWU 786w, https:\/\/miro.medium.com\/max\/828\/0*1rZL7Kt-6dXwgWWU 828w, https:\/\/miro.medium.com\/max\/1100\/0*1rZL7Kt-6dXwgWWU 1100w, https:\/\/miro.medium.com\/max\/1400\/0*1rZL7Kt-6dXwgWWU 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 
50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\">Source: <\/picture><a class=\"au kj\" href=\"https:\/\/christophm.github.io\/interpretable-ml-book\/interaction.html#examples-2\" target=\"_blank\" rel=\"noopener ugc nofollow\">christophm<\/a><\/div>\n<\/div>\n<\/figure>\n<p id=\"b0e5\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">A random forest has been used to predict whether a woman might get cervical cancer based on risk factors. Friedman\u2019s H-statistic has been applied to each feature, showing the relative interactive effects of all the features. Hormonal contraceptives have the highest effect in comparison to other features. 
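To make the recipe concrete, here is a brute-force sketch of how the H-statistic is assembled from centered partial dependence functions. The Friedman #1 synthetic dataset and gradient-boosted model are assumptions for illustration, not the cancer data used above:

```python
# Sketch: Friedman's H^2 for a pair of features, built by brute force
# from centered partial dependence functions evaluated at the data points.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

# make_friedman1's target contains a sin(pi * x0 * x1) term, so features
# 0 and 1 genuinely interact; features 3 and 4 enter only additively.
X, y = make_friedman1(n_samples=200, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

def pd_values(model, X, features, points):
    """Partial dependence of `features` at each row of `points`: overwrite
    those feature columns with the point's values, average the predictions."""
    out = np.empty(len(points))
    for i, vals in enumerate(points):
        X_mod = X.copy()
        X_mod[:, features] = vals
        out[i] = model.predict(X_mod).mean()
    return out - out.mean()  # centered, as the H-statistic requires

def h_statistic(model, X, j, k):
    """Friedman's H^2 for the feature pair (j, k)."""
    pd_jk = pd_values(model, X, [j, k], X[:, [j, k]])
    pd_j = pd_values(model, X, [j], X[:, [j]])
    pd_k = pd_values(model, X, [k], X[:, [k]])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)

h01 = h_statistic(model, X, 0, 1)  # interacting pair: noticeably above 0
h34 = h_statistic(model, X, 3, 4)  # additive pair: close to 0
print(h01, h34)
```

The double loop over data points is exactly why the article calls the H-statistic computationally expensive.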
Using this, we can further explore the 2-way interactions between that feature and each of the other features.<\/p>\n<p id=\"2b4e\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Advantages:<\/strong><\/p>\n<ul class=\"\">\n<li id=\"67c3\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\">Unlike PDP, Friedman\u2019s H-statistic allows you to analyze the interactions, and their strength, between&nbsp;<strong class=\"bm lh\">3 or more features<\/strong>.<\/li>\n<li id=\"8d41\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Interpretation with meaning.<\/strong>&nbsp;The features are statistically explored and the interactions are defined, allowing you to further dive into understanding more about the types of interactions.<\/li>\n<\/ul>\n<p id=\"25c9\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Disadvantages:<\/strong><\/p>\n<ul class=\"\">\n<li id=\"440e\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\">Friedman\u2019s H-statistic is&nbsp;<strong class=\"bm lh\">computationally expensive<\/strong>, taking a lot of time as it estimates the marginal distribution.<\/li>\n<li id=\"e945\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Variance<\/strong>. 
If not all data points are used, the estimates of the marginal distribution have a certain variance, causing the results to be unstable.<\/li>\n<li id=\"6b9e\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Visualizing the interaction<\/strong>: Friedman\u2019s H-statistic shows us the strength of interaction between features; however, unlike PDP, it does not give us a 2D visualization of what the interactions look like.<\/li>\n<li id=\"34f2\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">Friedman\u2019s H-statistic&nbsp;<strong class=\"bm lh\">cannot be used for tasks such as image classification<\/strong>&nbsp;as the inputs are pixels.<\/li>\n<\/ul>\n<h1 id=\"9fea\" class=\"li lj iy bm lk ll lm ln lo lp lq lr ls lt lu lv lw lx ly lz ma mb mc md me mf ga\" data-selectable-paragraph=\"\">3. Global Surrogate<\/h1>\n<p id=\"17bf\" class=\"pw-post-body-paragraph kk kl iy bm b km mg ko kp kq mh ks kt ku mi kw kx ky mj la lb lc mk le lf lg ir ga\" data-selectable-paragraph=\"\">A&nbsp;<strong class=\"bm lh\">Global Surrogate<\/strong>&nbsp;is an interpretable model that is trained to approximate the predictions of a black-box model.<\/p>\n<p id=\"83d0\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Black-box models<\/strong>&nbsp;are models so complex that they are not interpretable by humans. Humans have little understanding of how the variables are being used or combined to make predictions. 
We can draw conclusions about the black-box model with the help of a surrogate model.<\/p>\n<p id=\"69c8\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">A surrogate model, also known as a metamodel, response surface model, or emulator, is trained using a data-driven approach.<\/p>\n<figure class=\"mm mn mo mp gx jz gl gm paragraph-image\">\n<div class=\"ka kb do kc ce kd\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce ke kf c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/0*SirLLeDz9vpB9KJB\" alt=\"\" width=\"700\" height=\"438\"><\/figure><div class=\"gl gm ot\" style=\"text-align: center;\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*SirLLeDz9vpB9KJB 640w, https:\/\/miro.medium.com\/max\/720\/0*SirLLeDz9vpB9KJB 720w, https:\/\/miro.medium.com\/max\/750\/0*SirLLeDz9vpB9KJB 750w, https:\/\/miro.medium.com\/max\/786\/0*SirLLeDz9vpB9KJB 786w, https:\/\/miro.medium.com\/max\/828\/0*SirLLeDz9vpB9KJB 828w, https:\/\/miro.medium.com\/max\/1100\/0*SirLLeDz9vpB9KJB 1100w, https:\/\/miro.medium.com\/max\/1400\/0*SirLLeDz9vpB9KJB 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\">Source: <\/picture><a class=\"au kj\" href=\"https:\/\/towardsdatascience.com\/an-introduction-to-surrogate-modeling-part-i-fundamentals-84697ce4d241#91b2\" target=\"_blank\" 
rel=\"noopener\">TDS<\/a><\/div>\n<\/div>\n<\/figure>\n<p id=\"c0da\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">The steps of surrogate modeling:<\/p>\n<ol class=\"\">\n<li id=\"0a28\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nw nl nm nn ga\" data-selectable-paragraph=\"\">Select a dataset.<\/li>\n<li id=\"7d2c\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nw nl nm nn ga\" data-selectable-paragraph=\"\">You can use the same dataset that was used to train the black-box model or a completely new dataset from the same distribution.<\/li>\n<li id=\"6edd\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nw nl nm nn ga\" data-selectable-paragraph=\"\">Once you have selected your dataset, get the predictions of the black-box model.<\/li>\n<li id=\"556b\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nw nl nm nn ga\" data-selectable-paragraph=\"\">Select your interpretable model type.<\/li>\n<li id=\"fa4d\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nw nl nm nn ga\" data-selectable-paragraph=\"\">This can be a linear model, decision tree, random forest, etc.<\/li>\n<li id=\"39ec\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nw nl nm nn ga\" data-selectable-paragraph=\"\">Train the interpretable model on your selected dataset, using the black-box model\u2019s predictions as the target.<\/li>\n<li id=\"b500\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nw nl nm nn ga\" data-selectable-paragraph=\"\">There you have it. 
A surrogate model.<\/li>\n<li id=\"605d\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nw nl nm nn ga\" data-selectable-paragraph=\"\">The next step, to help you interpret the model, is to measure the difference between the surrogate model\u2019s predictions and those of the black-box model.<\/li>\n<\/ol>\n<p id=\"2578\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">The R-squared measure can be used to quantify this difference, measuring how closely the surrogate model replicates the black-box model\u2019s predictions.<\/p>\n<figure class=\"mm mn mo mp gx jz gl gm paragraph-image\">\n<div class=\"ka kb do kc ce kd\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce ke kf c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/0*jLO1Hfow6tsOpMwS\" alt=\"\" width=\"700\" height=\"165\"><\/figure><div class=\"gl gm ou\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*jLO1Hfow6tsOpMwS 640w, https:\/\/miro.medium.com\/max\/720\/0*jLO1Hfow6tsOpMwS 720w, https:\/\/miro.medium.com\/max\/750\/0*jLO1Hfow6tsOpMwS 750w, https:\/\/miro.medium.com\/max\/786\/0*jLO1Hfow6tsOpMwS 786w, https:\/\/miro.medium.com\/max\/828\/0*jLO1Hfow6tsOpMwS 828w, https:\/\/miro.medium.com\/max\/1100\/0*jLO1Hfow6tsOpMwS 1100w, https:\/\/miro.medium.com\/max\/1400\/0*jLO1Hfow6tsOpMwS 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" 
data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p>&nbsp;<\/p>\n<ul class=\"\">\n<li id=\"49ea\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\">R2 is the percentage of variance captured by the surrogate model.<\/li>\n<li id=\"89ce\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">SSE is the sum of squared errors.<\/li>\n<li id=\"13ab\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">SST is the total sum of squares.<\/li>\n<li id=\"7d44\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">y^\u2217(i) is the surrogate model\u2019s prediction for the i-th instance.<\/li>\n<li id=\"4c96\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">y^(i) is the black-box model\u2019s prediction for the i-th instance.<\/li>\n<li id=\"9fb1\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">y^\u00af is the mean of the black-box model\u2019s predictions.<\/li>\n<\/ul>\n<p id=\"0b5c\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">An R2 value close to 1 indicates a low SSE value, from which we can conclude that the interpretable model approximates the behavior of the black-box model well.<\/p>\n<p id=\"3d2c\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">An R2 value close to 0 indicates a high SSE value, from which we can infer that the interpretable model fails to explain the black-box model.<\/p>\n<h2 id=\"ba12\" class=\"mq lj iy bm lk mr ms mt lo mu mv mw ls ku mx my lw ky mz na ma lc nb nc me nd ga\" data-selectable-paragraph=\"\">An 
example:<\/h2>\n<p id=\"b1f0\" class=\"pw-post-body-paragraph kk kl iy bm b km mg ko kp kq mh ks kt ku mi kw kx ky mj la lb lc mk le lf lg ir ga\" data-selectable-paragraph=\"\">Maintaining the same example throughout, a random forest is trained on the&nbsp;<a class=\"au kj\" href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/Cervical+cancer+%28Risk+Factors%29\" target=\"_blank\" rel=\"noopener ugc nofollow\">cervical cancer dataset<\/a>. As mentioned in the steps above, you select your interpretable model type and train it on the original dataset. In this case, we\u2019re using a decision tree, with the predictions from the random forest as the outcomes. The counts in the nodes show the frequency of the black-box model\u2019s classifications in each node.<\/p>\n<figure class=\"mm mn mo mp gx jz gl gm paragraph-image\">\n<div class=\"ka kb do kc ce kd\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce ke kf c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/0*9jqCCTEAgakKkrfF\" alt=\"\" width=\"700\" height=\"501\"><\/figure><div class=\"gl gm ne\" style=\"text-align: center;\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/0*9jqCCTEAgakKkrfF 640w, https:\/\/miro.medium.com\/max\/720\/0*9jqCCTEAgakKkrfF 720w, https:\/\/miro.medium.com\/max\/750\/0*9jqCCTEAgakKkrfF 750w, https:\/\/miro.medium.com\/max\/786\/0*9jqCCTEAgakKkrfF 786w, https:\/\/miro.medium.com\/max\/828\/0*9jqCCTEAgakKkrfF 828w, https:\/\/miro.medium.com\/max\/1100\/0*9jqCCTEAgakKkrfF 1100w, https:\/\/miro.medium.com\/max\/1400\/0*9jqCCTEAgakKkrfF 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 
80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\">Source: <\/picture><a class=\"au kj\" href=\"https:\/\/christophm.github.io\/interpretable-ml-book\/interaction.html#examples-2\" target=\"_blank\" rel=\"noopener ugc nofollow\">christophm<\/a><\/div>\n<\/div>\n<\/figure>\n<p id=\"bc08\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Advantages:<\/strong><\/p>\n<ul class=\"\">\n<li id=\"f718\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\">The&nbsp;<strong class=\"bm lh\">R-squared measure is a popular metric<\/strong>. It measures how well the surrogate model approximates the black-box model\u2019s predictions.<\/li>\n<li id=\"9a6f\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">Surrogate modeling is&nbsp;<strong class=\"bm lh\">easy and simple to implement<\/strong>. This makes for smoother interpretations and clearer explanations for people with little to no background in Data Science and Machine Learning.<\/li>\n<li id=\"e309\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Flexibility<\/strong>: Any interpretable model type can be used, which makes surrogate modeling highly flexible. 
This allows you to exchange the interpretable model, as well as the underlying black-box model.<\/li>\n<li id=\"a867\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Less computationally expensive.<\/strong>&nbsp;Training and employing surrogate modeling is much cheaper than using other methods.<\/li>\n<\/ul>\n<p id=\"4e1c\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Disadvantages:<\/strong><\/p>\n<ul class=\"\">\n<li id=\"bc18\" class=\"nf ng iy bm b km kn kq kr ku nh ky ni lc nj lg nk nl nm nn ga\" data-selectable-paragraph=\"\"><strong class=\"bm lh\">Choosing your interpretable model.<\/strong>&nbsp;Although flexibility is one of the advantages, you also need to take into consideration that whichever interpretable model you choose comes with its own advantages and disadvantages.<\/li>\n<li id=\"a242\" class=\"nf ng iy bm b km no kq np ku nq ky nr lc ns lg nk nl nm nn ga\" data-selectable-paragraph=\"\">It\u2019s&nbsp;<strong class=\"bm lh\">about the model, not the data<\/strong>. When using surrogate modeling, you need to remember that you are drawing conclusions and interpretations about the model, not about the data. Surrogate modeling does not allow you to see the real outcome.<\/li>\n<\/ul>\n<h1 id=\"bfee\" class=\"li lj iy bm lk ll lm ln lo lp lq lr ls lt lu lv lw lx ly lz ma mb mc md me mf ga\" data-selectable-paragraph=\"\">Conclusion<\/h1>\n<p id=\"02dd\" class=\"pw-post-body-paragraph kk kl iy bm b km mg ko kp kq mh ks kt ku mi kw kx ky mj la lb lc mk le lf lg ir ga\" data-selectable-paragraph=\"\">In this part of the series, we have covered what Global Methods are and how they relate to Model Agnostic methods. 
I have gone through three different types of Model Agnostic methods, exploring the mathematics behind them, examples for better understanding, and the advantages and disadvantages of each to help you choose which method to use.<\/p>\n<p id=\"f24e\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">In the next part, I will explain more about Local Model Agnostic Methods.<\/p>\n<p id=\"25e6\" class=\"pw-post-body-paragraph kk kl iy bm b km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg ir ga\" data-selectable-paragraph=\"\">Stay tuned!<\/p>\n<\/div>\n\n\n\n<div class=\"o dx nx ny id nz\" role=\"separator\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Photo by NASA&nbsp;on&nbsp;Unsplash As mentioned in&nbsp;Part 1 of Model Interpretability, the flexibility of model-agnostics is the greatest advantage, being the reason why they are so popular. Data Scientists and Machine Learning Engineers can use any machine learning model they wish as the interpretation method can be applied to it. 
This allows for the evaluation of [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[139],"class_list":["post-4591","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Model Interpretability Part 2: Global Model Agnostic Methods - Comet<\/title>\n<meta name=\"description\" content=\"As mentioned in\u00a0Part 1 of Model Interpretability, the flexibility of model-agnostics is the greatest advantage, being the reason why they are so popular.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Model Interpretability Part 2: Global Model Agnostic Methods\" \/>\n<meta property=\"og:description\" content=\"As mentioned in\u00a0Part 1 of Model Interpretability, the flexibility of model-agnostics is the greatest advantage, being the reason why they are so popular.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2022-11-11T01:48:04+00:00\" \/>\n<meta 
property=\"article:modified_time\" content=\"2025-04-24T17:16:38+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/max\/700\/1*gTKpW04SfIdZp-I_rmGzag.jpeg\" \/>\n<meta name=\"author\" content=\"Nisha Arya Ahmed\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Nisha Arya Ahmed\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Model Interpretability Part 2: Global Model Agnostic Methods - Comet","description":"As mentioned in\u00a0Part 1 of Model Interpretability, the flexibility of model-agnostics is the greatest advantage, being the reason why they are so popular.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/","og_locale":"en_US","og_type":"article","og_title":"Model Interpretability Part 2: Global Model Agnostic Methods","og_description":"As mentioned in\u00a0Part 1 of Model Interpretability, the flexibility of model-agnostics is the greatest advantage, being the reason why they are so popular.","og_url":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2022-11-11T01:48:04+00:00","article_modified_time":"2025-04-24T17:16:38+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/max\/700\/1*gTKpW04SfIdZp-I_rmGzag.jpeg","type":"","width":"","height":""}],"author":"Nisha Arya 
Ahmed","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Nisha Arya Ahmed","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/"},"author":{"name":"Team Comet Digital","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf"},"headline":"Model Interpretability Part 2: Global Model Agnostic Methods","datePublished":"2022-11-11T01:48:04+00:00","dateModified":"2025-04-24T17:16:38+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/"},"wordCount":2025,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/max\/700\/1*gTKpW04SfIdZp-I_rmGzag.jpeg","articleSection":["Machine Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/","url":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/","name":"Model Interpretability Part 2: Global Model Agnostic Methods - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/max\/700\/1*gTKpW04SfIdZp-I_rmGzag.jpeg","datePublished":"2022-11-11T01:48:04+00:00","dateModified":"2025-04-24T17:16:38+00:00","description":"As mentioned in\u00a0Part 1 of Model Interpretability, the flexibility of model-agnostics is the greatest advantage, being the reason why they are so popular.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/#primaryimage","url":"https:\/\/miro.medium.com\/max\/700\/1*gTKpW04SfIdZp-I_rmGzag.jpeg","contentUrl":"https:\/\/miro.medium.com\/max\/700\/1*gTKpW04SfIdZp-I_rmGzag.jpeg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/model-interpretability-part-2-global-model-agnostic-methods\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Model Interpretability Part 2: Global Model Agnostic Methods"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf","name":"Team Comet Digital","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/4f0c0a8cc7c0e87c636ff6a420a6647c","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","caption":"Team Comet 
Digital"},"sameAs":["https:\/\/www.comet.ml\/"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/teamcometdigital\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4591","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=4591"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4591\/revisions"}],"predecessor-version":[{"id":15654,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4591\/revisions\/15654"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=4591"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=4591"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=4591"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=4591"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}