{"id":6588,"date":"2023-07-03T14:01:26","date_gmt":"2023-07-03T22:01:26","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=6588"},"modified":"2025-04-24T17:15:15","modified_gmt":"2025-04-24T17:15:15","slug":"how-to-evaluate-clustering-models-in-python","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/","title":{"rendered":"How to Evaluate Clustering Models in Python"},"content":{"rendered":"\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\">\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<figure class=\"mh mi mj mk ml mm me mf paragraph-image\">\n<div class=\"mn mo eb mp bg mq\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mr ms c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*1Z2sqA_uyI7E3vvUTz9uUw.jpeg\" alt=\"\" width=\"700\" height=\"468\"><\/figure><div class=\"me mf mg\"><picture><\/picture><\/div>\n<\/div><figcaption class=\"mt mu mv me mf mw mx be b bf z dv\" data-selectable-paragraph=\"\">Photo by <a class=\"af my\" href=\"https:\/\/unsplash.com\/@arnaudmariat?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener ugc nofollow\">Arnaud Mariat<\/a> on <a class=\"af my\" href=\"https:\/\/unsplash.com\/s\/photos\/clusters?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/figcaption><\/figure>\n<p id=\"8d26\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">Machine learning is a subset of artificial intelligence that employs statistical algorithms and other methods to visualize, analyze and forecast data. 
Generally, machine learning is broken down into two main categories, based on whether the data used is labeled: <strong class=\"be nu\">supervised<\/strong> and <strong class=\"be nu\">unsupervised<\/strong>.<\/p>\n<p id=\"6f93\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">Supervised learning algorithms are those that require training datasets with labels attached to the features. Unsupervised learning algorithms, on the other hand, use mathematical methods to find patterns in unlabelled data, often by identifying clusters of data points with similar features. In this article, we will discuss different clustering algorithms and how to evaluate their results. Let\u2019s get started!<\/p>\n<h1 id=\"081d\" class=\"nv nw fo be nx ny nz go oa ob oc gr od oe of og oh oi oj ok ol om on oo op oq bj\" data-selectable-paragraph=\"\">What is Clustering?<\/h1>\n<p id=\"8b7d\" class=\"pw-post-body-paragraph mz na fo be b gm or nc nd gp os nf ng nh ot nj nk nl ou nn no np ov nr ns nt fh bj\" data-selectable-paragraph=\"\">Clustering (sometimes referred to as <strong class=\"be nu\">cluster analysis<\/strong>) is an unsupervised machine learning technique used to identify and group similar data points within a larger, unlabelled dataset. 
It refers to the process of finding a structure or pattern inside an otherwise unstructured dataset.<\/p>\n<p id=\"0dc0\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">The four main types of clustering include:<\/p>\n<ul class=\"\">\n<li id=\"e703\" class=\"mz na fo be b gm nb nc nd gp ne nf ng ow ni nj nk ox nm nn no oy nq nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nu\">Centroid-based Clustering<\/strong> (e.g., K-Means Clustering)<\/li>\n<li id=\"3557\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nu\">Density-based Clustering<\/strong> (e.g., DBSCAN)<\/li>\n<li id=\"e3b3\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nu\">Distribution-based Clustering<\/strong> (e.g., Gaussian Mixture Models, or GMMs)<\/li>\n<li id=\"d88a\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nu\">Hierarchical Clustering <\/strong>(e.g., Agglomerative, Divisive)<\/li>\n<\/ul>\n<p id=\"c8bb\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">In this article we will explore K-means clustering, hierarchical clustering, and DBSCAN, as these are some of the most common (and effective) methods used currently, but it\u2019s good to be aware that there are other methods out there as well.<\/p>\n<h2 id=\"196e\" class=\"ph nw fo be nx pi pj pk oa pl pm pn od nh po pp pq nl pr ps pt np pu pv pw px bj\" data-selectable-paragraph=\"\">1. 
K-Means Clustering<\/h2>\n<p id=\"5e95\" class=\"pw-post-body-paragraph mz na fo be b gm or nc nd gp os nf ng nh ot nj nk nl ou nn no np ov nr ns nt fh bj\" data-selectable-paragraph=\"\">K-means clustering is an unsupervised machine learning algorithm that groups unlabeled data into <code class=\"cw py pz qa qb b\">k<\/code> clusters, where <code class=\"cw py pz qa qb b\">k<\/code> is a user-defined integer. K-means is an iterative algorithm that uses cluster centroids to partition the data into groups of similar points.<\/p>\n<p id=\"e89a\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">K-means clustering starts by taking <code class=\"cw py pz qa qb b\">k<\/code> random points and marking these points as <a class=\"af my\" href=\"https:\/\/en.wikipedia.org\/wiki\/Centroid\" target=\"_blank\" rel=\"noopener ugc nofollow\">centroids<\/a> of <code class=\"cw py pz qa qb b\">k<\/code> clusters. It then calculates the <a class=\"af my\" href=\"https:\/\/en.wikipedia.org\/wiki\/Euclidean_distance\" target=\"_blank\" rel=\"noopener ugc nofollow\">Euclidean distance<\/a> from each remaining data point to each of those centroids and assigns every data point to the cluster with the closest centroid. Once all points have been assigned, each centroid is recalculated as the mean of all the vectors inside its cluster, and the assignment step is repeated with the updated centroids. This loop continues until the cluster assignments stop changing, i.e., until the centroids converge.<\/p>\n<p id=\"cea5\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">In order for K-means clustering to be effective, however, it is imperative to first determine the optimal value for <code class=\"cw py pz qa qb b\">k<\/code>. 
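The iterative procedure described above can be sketched from scratch in a few lines of NumPy. This is purely illustrative (the function name, defaults, and demo data are our own); in practice you would reach for `sklearn.cluster.KMeans`, as we do later in this article:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=42):
    """Plain NumPy sketch of the K-means loop described above."""
    rng = np.random.default_rng(seed)
    # 1. Pick k random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assign every point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute each centroid as the mean of the points assigned to it
        #    (keeping the old centroid if a cluster happens to end up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Quick demo on two well-separated, made-up blobs.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 0.3, (20, 2)),
                    rng.normal(5, 0.3, (20, 2))])
labels, centroids = kmeans(X_demo, 2)
```

On well-separated data like this, the loop typically converges in a handful of iterations and recovers the two blobs exactly.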
There are various techniques for doing so, but one of the most effective is simple data visualization. We will also cover a couple of other methods in the examples later on.<\/p>\n<figure class=\"mh mi mj mk ml mm me mf paragraph-image\">\n<div class=\"mn mo eb mp bg mq\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mr ms c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*_xGJdgLJapu9OPTcIjvgbw.png\" alt=\"\" width=\"700\" height=\"394\"><\/figure>\n<\/div>\n<figcaption class=\"mt mu mv me mf mw mx be b bf z dv\" data-selectable-paragraph=\"\">Visually determining the optimal k value<\/figcaption>\n<\/figure>\n<h2 id=\"2742\" class=\"ph nw fo be nx pi pj pk oa pl pm pn od nh po pp pq nl pr ps pt np pu pv pw px bj\" data-selectable-paragraph=\"\">2. Hierarchical Clustering<\/h2>\n<p id=\"16e0\" class=\"pw-post-body-paragraph mz na fo be b gm or nc nd gp os nf ng nh ot nj nk nl ou nn no np ov nr ns nt fh bj\" data-selectable-paragraph=\"\">Hierarchical clustering is another type of unsupervised clustering algorithm, in which we create a <strong class=\"be nu\">hierarchy of clusters <\/strong>in the form of a tree, also referred to as a <a class=\"af my\" href=\"https:\/\/en.wikipedia.org\/wiki\/Dendrogram\" target=\"_blank\" rel=\"noopener ugc nofollow\">dendrogram<\/a><strong class=\"be nu\">. <\/strong>Hierarchical clustering also automatically finds patterns by dividing data into <code class=\"cw py pz qa qb b\">n<\/code> clusters. 
However, you do not need to define the number of clusters in advance: the algorithm builds the full hierarchy, and you choose the number of clusters afterwards by cutting the dendrogram at the desired level.<\/p>\n<p id=\"9f98\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">There are two main approaches to hierarchical clustering: <a class=\"af my\" href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.cluster.AgglomerativeClustering.html\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be nu\">agglomerative<\/strong><\/a> and <a class=\"af my\" href=\"https:\/\/scikit-learn.org\/stable\/modules\/clustering.html\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be nu\">divisive<\/strong><\/a>. In agglomerative clustering, we start by treating each data point as its own cluster, and then repeatedly merge the most similar clusters until we are left with one group (the full dataset). Divisive hierarchical clustering, on the other hand, begins with the whole dataset (considered as one single cluster), which is then partitioned into less similar clusters until each individual data point becomes its own unique cluster.<\/p>\n<figure class=\"mh mi mj mk ml mm me mf paragraph-image\">\n<div class=\"mn mo eb mp bg mq\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mr ms c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*6lTcB15YXA05dIizH9QyjQ.png\" alt=\"\" width=\"700\" height=\"394\"><\/figure>\n<\/div>\n<figcaption class=\"mt mu mv me mf mw mx be b bf z dv\" 
data-selectable-paragraph=\"\">Agglomerative vs. divisive hierarchical clustering<\/figcaption>\n<\/figure>\n<h2 id=\"30a5\" class=\"ph nw fo be nx pi pj pk oa pl pm pn od nh po pp pq nl pr ps pt np pu pv pw px bj\" data-selectable-paragraph=\"\">3. DBSCAN Clustering<\/h2>\n<p id=\"24aa\" class=\"pw-post-body-paragraph mz na fo be b gm or nc nd gp os nf ng nh ot nj nk nl ou nn no np ov nr ns nt fh bj\" data-selectable-paragraph=\"\"><a class=\"af my\" href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.cluster.DBSCAN.html?highlight=dbscan\" target=\"_blank\" rel=\"noopener ugc nofollow\">DBSCAN<\/a> stands for <strong class=\"be nu\">density-based spatial clustering of applications with noise<\/strong>. DBSCAN works on the simple assumption that a data point belongs to a cluster if it is close to <em class=\"qd\">many<\/em> data points of that cluster, rather than to any single point. It requires two parameters for dividing data into groups: <code class=\"cw py pz qa qb b\">epsilon<\/code> and <code class=\"cw py pz qa qb b\">min_points<\/code>. 
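In scikit-learn's implementation, these two parameters appear as `eps` and `min_samples`. A minimal sketch on made-up blob data (the data and parameter values here are purely illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up demo data: two dense blobs plus three obvious outliers.
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(0, 0.3, (30, 2)),              # blob around (0, 0)
    rng.normal(5, 0.3, (30, 2)),              # blob around (5, 5)
    [[2.5, 2.5], [-2.0, 6.0], [7.0, -1.0]],   # isolated points
])

# `eps` plays the role of epsilon, `min_samples` the role of min_points.
db = DBSCAN(eps=0.8, min_samples=5).fit(X_demo)
labels = db.labels_  # a label of -1 marks points treated as noise
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())
print(n_clusters, n_noise)  # the two blobs are found; the isolated points are noise
```

Unlike K-means, the number of clusters (here, 2) is discovered from the density structure rather than supplied up front.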
<code class=\"cw py pz qa qb b\">epsilon<\/code> specifies how close one point must be to another to be considered part of the same cluster, while <code class=\"cw py pz qa qb b\">min_points<\/code> determines the minimum number of data points required to form a cluster.<\/p>\n<p id=\"1df2\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">One of the biggest advantages of DBSCAN clustering is that it is very robust to outliers and doesn\u2019t require the number of clusters to be specified in advance.<\/p>\n<figure class=\"mh mi mj mk ml mm me mf paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mr ms c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:400\/0*jONOcwF8O9hzf8nc.png\" alt=\"\" width=\"400\" height=\"288\"><\/figure>\n<figcaption class=\"mt mu mv me mf mw mx be b bf z dv\" data-selectable-paragraph=\"\">SOURCE \u2014 <a class=\"af my\" href=\"https:\/\/en.wikipedia.org\/wiki\/DBSCAN\" target=\"_blank\" rel=\"noopener ugc nofollow\">Wikipedia<\/a><\/figcaption>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"ab ca qf qg qh qi\" role=\"separator\"><\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<blockquote class=\"qn\"><p id=\"9c27\" class=\"qo qp fo be qq qr qs qt qu qv qw nt dv\" data-selectable-paragraph=\"\">Struggling to track and reproduce complex experiment parameters? Artifacts are just one of the many tools in the Comet toolbox to help ease model management. 
Read our PetCam scenario to <a class=\"af my\" href=\"https:\/\/www.comet.com\/site\/debugging-your-machine-learning-models-with-comet-artifacts\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">learn more<\/a>.<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h1 id=\"e249\" class=\"nv nw fo be nx ny qx go oa ob qy gr od oe qz og oh oi ra ok ol om rb oo op oq bj\" data-selectable-paragraph=\"\">Building a Clustering Model<\/h1>\n<p id=\"c91a\" class=\"pw-post-body-paragraph mz na fo be b gm or nc nd gp os nf ng nh ot nj nk nl ou nn no np ov nr ns nt fh bj\" data-selectable-paragraph=\"\">We are going to use the K-means clustering algorithm to group iris flowers by species, using the famous iris dataset. To determine the correct number of clusters we will use the <a class=\"af my\" href=\"https:\/\/en.wikipedia.org\/wiki\/Elbow_method_(clustering)\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be nu\">elbow method<\/strong><\/a>. The dataset we are using is available in sklearn\u2019s datasets module.<\/p>\n<pre class=\"mh mi mj mk ml rc qb rd re ax rf bj\"><span id=\"2b7c\" class=\"ph nw fo qb b ia rg rh l iq ri\" data-selectable-paragraph=\"\">from sklearn import datasets\ndf = datasets.load_iris()\nX = df['data']\ny = df['target']               # not needed for clustering<\/span><\/pre>\n<p id=\"86f6\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\"><code class=\"cw py pz qa qb b\">X<\/code> contains four features for each flower (sepal length, sepal width, petal length, and petal width), whereas <code class=\"cw py pz qa qb b\">y<\/code> contains the flower species labels. 
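As a quick sanity check, you can confirm the shape of `X` and the feature names before clustering (a minimal sketch using the same loader):

```python
from sklearn import datasets

df = datasets.load_iris()
X = df['data']

print(X.shape)              # (150, 4): 150 flowers, 4 measurements each
print(df['feature_names'])  # sepal/petal length and width, in cm
```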
We will only use <code class=\"cw py pz qa qb b\">X<\/code>, and try to divide the dataset into flower species clusters using K-means instead. Below, we use the elbow method to find the value of <code class=\"cw py pz qa qb b\">k<\/code>.<\/p>\n<pre class=\"mh mi mj mk ml rc qb rd re ax rf bj\"><span id=\"3bcd\" class=\"ph nw fo qb b ia rg rh l iq ri\" data-selectable-paragraph=\"\">import matplotlib.pyplot as plt\nfrom sklearn.cluster import KMeans<\/span><span id=\"45ed\" class=\"ph nw fo qb b ia rj rh l iq ri\" data-selectable-paragraph=\"\">wcss = []   # within-cluster sum of squares for each k\nfor i in range(1, 11):\n    kmeans = KMeans(n_clusters=i, init='k-means++',\n                    random_state=42)\n    kmeans.fit(X)\n    wcss.append(kmeans.inertia_)\nplt.plot(range(1, 11), wcss)\nplt.title('The Elbow Method')\nplt.xlabel('Number of clusters')\nplt.ylabel('WCSS')\nplt.show()<\/span><\/pre>\n<figure class=\"mh mi mj mk ml mm me mf paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mr ms c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:491\/1*XZKP9FfwUY9EpsLKKtaSBA.png\" alt=\"\" width=\"491\" height=\"345\"><\/figure>\n<\/figure>\n<p id=\"426f\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">The elbow method gives us an optimal value of <code class=\"cw py pz qa qb b\">k<\/code> equal to 3. 
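Before committing to this value, it can be worth cross-checking the elbow reading against silhouette scores for several candidate values of k (an illustrative sketch; the silhouette score itself is covered in detail below). Note that on iris the silhouette score often comes out slightly higher at k=2, because two of the three species overlap, which is exactly why combining multiple diagnostics is useful:

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = datasets.load_iris()['data']

# Silhouette score for each candidate k (it is undefined for k=1, so start at 2).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

for k, s in scores.items():
    print(f'k={k}: silhouette={s:.3f}')
```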
Let\u2019s use this value to build a model.<\/p>\n<pre class=\"mh mi mj mk ml rc qb rd re ax rf bj\"><span id=\"0303\" class=\"ph nw fo qb b ia rg rh l iq ri\" data-selectable-paragraph=\"\">from sklearn.cluster import KMeans\nKmean = KMeans(n_clusters=3, random_state=42)\nKmean.fit(X)<\/span><span id=\"fa59\" class=\"ph nw fo qb b ia rj rh l iq ri\" data-selectable-paragraph=\"\">## Predictions\ny_pred = Kmean.predict(X)<\/span><\/pre>\n<p id=\"9178\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">Now that we have our cluster assignments, let\u2019s evaluate this model to find out how well it performed!<\/p>\n<h1 id=\"2ca1\" class=\"nv nw fo be nx ny nz go oa ob oc gr od oe of og oh oi oj ok ol om on oo op oq bj\" data-selectable-paragraph=\"\">Evaluation Metrics For Clustering-Based Models<\/h1>\n<h2 id=\"194a\" class=\"ph nw fo be nx pi pj pk oa pl pm pn od nh po pp pq nl pr ps pt np pu pv pw px bj\" data-selectable-paragraph=\"\"><strong class=\"al\">1. 
Silhouette Score<\/strong><\/h2>\n<ul class=\"\">\n<li id=\"c8fd\" class=\"mz na fo be b gm or nc nd gp os nf ng ow ot nj nk ox ou nn no oy ov nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">The silhouette score is a metric used to measure the goodness of fit of a clustering algorithm, but it can also be used as a method for determining an optimal value of <code class=\"cw py pz qa qb b\">k<\/code> (<a class=\"af my\" href=\"https:\/\/scikit-learn.org\/stable\/auto_examples\/cluster\/plot_kmeans_silhouette_analysis.html\" target=\"_blank\" rel=\"noopener ugc nofollow\">see here for more<\/a>).<\/li>\n<li id=\"bb00\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">For each sample, it compares the mean distance to the other points in its own cluster (the intra-cluster distance) with the mean distance to the points in the nearest neighboring cluster; the score is then averaged over all samples.<\/li>\n<li id=\"2dc0\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">Its value ranges from -1 to 1.<\/li>\n<li id=\"f134\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">A value of 0 indicates clusters are overlapping and either the data or the value of <code class=\"cw py pz qa qb b\">k<\/code> is a poor fit.<\/li>\n<li id=\"7f87\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">1 is the ideal value and indicates that clusters are very dense and nicely separated.<\/li>\n<li id=\"fba8\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">A negative value indicates elements have likely been assigned to the wrong clusters.<\/li>\n<li id=\"0872\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nu\">The closer the silhouette score is to 1, the better separated the clusters.<\/strong><\/li>\n<\/ul>\n<pre class=\"mh mi mj mk ml rc qb rd re ax rf bj\"><span id=\"ba0d\" class=\"ph nw fo qb b ia rg rh l iq ri\" data-selectable-paragraph=\"\">from sklearn.metrics import silhouette_score\nsilhouette_score(X, y_pred)\n--------------------------------\n0.55<\/span><span id=\"e822\" class=\"ph nw fo qb b ia rj rh l iq ri\" data-selectable-paragraph=\"\">\"\"\"\nThe silhouette score is 0.55, which is acceptable and suggests the clusters are not overlapping.\n\"\"\"<\/span><\/pre>\n<h2 id=\"451b\" class=\"ph nw fo be nx pi pj pk oa pl pm pn od nh po pp pq nl pr ps pt np pu pv pw px bj\" data-selectable-paragraph=\"\">2. <strong class=\"al\">Calinski-Harabasz Index<\/strong><\/h2>\n<ul class=\"\">\n<li id=\"1c1c\" class=\"mz na fo be b gm or nc nd gp os nf ng ow ot nj nk ox ou nn no oy ov nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">It is also known as the Variance Ratio Criterion.<\/li>\n<li id=\"8e3c\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">The <a class=\"af my\" href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.calinski_harabasz_score.html\" target=\"_blank\" rel=\"noopener ugc nofollow\">Calinski-Harabasz index<\/a> is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion.<\/li>\n<li id=\"176d\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\"><strong class=\"be nu\">The higher the index, the more separable the clusters.<\/strong><\/li>\n<\/ul>\n<pre class=\"mh mi mj mk ml rc qb rd re ax rf bj\"><span id=\"86e1\" class=\"ph nw fo qb b ia rg rh l iq ri\" data-selectable-paragraph=\"\">from sklearn.metrics import calinski_harabasz_score\ncalinski_harabasz_score(X, y_pred)\n---------------------------------------------\n561.62775<\/span><\/pre>\n<h2 
id=\"2fb5\" class=\"ph nw fo be nx pi pj pk oa pl pm pn od nh po pp pq nl pr ps pt np pu pv pw px bj\" data-selectable-paragraph=\"\">3. <strong class=\"al\">Davies\u2013Bouldin Index<\/strong><\/h2>\n<ul class=\"\">\n<li id=\"0318\" class=\"mz na fo be b gm or nc nd gp os nf ng ow ot nj nk ox ou nn no oy ov nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">The Davies\u2013Bouldin index (DBI), introduced by David L. Davies and Donald W. Bouldin in 1979, is another metric for evaluating clustering algorithms.<\/li>\n<li id=\"3db6\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">The <a class=\"af my\" href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.davies_bouldin_score.html\" target=\"_blank\" rel=\"noopener ugc nofollow\">Davies\u2013Bouldin index<\/a> is defined as the average similarity measure of each cluster with its most similar cluster, where similarity is the ratio of within-cluster distances to between-cluster distances.<\/li>\n<li id=\"93d6\" class=\"mz na fo be b gm pc nc nd gp pd nf ng ow pe nj nk ox pf nn no oy pg nr ns nt oz pa pb bj\" data-selectable-paragraph=\"\">The minimum value of the DB Index is 0, and a <strong class=\"be nu\">smaller value (closer to 0) indicates a model that produces better-separated clusters.<\/strong><\/li>\n<\/ul>\n<pre class=\"mh mi mj mk ml rc qb rd re ax rf bj\"><span id=\"2d8e\" class=\"ph nw fo qb b ia rg rh l iq ri\" data-selectable-paragraph=\"\">from sklearn.metrics import davies_bouldin_score\ndavies_bouldin_score(X, y_pred)\n-------------------------------------------\n0.6619<\/span><\/pre>\n<p id=\"c05c\" class=\"pw-post-body-paragraph mz na fo be b gm nb nc nd gp ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fh bj\" data-selectable-paragraph=\"\">Based on the above evaluation scores, we can conclude that our model is a decent performer.<\/p>\n<h2 id=\"fe7b\" class=\"ph nw fo be nx pi pj pk oa pl pm pn od 
nh po pp pq nl pr ps pt np pu pv pw px bj\" data-selectable-paragraph=\"\">Conclusion<\/h2>\n<p id=\"b99a\" class=\"pw-post-body-paragraph mz na fo be b gm or nc nd gp os nf ng nh ot nj nk nl ou nn no np ov nr ns nt fh bj\" data-selectable-paragraph=\"\">It is always a good idea to evaluate your machine learning models before making decisions based on them. Evaluation metrics provide an easy-to-interpret way to check model performance.<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Photo by Arnaud Mariat on Unsplash Machine learning is a subset of artificial intelligence that employs statistical algorithms and other methods to visualize, analyze and forecast data. Generally, machine learning is broken down into two subsequent categories based on certain properties of the data used: supervised and unsupervised. Supervised learning algorithms refer to those that [&hellip;]<\/p>\n","protected":false},"author":47,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6,7],"tags":[],"coauthors":[120],"class_list":["post-6588","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to Evaluate Clustering Models in Python - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta 
property=\"og:title\" content=\"How to Evaluate Clustering Models in Python\" \/>\n<meta property=\"og:description\" content=\"Photo by Arnaud Mariat on Unsplash Machine learning is a subset of artificial intelligence that employs statistical algorithms and other methods to visualize, analyze and forecast data. Generally, machine learning is broken down into two subsequent categories based on certain properties of the data used: supervised and unsupervised. Supervised learning algorithms refer to those that [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-07-03T22:01:26+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:15:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*1Z2sqA_uyI7E3vvUTz9uUw.jpeg\" \/>\n<meta name=\"author\" content=\"Abhay Parashar\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Abhay Parashar\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"How to Evaluate Clustering Models in Python - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/","og_locale":"en_US","og_type":"article","og_title":"How to Evaluate Clustering Models in Python","og_description":"Photo by Arnaud Mariat on Unsplash Machine learning is a subset of artificial intelligence that employs statistical algorithms and other methods to visualize, analyze and forecast data. Generally, machine learning is broken down into two subsequent categories based on certain properties of the data used: supervised and unsupervised. Supervised learning algorithms refer to those that [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-07-03T22:01:26+00:00","article_modified_time":"2025-04-24T17:15:15+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*1Z2sqA_uyI7E3vvUTz9uUw.jpeg","type":"","width":"","height":""}],"author":"Abhay Parashar","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Abhay Parashar","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/"},"author":{"name":"Abhay Parashar","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/efd71dff0d86bae98e6ccfafd79e6280"},"headline":"How to Evaluate Clustering Models in Python","datePublished":"2023-07-03T22:01:26+00:00","dateModified":"2025-04-24T17:15:15+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/"},"wordCount":1152,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*1Z2sqA_uyI7E3vvUTz9uUw.jpeg","articleSection":["Machine Learning","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/","url":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/","name":"How to Evaluate Clustering Models in Python - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*1Z2sqA_uyI7E3vvUTz9uUw.jpeg","datePublished":"2023-07-03T22:01:26+00:00","dateModified":"2025-04-24T17:15:15+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*1Z2sqA_uyI7E3vvUTz9uUw.jpeg","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*1Z2sqA_uyI7E3vvUTz9uUw.jpeg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-evaluate-clustering-models-in-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"How to Evaluate Clustering Models in Python"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/efd71dff0d86bae98e6ccfafd79e6280","name":"Abhay Parashar","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/48a73d1fb964b15ec72122f8815ad8af","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/1654615642757-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/1654615642757-96x96.jpg","caption":"Abhay 
Parashar"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/parasharabhay13gmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/6588","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/47"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=6588"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/6588\/revisions"}],"predecessor-version":[{"id":15606,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/6588\/revisions\/15606"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=6588"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=6588"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=6588"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=6588"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}