{"id":5826,"date":"2023-05-13T08:14:34","date_gmt":"2023-05-13T16:14:34","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=5826"},"modified":"2025-04-29T14:05:22","modified_gmt":"2025-04-29T14:05:22","slug":"debugging-image-classifiers-with-confusion-matrices","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/debugging-image-classifiers-with-confusion-matrices\/","title":{"rendered":"Debugging Image Classifiers With Confusion Matrices"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter size-full wp-image-5831\"><img loading=\"lazy\" decoding=\"async\" width=\"1330\" height=\"865\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-04-30-at-8.06.27-PM.png\" alt=\"Interactive confusion matrices per epoch of our image classification model, as seen in the Comet UI\" class=\"wp-image-5831\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-04-30-at-8.06.27-PM.png 1330w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-04-30-at-8.06.27-PM-300x195.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-04-30-at-8.06.27-PM-1024x666.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-04-30-at-8.06.27-PM-768x499.png 768w\" sizes=\"auto, (max-width: 1330px) 100vw, 1330px\" \/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/www.comet.com\/anmorgan24\/interactive-confusion-matrix\/f6e36031bbd94214b6dbb442ddb59d4f?assetId=753838fcf67c407cbf112b8d3415b72c,94e00f8928fa4596bfb2a61c8bb45299,2d36930fb1df48f99216c81bc8c934b5&amp;cellValue=Counts&amp;colorDistribution=Equal&amp;experiment-tab=confusionMatrix&amp;viewId=OrFOvhDzPhD7hW4WycSikjqZa\">Interactive confusion matrices per epoch<\/a> of our image classification model, as seen in the Comet UI; image by author<\/figcaption><\/figure>\n\n\n\n<h2 
class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">We often rely on scalar metrics and static plots to describe and evaluate machine learning models, but these methods rarely capture the full story. Especially when dealing with computer vision tasks like classification, detection, segmentation, and generation, visualizing your outputs is essential to understanding how your model is behaving and why.&nbsp;<\/span><\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/colab.research.google.com\/drive\/1Fyrk6Br3EtahbFttIQh1cMgT1ztHKrzs\" target=\"_blank\" rel=\"noreferrer noopener\"><span class=\"s1\">Follow along with the Colab!<\/span><\/a><\/div>\n<\/div>\n\n\n\n<p><span style=\"font-weight: 400;\">We may notice that a model has a particularly low precision or recall value, but an individual statistic doesn\u2019t give us any insight into which categories of data our model is struggling with the most, or how we might augment our training data for better results. As another example, bounding box coordinates mean little to us when presented as a list of integers or floats. But when these same numbers are overlaid as a patch on an image, we can immediately recognize whether a model has accurately detected an object or not. 
Especially when working with image data, it\u2019s often much quicker and easier to spot patterns in information that is presented to us visually.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-5833\"><img loading=\"lazy\" decoding=\"async\" width=\"1530\" height=\"315\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-4.55.16-PM.png\" alt=\"Screenshot of scalar metrics panel in Comet dashboard\" class=\"wp-image-5833\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-4.55.16-PM.png 1530w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-4.55.16-PM-300x62.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-4.55.16-PM-1024x211.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-4.55.16-PM-768x158.png 768w\" sizes=\"auto, (max-width: 1530px) 100vw, 1530px\" \/><figcaption class=\"wp-element-caption\">Scalar metrics help us benchmark different experiment runs against each other, but they provide limited information to help us debug image classification models. We see here that the <a href=\"https:\/\/www.comet.com\/anmorgan24\/interactive-confusion-matrix\/view\/1oobhPiRdETUwTnWr9AGhOK0b\/panels\">epoch mAR is lower than the epoch mAP<\/a>, but did the model struggle more with confusing penguins as turtles or vice-versa?<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Confusion Matrix<\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">A confusion matrix is a popular way to inspect the performance of a classification model. It combines multiple metrics into a single table to summarize a model\u2019s behavior across different classes. 
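To make the table concrete, here is a minimal stdlib sketch that tallies such a matrix by hand (the labels and predictions below are invented for illustration, not drawn from the tutorial's dataset):

```python
def confusion_matrix(y_true, y_pred, labels):
    # counts[i][j] = number of samples whose actual class is labels[i]
    # and whose predicted class is labels[j]
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        counts[index[actual]][index[predicted]] += 1
    return counts

# Two penguins and two turtles; the model mistakes one penguin for a turtle.
matrix = confusion_matrix(
    ["penguin", "penguin", "turtle", "turtle"],
    ["penguin", "turtle", "turtle", "turtle"],
    ["penguin", "turtle"],
)
# matrix is [[1, 1], [0, 2]]: row = actual class, column = predicted class
```

Each row sums to the number of actual samples of that class, so off-diagonal cells are exactly the "confusions" the plot visualizes.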
Typically, actual categories are plotted against a model\u2019s predicted categories, as shown below:<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-5834\"><img loading=\"lazy\" decoding=\"async\" width=\"663\" height=\"431\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-3.33.57-PM.png\" alt=\"Confusion matrix of a Fast-RCNN model\u2019s predictions\" class=\"wp-image-5834\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-3.33.57-PM.png 663w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-3.33.57-PM-300x195.png 300w\" sizes=\"auto, (max-width: 663px) 100vw, 663px\" \/><figcaption class=\"wp-element-caption\">Confusion matrix of a <a href=\"https:\/\/www.comet.com\/anmorgan24\/interactive-confusion-matrix\/f6e36031bbd94214b6dbb442ddb59d4f?assetId=753838fcf67c407cbf112b8d3415b72c&amp;cellValue=Counts&amp;colorDistribution=Equal&amp;experiment-tab=confusionMatrix&amp;viewId=OrFOvhDzPhD7hW4WycSikjqZa\">Fast-RCNN model\u2019s predictions<\/a>; GIF by author<\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">And while this plot is helpful in illustrating a given model\u2019s \u201cconfusion\u201d between categories, it only tells part of the story. Are there any patterns in the images the model is struggling with? Maybe the model tends to get confused when it sees a particular breed of one of the animals. Or maybe different backgrounds are influencing its decisions. We really can\u2019t be sure without visualizing exactly what the model predicted, and on which images.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">In this article, we\u2019ll explore how to use Comet\u2019s interactive confusion matrix for a multi-class image classification task. 
Follow along with the full code in <\/span><a href=\"https:\/\/colab.research.google.com\/drive\/1Fyrk6Br3EtahbFttIQh1cMgT1ztHKrzs#scrollTo=ccj4F-z_W6ph\"><span style=\"font-weight: 400;\">this Colab tutorial<\/span><\/a><span style=\"font-weight: 400;\">, and make sure to check out <\/span><a href=\"https:\/\/www.comet.com\/anmorgan24\/interactive-confusion-matrix\/view\/new\/panels\"><span style=\"font-weight: 400;\">the public project here<\/span><\/a><span style=\"font-weight: 400;\">!<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Note that to run these experiments, you&#8217;ll need to have your Comet API key configured. If you don\u2019t already have an account, <\/span><a href=\"\/signup?utm_content=confusion_matrix_with_images_blog\"><span style=\"font-weight: 400;\">create one here for free<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Our Data<\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">For this tutorial, we\u2019ll be using a dataset of 572 images of penguins and turtles. The training set contains 500 images, and the validation set contains 72 images, both of which are split evenly between classes. Each image contains exactly one instance of an object, and since being a penguin, being a turtle, and being the background are all mutually exclusive, this is a multi-class, but <\/span><i><span style=\"font-weight: 400;\">not <\/span><\/i><span style=\"font-weight: 400;\">a <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Multi-label_classification\"><span style=\"font-weight: 400;\">multi-label classification<\/span><\/a><span style=\"font-weight: 400;\"> task. 
Download the full dataset on <\/span><a href=\"https:\/\/www.kaggle.com\/datasets\/abbymorgan\/penguins-vs-turtles\"><span style=\"font-weight: 400;\">Kaggle here<\/span><\/a><span style=\"font-weight: 400;\"> and follow along with <\/span><a href=\"https:\/\/colab.research.google.com\/drive\/1Fyrk6Br3EtahbFttIQh1cMgT1ztHKrzs#scrollTo=FpszyODSSdbe\"><span style=\"font-weight: 400;\">the code here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-5835\"><img loading=\"lazy\" decoding=\"async\" width=\"1640\" height=\"924\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Untitled-design-3.png\" alt=\"Example images from the penguin+turtles dataset\" class=\"wp-image-5835\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Untitled-design-3.png 1640w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Untitled-design-3-300x169.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Untitled-design-3-1024x577.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Untitled-design-3-768x433.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Untitled-design-3-1536x865.png 1536w\" sizes=\"auto, (max-width: 1640px) 100vw, 1640px\" \/><figcaption class=\"wp-element-caption\">Example images from <a href=\"https:\/\/www.kaggle.com\/datasets\/abbymorgan\/penguins-vs-turtles\">our dataset<\/a>; graphic by author<\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">Once we\u2019ve downloaded our dataset, we\u2019ll need to define a <\/span><a href=\"https:\/\/pytorch.org\/tutorials\/beginner\/data_loading_tutorial.html\"><span style=\"font-weight: 400;\">custom PyTorch Dataset<\/span><\/a><span style=\"font-weight: 400;\"> class to properly load and preprocess our images before feeding them to our model. 
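The shape of such a Dataset can be sketched without torch installed (the class and file names here are illustrative stand-ins, not the tutorial's exact code, and image decoding is stubbed out):

```python
LABEL_MAP = {"penguin": 1, "turtle": 2}  # 0 is reserved for the background class

class PenguinTurtleDataset:
    """Skeleton of the __len__/__getitem__ protocol that a custom
    torch.utils.data.Dataset implements; image loading is stubbed out."""

    def __init__(self, samples):
        # samples: list of (image_path, category_name) pairs
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, category = self.samples[idx]
        # The real Dataset would decode and transform the image here
        # (e.g. with PIL or torchvision); we return the path instead.
        return path, LABEL_MAP[category]

ds = PenguinTurtleDataset([("img_001.jpg", "penguin"), ("img_002.jpg", "turtle")])
```

A PyTorch `DataLoader` only needs these two methods, so the same skeleton carries over once real image decoding is added.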
We\u2019ll also define a label dictionary to convert our categorical labels into numerical ones. Note that by default, our models treat \u201c0\u201d as the background class. <\/span><\/p>\n\n\n\n<p><script src=\"https:\/\/gist.github.com\/anmorgan24\/e82c15d09dcba6a1ddc62919ac4f0f64.js\"><\/script><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Alternatively, we could also choose to one-hot encode our labels before logging them to Comet, as demonstrated in <\/span><a href=\"https:\/\/colab.research.google.com\/github\/comet-ml\/comet-examples\/blob\/master\/notebooks\/Comet-Confusion-Matrix.ipynb#scrollTo=NpuVdz95ya_f\"><span style=\"font-weight: 400;\">this example notebook<\/span><\/a><span style=\"font-weight: 400;\">.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Finally, we\u2019ll log our hyperparameters to keep track of which ones produce which results:<\/span><\/p>\n\n\n\n<p><script src=\"https:\/\/gist.github.com\/anmorgan24\/e2dbff7053a79dcd3072690efef9422e.js\"><\/script><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Training a Classifier<\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">The best object detection models are trained on tens, if not hundreds, of thousands of labeled images. Our dataset contains a tiny fraction of that, so even if we used image augmentation techniques, we would probably just end up overfitting our model. Thankfully, we can use fine-tuning instead! Fine-tuning allows us to take advantage of the weights and biases learned from one task and repurpose them on a new task, saving us time and resources in the process. 
What\u2019s more, fine-tuning often results in significantly improved performance!<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">We\u2019ll leverage the TorchVision implementation of <\/span><a href=\"https:\/\/pytorch.org\/vision\/main\/models\/faster_rcnn.html\"><span style=\"font-weight: 400;\">FastRCNN<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/pytorch.org\/vision\/main\/models\/mask_rcnn.html\"><span style=\"font-weight: 400;\">MaskRCNN<\/span><\/a><span style=\"font-weight: 400;\"> with <\/span><a href=\"https:\/\/pytorch.org\/vision\/main\/models\/generated\/torchvision.models.resnet50.html\"><span style=\"font-weight: 400;\">ResNet50<\/span><\/a><span style=\"font-weight: 400;\"> backbones.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Logging the Interactive Confusion Matrix<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Basic Usage<\/h3>\n\n\n\n<p><span style=\"font-weight: 400;\">We can log a confusion matrix to Comet in as little as one line of code using <\/span><a href=\"https:\/\/www.comet.com\/docs\/v2\/api-and-sdk\/python-sdk\/reference\/Experiment\/#experimentlog_confusion_matrix\"><span style=\"font-weight: 400;\">experiment.log_confusion_matrix()<\/span><\/a><span style=\"font-weight: 400;\">. Our goal is to visualize how much our model confuses the categories as it trains, that is, across epochs, so we\u2019ll call this method within our training loop. We can then use the final confusion matrix calculated for each experiment run to compare experiment runs across our project. 
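In rough outline, the per-epoch call looks like the sketch below. This is not the tutorial's exact code: the score values are invented, argmax decoding is shown for illustration, and `experiment` is assumed to be an already-created `comet_ml.Experiment` (which needs a configured API key), so the Comet call itself is left commented out.

```python
LABELS = ["background", "penguin", "turtle"]

def predicted_labels(scores):
    # Collapse per-class scores into a predicted class index via argmax.
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

epoch = 0
y_true = [1, 2, 2]  # invented ground-truth class indices
y_pred = predicted_labels([
    [0.1, 0.8, 0.1],  # penguin, correct
    [0.2, 0.5, 0.3],  # turtle misread as penguin
    [0.0, 0.1, 0.9],  # turtle, correct
])

# With a configured experiment, log one matrix per epoch inside the
# training loop (parameter names from Comet's Python SDK):
# experiment.log_confusion_matrix(
#     y_true=y_true, y_predicted=y_pred, labels=LABELS,
#     epoch=epoch, title=f"Confusion Matrix, Epoch {epoch}",
# )
```

Passing `epoch` is what lets the UI show a separate, selectable matrix for each pass through the data.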
Lastly, we\u2019ll compare what we can learn from our interactive confusion matrix with the images we log to the Image Panel.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Defining a Callback<\/h3>\n\n\n\n<p><span style=\"font-weight: 400;\">Alternatively, if we were strictly performing image classification (and not object detection), we could also <\/span><a href=\"https:\/\/colab.research.google.com\/github\/comet-ml\/comet-examples\/blob\/master\/notebooks\/Comet-Confusion-Matrix.ipynb#scrollTo=DnSW8ATtya_l\"><span style=\"font-weight: 400;\">define a callback to log the confusion matrix<\/span><\/a><span style=\"font-weight: 400;\">. This is the preferred method when logging images to a confusion matrix with many categories because it gives you the option to cache images. By uploading each example image only once and reusing it between epochs, we can dramatically cut training time.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">An example of a confusion matrix callback might look something like this:<\/span><\/p>\n\n\n\n<p><script src=\"https:\/\/gist.github.com\/anmorgan24\/221b17e2f07e7c353c2c6cc69e026ce4.js\"><\/script><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">For this simple example, however, we\u2019ll calculate and log a fresh confusion matrix at the end of each epoch. 
This example will create a series of confusion matrices showing how the model gets less confused as training proceeds.<\/span> <span style=\"font-weight: 400;\">Now that we\u2019ve defined the inputs, we can define and log the confusion matrix itself:<\/span><\/p>\n\n\n\n<p><script src=\"https:\/\/gist.github.com\/anmorgan24\/de78592087fe9d86dcba576246338611.js\"><\/script><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Putting it all together<\/h3>\n\n\n\n<p><span style=\"font-weight: 400;\">We\u2019ll need to create three lists:&nbsp;<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\">Ground truth labels (bounding boxes) per epoch<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Predicted labels (bounding boxes) per epoch<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Images overlaid with their respective bounding box predictions per epoch<\/span><\/li>\n<\/ul>\n\n\n\n<p><script src=\"https:\/\/gist.github.com\/anmorgan24\/485472cf6fa76efface0810f2f488386.js\"><\/script><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">In our example, we\u2019re also going to log our images to the graphics tab to <a href=\"https:\/\/www.comet.com\/site\/blog\/compare-object-detection-models-from-torchvision\/\">create an image panel<\/a>&nbsp;in our project view. We\u2019ll also log all of our evaluation metrics to a CSV file and log it as a <\/span><a href=\"https:\/\/www.comet.com\/site\/blog\/credit-card-fraud-detection-with-autoencoders\/\"><span style=\"font-weight: 400;\">Data Panel<\/span><\/a><span style=\"font-weight: 400;\">. 
Altogether, our training loop will look like this:<\/span><\/p>\n\n\n\n<p><script src=\"https:\/\/gist.github.com\/anmorgan24\/4e99a146813b47544dac24f017ae9e77.js\"><\/script><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Using the Confusion Matrix<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">View Multiple Matrices<\/h3>\n\n\n\n<p><span style=\"font-weight: 400;\">Now we can head over to the Comet UI to take a look at our confusion matrices. Select the experiment you\u2019d like to view, then find the \u2018Confusion Matrix\u2019 tab in the left-hand sidebar. We can add multiple matrices to the same view, or switch between confusion matrices by selecting them from the drop-down menu at the top. By hovering over the different cells of the confusion matrix, you\u2019ll see a quick breakdown of the samples from that cell. If we click on a cell, we can also see specific instances where the model misclassified an image. By default, a maximum of 25 example images is uploaded per cell, but this can be reconfigured with the API.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-5842\"><img loading=\"lazy\" decoding=\"async\" width=\"1412\" height=\"638\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/accessing_conf_matrices-1.gif\" alt=\"GIF showing how to add multiple confusion matrices to your Comet Experiment panel view.\" class=\"wp-image-5842\"\/><figcaption class=\"wp-element-caption\">Adding multiple confusion matrices to your panel view; GIF by author<\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">Because we trained our model for three epochs and logged one matrix per epoch, we\u2019ll have three confusion matrices for each experiment run. This will allow us to watch how our models improve over time, while also letting us compare experiment runs across our project. Are there particular images our model tends to struggle with? 
How can we use this information to augment our training data and improve our model\u2019s performance? <\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">View Specific Instances<\/h3>\n\n\n\n<p><span style=\"font-weight: 400;\">In the example below, the model seems to get confused by images of white turtles, so maybe we can add some more examples in a future run. In any event, we can see that our model clearly makes fewer mistakes over time, eventually classifying all of the images correctly.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-5840\"><img loading=\"lazy\" decoding=\"async\" width=\"1423\" height=\"527\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-7.29.50-PM.png\" alt=\"Confusion matrices from epochs 0, 1, and 2 of the training process. Note how the model makes fewer mistakes over time, eventually classifying all images correctly\" class=\"wp-image-5840\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-7.29.50-PM.png 1423w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-7.29.50-PM-300x111.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-7.29.50-PM-1024x379.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-7.29.50-PM-768x284.png 768w\" sizes=\"auto, (max-width: 1423px) 100vw, 1423px\" \/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/www.comet.com\/anmorgan24\/interactive-confusion-matrix\/10b779bfccc14ef8b4ff42f0f8f3be92?assetId=f0163e4cd5444efab6b697779d7b9979,5e5b98481ade403486e1ef0ac4d1398a,57737fd929c843eeb233aec573500b17&amp;cellValue=Counts&amp;colorDistribution=Equal&amp;experiment-tab=confusionMatrix&amp;viewId=OrFOvhDzPhD7hW4WycSikjqZa\">Confusion matrices from epochs 0, 1, and 2<\/a> of the training process. 
Note how the model makes fewer mistakes over time, eventually classifying all images correctly; image by author.<\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">We can also click on individual images to examine them more closely. This can be especially helpful in object detection use cases, where visualizing the bounding box location can help us understand where the model is going wrong.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-5839\"><img loading=\"lazy\" decoding=\"async\" width=\"1225\" height=\"716\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/image-misclassification-instances.gif\" alt=\"Examining specific instances of misclassifications in our interactive confusion matrices can reveal patterns that help us to improve performance\" class=\"wp-image-5839\"\/><figcaption class=\"wp-element-caption\">Examining specific instances of misclassifications can reveal patterns that help us to improve performance; GIF by author.<\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">When examining specific instances of misclassifications, we can see that the model sometimes categorizes large boulders as turtles, and tends to get confused by one particularly distinctive breed of penguin. We could choose to augment our training data with images containing similar examples to improve performance.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Aggregating Values<\/h3>\n\n\n\n<p><span style=\"font-weight: 400;\">We can also choose among three different methods of aggregating the cells in our confusion matrices: by count, percent by row, and percent by column. We can further choose either equal or smart color distribution. Equal color distribution divides the range into equal buckets, each with its own color. Smart color distribution ensures that colors are more evenly distributed between cells as the range gets bigger. 
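To make the aggregation concrete, "percent by row" re-expresses each row of counts as percentages of that row's total (a stdlib sketch with invented counts; "percent by column" is the same idea applied down each column):

```python
def percent_by_row(matrix):
    # Convert each row of counts to percentages of that row's total,
    # mirroring the "percent by row" aggregation in the UI.
    rows = []
    for row in matrix:
        total = sum(row)
        rows.append([100.0 * v / total if total else 0.0 for v in row])
    return rows

counts = [[30, 6], [4, 32]]  # invented actual-vs-predicted counts
percents = percent_by_row(counts)
# e.g. 30 of 36 actual class-0 samples were predicted as class 0 (~83.3%)
```

Row percentages answer "of the actual penguins, what fraction did the model call turtles?", while column percentages answer the reverse question about predictions.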
This second setting can be especially helpful for sparse matrices or matrices with large ranges.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1108\" height=\"361\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/augmenting-conf-mat-values2.gif\" alt=\"Confusion matrices from epochs 0, 1, and 2 of the training process.\" class=\"wp-image-5838\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Comparing Experiment Runs<\/h3>\n\n\n\n<p><span style=\"font-weight: 400;\">The confusion matrix feature also helps us to compare experiment runs across our project. In the example image below, we show the confusion matrices from three different experiments over three epochs. Each series of confusion matrices gives us a very different picture of how each model is behaving.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-5841\"><img loading=\"lazy\" decoding=\"async\" width=\"846\" height=\"720\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-8.44.23-PM.png\" alt=\"Confusion matrices of three different models over the course of three epochs of training. Each series tells a very different story\" class=\"wp-image-5841\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-8.44.23-PM.png 846w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-8.44.23-PM-300x255.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-01-at-8.44.23-PM-768x654.png 768w\" sizes=\"auto, (max-width: 846px) 100vw, 846px\" \/><figcaption class=\"wp-element-caption\">Confusion matrices of three different models over the course of three epochs of training. 
Each series tells a very different story; image by author.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">Thanks for making it all the way to the end, and we hope you found this tutorial useful! Just to recap everything we covered, we:<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\">Loaded a multi-class image classification dataset;<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Fine-tuned a pre-trained TorchVision model with our dataset;<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Logged confusion matrices with image examples per epoch, per model;<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Arranged our confusion matrix view and aggregated the values;<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Compared our confusion matrices over time and across multiple experiment runs;<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Used the interactive confusion matrix to debug our image classification model and examine individual instances.<\/span><\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Try out the code in this tutorial here with your own dataset or model! 
You can view <\/span><a href=\"https:\/\/www.comet.com\/anmorgan24\/interactive-confusion-matrix\/view\/1oobhPiRdETUwTnWr9AGhOK0b\/panels\"><span style=\"font-weight: 400;\">the public project here<\/span><\/a><span style=\"font-weight: 400;\"> or, to get started with your own project, <\/span><a href=\"\/signup?utm_content=confusion_matrix_with_images_blog\"><span style=\"font-weight: 400;\">create an account here for free<\/span><\/a><span style=\"font-weight: 400;\">!<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Additional Resources<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.comet.com\/site\/blog\/compare-object-detection-models-from-torchvision\/\">Compare and Evaluate Object Detection Models From Torchvision<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/\">Debugging Classifiers With Confusion Matrices<\/a>&nbsp;(for imbalanced datasets)<\/li>\n\n\n\n<li><a href=\"https:\/\/comet.com\/docs\/v2\/api-and-sdk\/python-sdk\/reference\/Experiment\/#experimentlog_confusion_matrix\">Comet&#8217;s Confusion Matrix Documentation<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.comet.com\/site\/blog\/introducing-comets-new-image-panel\/\">Introducing Comet&#8217;s New Image Panel<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Introduction We often rely on scalar metrics and static plots to describe and evaluate machine learning models, but these methods rarely capture the full story. 
Especially when dealing with computer vision tasks like classification, detection, segmentation, and generation, visualizing your outputs is essential to understanding how your model is behaving and why.&nbsp; We may notice [&hellip;]<\/p>\n","protected":false},"author":22,"featured_media":6987,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[8,6,9,7],"tags":[40,29,30,35,41,37,38,39],"coauthors":[133],"class_list":["post-5826","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-comet-community-hub","category-machine-learning","category-product","category-tutorials","tag-comet","tag-computer-vision","tag-deep-learning","tag-image-classification","tag-interactive-confusion-matrix","tag-object-detection","tag-pytorch","tag-torchvision"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Debugging Image Classifiers With Confusion Matrices<\/title>\n<meta name=\"description\" content=\"Debug image classification models with Comet&#039;s Interactive Confusion Matrix, which allows you to view specific instances of misclassification.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/debugging-image-classifiers-with-confusion-matrices\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Debugging Image Classifiers With Confusion Matrices\" \/>\n<meta property=\"og:description\" content=\"Debug image classification models with Comet&#039;s Interactive Confusion Matrix, which allows you to view 
specific instances of misclassification.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/debugging-image-classifiers-with-confusion-matrices\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-05-13T16:14:34+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-29T14:05:22+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/05\/Screen-Shot-2023-07-28-at-12.01.07-PM.png\" \/>\n\t<meta property=\"og:image:width\" content=\"300\" \/>\n\t<meta property=\"og:image:height\" content=\"304\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Abby Morgan\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@anmorgan2414\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Abby Morgan\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->"}}