{"id":2198,"date":"2020-09-01T14:01:10","date_gmt":"2020-09-01T22:01:10","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/blog\/debugging-classifiers-with-confusion-matrices\/"},"modified":"2025-04-24T17:30:40","modified_gmt":"2025-04-24T17:30:40","slug":"debugging-classifiers-with-confusion-matrices","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/","title":{"rendered":"Debugging Classifiers with Confusion Matrices"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Introduction<\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">A confusion matrix is a visual way to inspect the performance of a classification model. Metrics such as accuracy can be inadequate in cases where there are large class imbalances in the data, a problem common in machine learning applications for fraud detection. A confusion matrix can provide us with a more representative view of our classifier\u2019s performance, including which specific instances it is having trouble classifying.&nbsp;<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">In this post we are going to illustrate two ways in which Comet\u2019s confusion matrix can help debug classification models.&nbsp;<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">For our first example, we will run an experiment similar to the one illustrated in this post on <\/span><a href=\"https:\/\/www.tensorflow.org\/tutorials\/structured_data\/imbalanced_data\"><span style=\"font-weight: 400;\">imbalanced data<\/span><\/a><span style=\"font-weight: 400;\">. 
We\u2019re going to train a classifier to detect fraudulent transactions in an imbalanced dataset and use Comet\u2019s confusion matrix to evaluate our model\u2019s performance.&nbsp;<\/span><span style=\"font-weight: 400;\">In our second example we will cover classification on unstructured data with a large number of labels using the CIFAR100 dataset and a simple CNN model.&nbsp;<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">&nbsp;<\/h2>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Confusion Matrices with Imbalanced Data<\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">In this example, we\u2019re going to use the <\/span><a href=\"https:\/\/www.kaggle.com\/mlg-ulb\/creditcardfraud\"><span style=\"font-weight: 400;\">Credit Card Fraud Detection<\/span><\/a><span style=\"font-weight: 400;\"> dataset from Kaggle to evaluate our classifier. This dataset is highly imbalanced, with only 492 fraudulent transactions present in a dataset with <\/span><span style=\"font-weight: 400;\">284,807 transactions in total. 
<\/span><\/p>\n\n\n\n<div>\n<table>\n<tbody>\n<tr>\n<td>\n<pre># Comet recommends importing comet_ml before tensorflow\/keras so auto-logging can hook in\nfrom comet_ml import Experiment\n\nimport numpy as np\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import StandardScaler\n\n\ndef load_data():\n    # Download the credit card fraud dataset as a DataFrame\n    raw_df = pd.read_csv(\n        \"https:\/\/storage.googleapis.com\/download.tensorflow.org\/data\/creditcard.csv\"\n    )\n\n    return raw_df\n\n\ndef preprocess(raw_df):\n    df = raw_df.copy()\n\n    # Small offset so log() is defined for zero-valued amounts\n    eps = 0.01\n\n    # Drop the raw timestamp and log-scale the transaction amount\n    df.pop(\"Time\")\n    df[\"Log Amount\"] = np.log(df.pop(\"Amount\") + eps)\n\n    # Hold out 20% of the rows for validation\n    train_df, val_df = train_test_split(df, test_size=0.2)\n\n    train_labels = np.array(train_df.pop(\"Class\"))\n    val_labels = np.array(val_df.pop(\"Class\"))\n\n    train_features = np.array(train_df)\n    val_features = np.array(val_df)\n\n    # Fit the scaler on the training split only, then reuse it for validation\n    scaler = StandardScaler()\n    train_features = scaler.fit_transform(train_features)\n    val_features = scaler.transform(val_features)\n\n    # Clip extreme values after scaling\n    train_features = np.clip(train_features, -5, 5)\n    val_features = np.clip(val_features, -5, 5)\n\n    return train_features, val_features, train_labels, val_labels\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Training a classifier<\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Our model has a single fully connected hidden layer with Dropout enabled. 
We\u2019ll train it with the Adam optimizer for 5 epochs with a batch size of 64, holding out 20% of our dataset for validation.<\/span><\/p>\n\n\n\n<div>\n<table>\n<tbody>\n<tr>\n<td>\n<pre>import tensorflow as tf\nfrom tensorflow import keras\n\n\ndef build_model(input_shape, output_bias=None):\n    # Optionally start the output layer from a fixed bias\n    if output_bias is not None:\n        output_bias = tf.keras.initializers.Constant(output_bias)\n    model = keras.Sequential(\n        [\n            keras.layers.Dense(16, activation=\"relu\", input_shape=(input_shape,)),\n            keras.layers.Dropout(0.5),\n            keras.layers.Dense(1, activation=\"sigmoid\", bias_initializer=output_bias),\n        ]\n    )\n\n    model.compile(\n        # `learning_rate` replaces the deprecated `lr` argument\n        optimizer=keras.optimizers.Adam(learning_rate=1e-3),\n        loss=keras.losses.BinaryCrossentropy(),\n        metrics=[\"accuracy\"],\n    )\n\n    return model\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Since we\u2019re using Keras as our modelling framework, Comet will automatically log our hyperparameters and training metrics (accuracy and loss) to the web UI. 
At the end of every epoch we\u2019re going to log a confusion matrix of the model\u2019s predictions on our validation dataset using a custom Keras callback and Comet\u2019s `log_confusion_matrix` function.&nbsp;<\/span><\/p>\n\n\n\n<div>\n<table>\n<tbody>\n<tr>\n<td>\n<pre>from tensorflow.keras.callbacks import Callback\n\n\nclass ConfusionMatrixCallback(Callback):\n    def __init__(self, experiment, inputs, targets, cutoff=0.5):\n        super().__init__()\n        self.experiment = experiment\n        self.inputs = inputs\n        self.cutoff = cutoff\n        self.targets = targets\n        # One-hot encode the labels; num_classes=2 keeps both columns even if a class is absent\n        self.targets_reshaped = keras.utils.to_categorical(self.targets, num_classes=2)\n\n    def on_epoch_end(self, epoch, logs=None):\n        # Threshold the sigmoid outputs into hard 0\/1 predictions\n        predicted = self.model.predict(self.inputs)\n        predicted = np.where(predicted &lt; self.cutoff, 0, 1)\n\n        predicted_reshaped = keras.utils.to_categorical(predicted, num_classes=2)\n        self.experiment.log_confusion_matrix(\n            self.targets_reshaped,\n            predicted_reshaped,\n            title=\"Confusion Matrix, Epoch #%d\" % (epoch + 1),\n            file_name=\"confusion-matrix-%03d.json\" % (epoch + 1),\n        )\n\n\ndef main():\n    experiment = Experiment(workspace=WORKSPACE, project_name=PROJECT_NAME)\n\n    df = load_data()\n    X_train, X_val, y_train, y_val = preprocess(df)\n\n    confmat = ConfusionMatrixCallback(experiment, X_val, y_val)\n    model = build_model(input_shape=X_train.shape[1])\n    model.fit(\n        X_train,\n        y_train,\n        validation_data=(X_val, y_val),\n        epochs=5,\n        batch_size=64,\n        callbacks=[confmat],\n    )\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Results<\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">We see that our model is able to achieve a very high validation accuracy after a single epoch of training. This is misleading, since only 0.17% of the dataset has a positive label. 
We can achieve over 99% accuracy on this dataset by simply predicting a 0 label for any given transaction.&nbsp;<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-2753\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/accuracyval_accuracy.jpg\" alt=\"\" class=\"wp-image-2753\"\/><figcaption class=\"wp-element-caption\">Figure 1. Logged Classification Metrics (<a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/e7402f4bb4fe41a08b71159ec68e5212?experiment-tab=chart&amp;showOutliers=true&amp;smoothing=0&amp;view=val-acc&amp;xAxis=step\">link to experiment<\/a>)<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">&nbsp;<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Let\u2019s take a look at our Confusion Matrix to see what our real performance is like.&nbsp;<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-2754\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/confusion-matix-step-1.png\" alt=\"\" class=\"wp-image-2754\"\/><figcaption class=\"wp-element-caption\">Figure 2. 
Comet Confusion Matrix (<a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/e7402f4bb4fe41a08b71159ec68e5212?experiment-tab=confusionMatrix\">link to experiment<\/a>)<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">For our binary classification task, we see that after a single epoch of training our model produces 5 false positive and 29 false negative predictions.&nbsp; If our classifier were perfect, both values would be 0.&nbsp; <\/span><span style=\"font-weight: 400;\">By clicking on the cell with the false negatives, we can see the indices in our validation dataset that were incorrectly classified as not fraudulent.&nbsp;<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/e7402f4bb4fe41a08b71159ec68e5212?experiment-tab=confusionMatrix\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/confusion-matrix-step-2-1024x561-1.png\" alt=\"\" class=\"wp-image-2756\"\/><\/a><figcaption class=\"wp-element-caption\">Figure 3. Misclassified Examples (<a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/e7402f4bb4fe41a08b71159ec68e5212?experiment-tab=chart&amp;showOutliers=true&amp;smoothing=0&amp;view=val-acc&amp;xAxis=step\">link to experiment<\/a>)<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">&nbsp;<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">We can also see if our predictions improved over time by changing the epoch number in the dropdown selector.&nbsp;&nbsp;<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-2757\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-20-at-1.08.24-PM-1024x562-1.png\" alt=\"\" class=\"wp-image-2757\"\/><figcaption class=\"wp-element-caption\">Figure 4. 
Switching to most recently logged Confusion Matrix (<a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/e7402f4bb4fe41a08b71159ec68e5212?experiment-tab=chart&amp;showOutliers=true&amp;smoothing=0&amp;view=val-acc&amp;xAxis=step\">link to experiment<\/a>)<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">&nbsp;<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Lastly, in order to get an estimate of the per class performance, we can simply hover over the cell corresponding to that class. In Figure 5, we see that our classifier&#8217;s accuracy when it comes to detecting a true fraudulent transaction is closer to 83% rather than the reported validation accuracy of 99%.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-2759\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-20-at-1.19.51-PM-1024x562-1.png\" alt=\"\" class=\"wp-image-2759\"\/><figcaption class=\"wp-element-caption\">Figure 5. Visualizing Per Class Accuracy (<a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/e7402f4bb4fe41a08b71159ec68e5212?experiment-tab=chart&amp;showOutliers=true&amp;smoothing=0&amp;view=val-acc&amp;xAxis=step\">link to experiment<\/a>)<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">&nbsp;<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Comet Confusion Matrix with Unstructured Data<\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Comet makes it easy to deal with classification problems that depend on unstructured data. 
We\u2019re going to use the CIFAR100 dataset and a simple CNN to illustrate how the ConfusionMatrix callback is used for these types of data.&nbsp;<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">First, let\u2019s fetch our dataset and preprocess it using the built-in convenience methods in Keras.<\/span><\/p>\n\n\n\n<div>\n<table>\n<tbody>\n<tr>\n<td>\n<pre>from tensorflow import keras\nfrom tensorflow.keras.datasets import cifar100\n\n# Load CIFAR-100 data\n(input_train, target_train), (input_test, target_test) = cifar100.load_data()\n\n# Parse numbers as floats\ninput_train = input_train.astype(\"float32\")\ninput_test = input_test.astype(\"float32\")\n\n# Normalize data\ninput_train = input_train \/ 255\ninput_test = input_test \/ 255\n\n# One-hot encode the labels\ntarget_train = keras.utils.to_categorical(target_train, num_classes=100)\ntarget_test = keras.utils.to_categorical(target_test, num_classes=100)\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Next, we\u2019ll define our CNN architecture for this task as well as our training parameters, such as batch size and number of epochs.&nbsp;<\/span><\/p>\n\n\n\n<div>\n<table>\n<tbody>\n<tr>\n<td>\n<pre>from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D\nfrom tensorflow.keras.losses import categorical_crossentropy\nfrom tensorflow.keras.models import Sequential\nfrom tensorflow.keras.optimizers import Adam\n\n# Model configuration\nbatch_size = 128\nimg_width, img_height, img_num_channels = 32, 32, 3\ninput_shape = (img_width, img_height, img_num_channels)\nloss_function = categorical_crossentropy\nno_classes = 100\nno_epochs = 100\noptimizer = Adam()\nverbosity = 1\nvalidation_split = 0.2\ninterval = 10\n\n# Build the Model\nmodel = Sequential()\nmodel.add(Conv2D(128, kernel_size=(3, 3), activation=\"relu\", input_shape=input_shape))\nmodel.add(MaxPooling2D(pool_size=(2, 2)))\nmodel.add(Conv2D(128, kernel_size=(3, 3), activation=\"relu\"))\nmodel.add(MaxPooling2D(pool_size=(2, 2)))\nmodel.add(Conv2D(64, kernel_size=(3, 3), activation=\"relu\"))\nmodel.add(MaxPooling2D(pool_size=(2, 2)))\nmodel.add(Flatten())\nmodel.add(Dense(256, activation=\"relu\"))\nmodel.add(Dense(128, activation=\"relu\"))\nmodel.add(Dense(no_classes, 
activation=\"softmax\"))\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Finally, we\u2019ll update our Confusion Matrix callback from the previous example so that we use a single Confusion Matrix object for the entire training process. This updated callback will only create a confusion matrix every <strong>Nth<\/strong> epoch, where <strong>N<\/strong> is controlled by the <\/span><b>interval <\/b><span style=\"font-weight: 400;\">parameter.&nbsp;<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">&nbsp;<\/h2>\n\n\n\n<div>\n<table>\n<tbody>\n<tr>\n<td>\n<pre>class ConfusionMatrixCallback(keras.callbacks.Callback):\n    def __init__(self, experiment, inputs, targets, interval):\n        super().__init__()\n        self.experiment = experiment\n        self.inputs = inputs\n        self.targets = targets\n        self.interval = interval\n\n        # A single ConfusionMatrix object is reused for the whole run\n        self.confusion_matrix = ConfusionMatrix(\n            index_to_example_function=self.index_to_example,\n            max_examples_per_cell=5,\n            labels=LABELS,\n        )\n\n    def index_to_example(self, index):\n        # Upload the image for this example and link it to the matrix cell\n        image_array = self.inputs[index]\n        image_name = \"confusion-matrix-%05d.png\" % index\n        results = self.experiment.log_image(image_array, name=image_name)\n        # Return sample, assetId (index is added automatically)\n        return {\"sample\": image_name, \"assetId\": results[\"imageId\"]}\n\n    def on_epoch_end(self, epoch, logs=None):\n        # Only log every `interval` epochs\n        if (epoch + 1) % self.interval != 0:\n            return\n        predicted = self.model.predict(self.inputs)\n        self.confusion_matrix.compute_matrix(self.targets, predicted)\n        self.experiment.log_confusion_matrix(\n            matrix=self.confusion_matrix,\n            title=\"Confusion Matrix, Epoch #%d\" % (epoch + 1),\n            file_name=\"confusion-matrix-%03d.json\" % (epoch + 1),\n        )\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n\n\n\n<h2 
class=\"wp-block-heading\">&nbsp;<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Using a single instance of the Confusion Matrix keeps the number of images logged to Comet to a minimum, since it reuses them wherever possible across all epochs of training. <\/span><span style=\"font-weight: 400;\">These uploaded examples will be available inside our Confusion Matrix in the UI, and allow us to easily view the specific instances that our model is having difficulty classifying.&nbsp;&nbsp;&nbsp;<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/a88674d019c744b59ec4fa2fc8b525e3?experiment-tab=confusionMatrix\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.40.35-PM-1024x562-1.png\" alt=\"\" class=\"wp-image-2762\"\/><\/a><figcaption class=\"wp-element-caption\">Figure 6. Confusion Matrix for CIFAR100 classes (<a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/a88674d019c744b59ec4fa2fc8b525e3?experiment-tab=confusionMatrix\">link to experiment<\/a>)<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">In Figure 6 we see that the Comet confusion matrix has trimmed down the total number of classes to the ones that the model is most confused about, i.e., the labels with the most misclassifications. 
By clicking on a cell, we can see examples of instances that have been misclassified.&nbsp;<\/span><span style=\"font-weight: 400;\">We can also change which values are displayed in each cell of the matrix, for example to show the percent of correct predictions by row or column.&nbsp;<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">&nbsp;<\/h2>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-2763\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.22.41-PM-1024x562-1.png\" alt=\"\" class=\"wp-image-2763\"\/><figcaption class=\"wp-element-caption\">Figure 8. Change the values displayed in the cells (<a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/a88674d019c744b59ec4fa2fc8b525e3?experiment-tab=confusionMatrix\">link to experiment<\/a>)<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-2764\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.41.46-PM-1024x545-1.png\" alt=\"\" class=\"wp-image-2764\"\/><figcaption class=\"wp-element-caption\">Figure 9. Cell values showing the percent of correct predictions by row (<a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/a88674d019c744b59ec4fa2fc8b525e3?experiment-tab=confusionMatrix\">link to experiment<\/a>)<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-2765\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.22.58-PM-1024x561-1.png\" alt=\"\" class=\"wp-image-2765\"\/><figcaption class=\"wp-element-caption\">Figure 10. 
Cell values showing the percent of correct predictions by column (<a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/a88674d019c744b59ec4fa2fc8b525e3?experiment-tab=confusionMatrix\">link to experiment<\/a>)<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-2766\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.42.57-PM-1024x538-1.png\" alt=\"\" class=\"wp-image-2766\"\/><figcaption class=\"wp-element-caption\">Figure 11. CIFAR100 Misclassified examples (<a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/a88674d019c744b59ec4fa2fc8b525e3?experiment-tab=confusionMatrix\">link to experiment<\/a>)<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">&nbsp;<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">In this example, Comet\u2019s Confusion Matrix will upload the image examples as assets to the experiment. Alternatively, if your images are hosted somewhere else, or you are using assets such as audio, Comet\u2019s confusion matrix can map the index of the classified example to the url of the corresponding asset.&nbsp;<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">The Confusion Matrix API also allows the user to control the number of assets uploaded to Comet. 
We can either specify the maximum number of examples to be uploaded per cell, or pass a list of the classes that we are interested in comparing to the <\/span><b>selected<\/b><span style=\"font-weight: 400;\"> argument of the Confusion Matrix constructor.&nbsp;&nbsp;&nbsp;<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">&nbsp;<\/h2>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Conclusion<\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">In this post, we\u2019ve gone through an example of a classification task where our target data is highly imbalanced. We\u2019ve shown how a metric like accuracy can fail to capture the true performance of our model and why visual tools like the confusion matrix can help us get a more granular understanding of our model performance across different classes. <\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">You can also explore the experiment on imbalanced data in more detail: <\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\"><strong>Comet Experiment for Imbalanced Data:<\/strong> Get access to the code used to generate these results <\/span><a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/e7402f4bb4fe41a08b71159ec68e5212?experiment-tab=chart&amp;showOutliers=true&amp;smoothing=0&amp;view=val-acc&amp;xAxis=step\"><span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">. <\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\"><strong>Colab Notebook for Imbalanced Data:<\/strong>&nbsp; <span style=\"font-weight: 400;\">If you would like to run the code yourself, you can test it out in a Colab Notebook <a href=\"https:\/\/colab.research.google.com\/drive\/1QGdE2u5ZWhY_QkcxI2PSNHzU8pSXAIoh?usp=sharing\">here<\/a>. 
Keep in mind that you will need a Comet account and a Comet API Key to run the&nbsp; notebook.&nbsp;<\/span><\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">We also demonstrated how Comet\u2019s confusion matrix can be configured to work with unstructured data, and how it can provide examples of misclassified labels for easy model debugging. <\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\"><strong>Comet Experiment for CIFAR100: <\/strong>All the code necessary to reproduce these results can be found <\/span><a href=\"https:\/\/www.comet.com\/cometpublic\/confusion-matrix\/a88674d019c744b59ec4fa2fc8b525e3?experiment-tab=chart&amp;showOutliers=true&amp;smoothing=0&amp;view=val-acc&amp;xAxis=step\">here<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\"><strong>Colab Notebook for CIFAR100:<\/strong> <a href=\"https:\/\/colab.research.google.com\/drive\/1N5025mWmiJGWImx1KD0RtSyoGaahlIZT?usp=sharing\">here<\/a>. We would recommend enabling the GPU on the notebook before running it. You can do this by navigating to Edit\u2192Notebook Settings, and selecting GPU from the Hardware Accelerator drop-down.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>*Note:*<\/strong> Comet&#8217;s Confusion Matrix also supports R. 
Check out the example <a href=\"https:\/\/nbviewer.jupyter.org\/github\/comet-ml\/comet-examples\/blob\/master\/notebooks\/Comet-R-nnet.ipynb\">here<\/a><span style=\"font-weight: 400;\">&nbsp;<\/span><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><em>Want to stay in the loop?&nbsp;<a href=\"https:\/\/info.comet.ml\/newsletter-signup\/?utm_campaign=tensorboard-integration&amp;utm_source=blog&amp;utm_medium=CTA\">Subscribe to the Comet Newsletter<\/a>&nbsp;for weekly insights and perspective on the latest ML news, projects, and more.<\/em><\/h2>\n","protected":false},"excerpt":{"rendered":"<p>A confusion matrix can provide us with a more representative view of our classifier\u2019s performance, including which specific instances it is having trouble classifying.\u00a0<\/p>\n","protected":false},"author":1,"featured_media":2209,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[8,7],"tags":[],"coauthors":[128],"class_list":["post-2198","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-comet-community-hub","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Debugging Classifiers with Confusion Matrices - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" 
content=\"Debugging Classifiers with Confusion Matrices\" \/>\n<meta property=\"og:description\" content=\"A confusion matrix can provide us with a more representative view of our classifier\u2019s performance, including which specific instances it is having trouble classifying.\u00a0\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2020-09-01T22:01:10+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:30:40+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.42.57-PM-1024x538-2.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"538\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Dhruv Nair\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Dhruv Nair\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Debugging Classifiers with Confusion Matrices - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/","og_locale":"en_US","og_type":"article","og_title":"Debugging Classifiers with Confusion Matrices","og_description":"A confusion matrix can provide us with a more representative view of our classifier\u2019s performance, including which specific instances it is having trouble classifying.\u00a0","og_url":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2020-09-01T22:01:10+00:00","article_modified_time":"2025-04-24T17:30:40+00:00","og_image":[{"width":1024,"height":538,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.42.57-PM-1024x538-2.png","type":"image\/png"}],"author":"Dhruv Nair","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Dhruv Nair","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/"},"author":{"name":"engineering@atre.net","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b"},"headline":"Debugging Classifiers with Confusion Matrices","datePublished":"2020-09-01T22:01:10+00:00","dateModified":"2025-04-24T17:30:40+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/"},"wordCount":1303,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.42.57-PM-1024x538-2.png","articleSection":["Comet Community Hub","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/","url":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/","name":"Debugging Classifiers with Confusion Matrices - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.42.57-PM-1024x538-2.png","datePublished":"2020-09-01T22:01:10+00:00","dateModified":"2025-04-24T17:30:40+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.42.57-PM-1024x538-2.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.42.57-PM-1024x538-2.png","width":1024,"height":538,"caption":"CIFAR100 Misclassified examples"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/debugging-classifiers-with-confusion-matrices\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Debugging Classifiers with Confusion Matrices"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b","name":"engineering@atre.net","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/027c18177377edf459980f0cfb83706c","url":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","caption":"engineering@atre.net"},"sameAs":["https:\/\/live-cometml.pantheonsite.io"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/engineeringatre-net\/"}]}},"jetpack_featured_media_url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/Screen-Shot-2020-07-22-at-12.42.57-PM-1024x538-2.png","jetpack_sharing_enabled":true,"_links":{"self":
[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/2198","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=2198"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/2198\/revisions"}],"predecessor-version":[{"id":15694,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/2198\/revisions\/15694"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/2209"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=2198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=2198"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=2198"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=2198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}