{"id":8283,"date":"2023-11-30T07:06:18","date_gmt":"2023-11-30T15:06:18","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=8283"},"modified":"2025-11-17T20:45:43","modified_gmt":"2025-11-17T20:45:43","slug":"an-intuitive-guide-to-convolutional-neural-networks","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/","title":{"rendered":"An Intuitive Guide to Convolutional Neural\u00a0Networks"},"content":{"rendered":"\n<section class=\"section section--body\">\n<div class=\"section-divider\"><\/div>\n<div class=\"section-content\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*xNcoatSR3XQ4QSsk\" alt=\"An Intuitive Guide to Convolutional Neural&nbsp;Networks with a focus of ResNet and DenseNet and Comet\" width=\"1600\" height=\"1067\" data-image-id=\"0*xNcoatSR3XQ4QSsk\" data-width=\"6000\" data-height=\"4000\" data-unsplash-photo-id=\"QRawWgV6gmo\" data-is-featured=\"true\"><\/figure><div class=\"section-inner sectionLayout--insetColumn\">\n<h2 class=\"graf graf--h4\">With a Focus on ResNet and DenseNet<\/h2>\n<figure class=\"graf graf--figure\"><figcaption class=\"imageCaption\">Photo by <a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/unsplash.com\/@ionfet?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"photo-creator noopener\" data-href=\"https:\/\/unsplash.com\/@ionfet?utm_source=medium&amp;utm_medium=referral\">Ion Fet<\/a> on&nbsp;<a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"photo-source noopener\" data-href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<p class=\"graf graf--p\">This comprehensive guide aims to demystify CNNs, providing insights into their structure, functionality, 
and why they are so effective for image-related tasks.<\/p>\n<p class=\"graf graf--p\">We delve into the intricacies of Residual Networks (ResNet), a groundbreaking architecture in CNNs. Understanding why ResNet is essential, its innovative aspects, and what it enables in deep learning forms a crucial part of our exploration. We\u2019ll also discuss the optimal scenarios for deploying ResNet and examine its common architectural variations.<\/p>\n<p class=\"graf graf--p\">Transitioning from ResNet, we introduce DenseNet\u200a\u2014\u200aanother influential CNN architecture. A comparative analysis of DenseNet and ResNet highlights their unique features, benefits, and limitations. This comparison is academic and practical, offering deep learning practitioners insights into choosing the right architecture based on specific project needs.<\/p>\n<p class=\"graf graf--p\">This blog aims to equip you with a thorough understanding of these powerful neural network architectures. Whether you\u2019re a seasoned AI researcher or a budding enthusiast in machine learning, the insights offered here will deepen your understanding and guide you in leveraging the full potential of CNNs in various applications.<\/p>\n<h3 class=\"graf graf--h3\">What you\u2019ll&nbsp;learn<\/h3>\n<ul class=\"postList\">\n<li class=\"graf graf--li\">An introduction to convolutional neural networks (CNNs)<\/li>\n<li class=\"graf graf--li\">Why we need ResNet<\/li>\n<li class=\"graf graf--li\">The novelty of ResNet<\/li>\n<li class=\"graf graf--li\">What ResNet allows you to do<\/li>\n<li class=\"graf graf--li\">Where ResNet works best<\/li>\n<li class=\"graf graf--li\">Common ResNet architectures<\/li>\n<li class=\"graf graf--li\">Introduction to DenseNet<\/li>\n<li class=\"graf graf--li\">DenseNet vs ResNet<\/li>\n<li class=\"graf graf--li\">Advantages and disadvantages of these differences<\/li>\n<\/ul>\n<h3 class=\"graf graf--h3\"><strong class=\"markup--strong markup--h3-strong\">An 
<\/strong>Introduction to Convolutional Neural Networks&nbsp;(CNNs)<\/h3>\n<p class=\"graf graf--p\">A Convolutional Neural Network model architecture works exceptionally well with image data.<\/p>\n<p class=\"graf graf--p\">In a typical neural network, you flatten your input into one vector, take those input values in at once, multiply them by the weights in the first layer, add the bias, and pass the result into a neuron. You then repeat that loop for each layer in your network. But because you\u2019re passing individual pixel values through the network, what the network learns becomes tied to exact pixel positions.<\/p>\n<figure class=\"graf graf--figure\">\n<\/figure><\/div><\/div><\/section>\n\n\n\n<figure class=\"wp-block-image aligncenter graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*jybnqkpTZve16nmy\" alt=\"An Intuitive Guide to Convolutional Neural\u00a0Networks with a focus of ResNet and DenseNet and Cometandrew jones data science infinity\"\/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/www.linkedin.com\/in\/andrew-jones-dsi\/\">Andrew Jones<\/a> of <a href=\"https:\/\/www.data-science-infinity.com\/\">Data Science Infinity<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">Imagine that you train a network to recognize pictures of a statue. If you trained this network on pictures where statues are near the left of the images and later try to generalize the network to pictures where statues are in the middle or to the right of an image, chances are it won\u2019t recognize that there\u2019s a statue there. 
And that\u2019s because a vanilla neural network is not translationally invariant.<\/p>\n\n\n\n<p class=\"graf graf--p\">Things are different for CNNs.<\/p>\n\n\n\n<section class=\"section section--body\">\n<div class=\"section-divider\">\n<hr class=\"section-divider\">\n<\/div>\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<h3 class=\"graf graf--h3\"><strong class=\"markup--strong markup--h3-strong\">Translational Invariance<\/strong><\/h3>\n<p class=\"graf graf--p\">What makes CNNs so powerful?<\/p>\n<p class=\"graf graf--p\">Their power lies in the ability of the network to achieve the property of translational invariance. 
Translational invariance is important because you\u2019re more interested in the presence of a feature rather than where it\u2019s located.<\/p>\n<p class=\"graf graf--p\">Once a CNN is trained to detect things in an image, changing the position of that thing in an image won\u2019t prevent the CNN\u2019s ability to detect it.<\/p>\n<figure class=\"graf graf--figure\">\n<\/figure><\/div><\/div><\/section>\n\n\n\n<figure class=\"wp-block-image aligncenter graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*LE1QRxBkvZ2qVsZ81yP3YA.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/stats.stackexchange.com\/questions\/208936\/what-is-translation-invariance-in-computer-vision-and-convolutional-neural-netwo\">Source<\/a><\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\" id=\"h-anatomy-of-a-nbsp-cnn\"><strong class=\"markup--strong markup--h3-strong\">Anatomy of a&nbsp;CNN<\/strong><\/h3>\n\n\n\n<p class=\"graf graf--p\">Let\u2019s outline the architectural anatomy of a convolutional neural network:<\/p>\n\n\n\n<ul class=\"wp-block-list postList\">\n<li>Convolutional layers<\/li>\n\n\n\n<li>Activation layers<\/li>\n\n\n\n<li>Pooling layers<\/li>\n\n\n\n<li>Dense layers<\/li>\n<\/ul>\n\n\n\n<figure class=\"graf graf--figure\">\n<\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*_tcxWCzTdLZoqxfq\" alt=\"Andrew Jones Data Science Infinity\"\/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/www.linkedin.com\/in\/andrew-jones-dsi\/\">Andrew Jones<\/a> of <a href=\"https:\/\/www.data-science-infinity.com\/\">Data Science&nbsp;Infinity<\/a><\/figcaption><\/figure>\n\n\n\n<figcaption class=\"imageCaption\"><\/figcaption>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-convolutional-layer\"><strong class=\"markup--strong markup--h4-strong\">Convolutional 
Layer<\/strong><\/h4>\n\n\n\n<p class=\"graf graf--p\">Instead of flattening the input at the input layer, you start by applying a filter.<\/p>\n\n\n\n<p class=\"graf graf--p\">Think of the filter as a \u201cwindow\u201d that you slide over small sections of an image from left to right, top to bottom, repeated over the entire image. At every position of this window, we apply a mathematical function called a convolution. The convolution is a dot product that multiplies the input values in that window by the filter\u2019s weights, adds those values up, and outputs one unique value for that window.<\/p>\n\n\n\n<p class=\"graf graf--p\">This process allows us to move away from individual pixels and into groups of pixels that help the network learn useful features.<\/p>\n\n\n\n<figure class=\"graf graf--figure\">\n<\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*EnDVJW7pM4gGlMfR\" alt=\"Andrew Jones Data Science Infinity\"\/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/www.linkedin.com\/in\/andrew-jones-dsi\/\">Andrew Jones<\/a> of <a href=\"https:\/\/www.data-science-infinity.com\/\">Data Science Infinity<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">This operation is repeated for every section of the image that a filter strides over.<\/p>\n\n\n\n<p class=\"graf graf--p\">Because the filter is typically smaller than the whole input image, the same weights can be applied as the filter strides over the entire image. It turns out that applying the same filter to the whole image helps the network discover important features in your image. 
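To make the sliding-window dot product concrete, here is a minimal NumPy sketch of a single filter pass over an image (the 3x3 filter values are illustrative stand-ins for learned weights, not values from any real model):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (left to right, top to bottom) and
    take a dot product at each position -- no padding, stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)  # one dot product per window
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                 # a vertical-edge detector
feature_map = convolve2d(image, kernel)
print(feature_map.shape)  # (3, 3)
```

Note how a 5x5 input and a 3x3 filter yield a 3x3 feature map: the same small set of weights is reused at every window position.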
This is because <em class=\"markup--em markup--p-em\">each dot product gives us some notion of similarity<\/em> since pixels in an image usually have stronger relationships with surrounding pixels than with pixels further away.<\/p>\n\n\n\n<p class=\"graf graf--p\">After repeating this process several times, you end up with a compressed version of your data called a<strong class=\"markup--strong markup--p-strong\"> feature map<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-activation-layer\"><strong class=\"markup--strong markup--h4-strong\">Activation Layer<\/strong><\/h4>\n\n\n\n<p class=\"graf graf--p\">The activation layer takes the resulting feature maps and applies a non-linear activation function\u200a\u2014\u200atypically ReLU.<\/p>\n\n\n\n<p class=\"graf graf--p\">No \u201clearning\u201d happens in this layer, but it is still an essential component of a CNN architecture.<\/p>\n\n\n\n<figure class=\"graf graf--figure\">\n<\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*rAFT7RtGOnja-vL0\" alt=\"Andrew Jones Data Science Infinity\"\/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/www.linkedin.com\/in\/andrew-jones-dsi\/\">Andrew Jones<\/a> of <a href=\"https:\/\/www.data-science-infinity.com\/\">Data Science Infinity<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">Once we have a feature map passed to an activation function, we can proceed to the pooling layer.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-pooling-layer\"><strong class=\"markup--strong markup--h4-strong\">Pooling Layer<\/strong><\/h4>\n\n\n\n<p class=\"graf graf--p\">The pooling layer helps <em class=\"markup--em markup--p-em\">reduce the size of your problem <\/em>space; it is essentially a dimensionality reduction.<\/p>\n\n\n\n<p class=\"graf graf--p\">It works by taking a grid of pixels and reducing them to a single 
value for future layers to receive as input. In the example below, for each 2&#215;2 grid of pixels, the pixel with the maximum value is kept. This is called <strong class=\"markup--strong markup--p-strong\">max pooling<\/strong> (you could also keep the average instead, known as average pooling).<\/p>\n\n\n\n<figure class=\"graf graf--figure\">\n<\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*xKblW3n6oxujZv98\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/www.linkedin.com\/in\/andrew-jones-dsi\/\">Andrew Jones<\/a> of <a href=\"https:\/\/www.data-science-infinity.com\/\">Data Science Infinity<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\"><em class=\"markup--em markup--p-em\">Pooling layers help control overfitting <\/em>because they reduce the number of parameters and computations in the network. Since this process keeps only one value from a larger set of inputs, it makes it harder for the network to memorize the input, forcing the network to learn the most important features at a more general level.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-dense-layers\"><strong class=\"markup--strong markup--h4-strong\">Dense Layers<\/strong><\/h4>\n\n\n\n<figure class=\"wp-block-image aligncenter graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*gK2KT2mp3EA_nvPX\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/miro.medium.com\/max\/1200\/0*_0rJKSMmZdkMl2fy\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/miro.medium.com\/max\/1200\/0*_0rJKSMmZdkMl2fy\">Source<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">By the end of this cycle of convolution, activation, and pooling layers, we flatten our output and pass it to a series of fully-connected or dense layers. 
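The whole anatomy (convolution, activation, pooling, flatten, dense, softmax) can be sketched end to end in a few lines of NumPy. All weights here are random stand-ins for learned parameters; only the shapes and the flow of data matter:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)          # activation layer: non-linearity, no learning

def max_pool(x, size=2):
    """Keep the maximum of each size x size grid."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

image = rng.standard_normal((8, 8))    # toy 8x8 "image"
kernel = rng.standard_normal((3, 3))   # one random 3x3 filter

# Convolutional layer: valid convolution, stride 1 -> 6x6 feature map.
fmap = np.array([[np.sum(image[i:i+3, j:j+3] * kernel) for j in range(6)]
                 for i in range(6)])

pooled = max_pool(relu(fmap))          # activation, then 2x2 max pooling -> 3x3
flat = pooled.reshape(-1)              # flatten for the dense layer
W, b = rng.standard_normal((10, flat.size)), np.zeros(10)
probs = softmax(W @ flat + b)          # dense layer + softmax over 10 classes

print(fmap.shape, pooled.shape)  # (6, 6) (3, 3)
```

The `probs` vector sums to one: each entry is the network's (here, untrained and meaningless) probability for one of ten classes.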
The dense layers of the CNN take an input vector of the flattened pixels of the image. By this point, the image has been filtered, rectified, and reduced by the convolution, activation, and pooling layers.<\/p>\n\n\n\n<p class=\"graf graf--p\">You can then apply a softmax function at the output of the dense layers to provide the probability that the image belongs to a certain class.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\" id=\"h-why-deep-networks-don-t-always-work-nbsp-well\"><strong class=\"markup--strong markup--h3-strong\">Why Deep Networks Don\u2019t Always Work&nbsp;Well<\/strong><\/h3>\n\n\n\n<p class=\"graf graf--p\">CNNs typically have several iterations of convolution and pooling layers, with some architectures stacking dozens and dozens of layers.<\/p>\n\n\n\n<p class=\"graf graf--p\">It turns out, though, that <em class=\"markup--em markup--p-em\">learning better networks is not as easy as stacking more and more layers<\/em>. As you increase the depth of a network, accuracy increases to a saturation point and then begins to degrade as networks become deeper and deeper. You\u2019re faced with issues such as <strong class=\"markup--strong markup--p-strong\">vanishing and exploding gradients<\/strong>, degradation not caused by overfitting, and increasing training errors.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*gtqVETiavBEtfaju.gif\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/miro.medium.com\/max\/960\/1*Ku54qmCryZVBaIc6g8rjGA.gif\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/miro.medium.com\/max\/960\/1*Ku54qmCryZVBaIc6g8rjGA.gif\">Vanishing gradients<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">This issue arises because parameters in earlier layers of the network are far away from the cost function. 
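A quick numeric sketch shows the effect of that distance: the gradient reaching an early layer is roughly a product of one derivative factor per layer between it and the loss, so factors below one shrink it geometrically while factors above one blow it up (the 0.5 and 1.5 factors here are purely illustrative):

```python
# The gradient reaching an early layer is (roughly) a product of one
# per-layer derivative factor for every layer between it and the loss.
shrink, grow = 0.5, 1.5  # illustrative per-layer gradient magnitudes

for depth in (5, 20, 50):
    vanishing = shrink ** depth   # factors < 1: the product collapses to ~0
    exploding = grow ** depth     # factors > 1: the product blows up
    print(f"depth={depth:2d}  vanishing={vanishing:.2e}  exploding={exploding:.2e}")
```

At 50 layers the vanishing product is below 1e-15 while the exploding one exceeds 1e8: early layers receive updates that are either uselessly tiny or destructively huge.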
The cost function is the source of the gradient that is propagated back through the network. As the error is back-propagated through an increasingly deep network, a larger number of parameters contribute to the error. This causes earlier layers closer to the input to get smaller and smaller updates.<\/p>\n\n\n\n<p class=\"graf graf--p\">This is because of the <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.youtube.com\/watch?v=zFOD3NR5I4Q\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.youtube.com\/watch?v=zFOD3NR5I4Q\"><strong class=\"markup--strong markup--p-strong\">chain rule<\/strong><\/a>.<\/p>\n\n\n\n<p class=\"graf graf--p\">The chain rule multiplies error gradients for weights in the network, and <em class=\"markup--em markup--p-em\">multiplying lots of values that are less than one will result in smaller and smaller values<\/em>. By the time the error gradient reaches the first layer, its value has shrunk toward zero. The inverse problem is the exploding gradient, which happens when large error gradients accumulate during training, resulting in massive updates to model weights in the earlier layers [<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/towardsdatascience.com\/residual-blocks-building-blocks-of-resnet-fd90ca15d6ec\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/towardsdatascience.com\/residual-blocks-building-blocks-of-resnet-fd90ca15d6ec\">Source<\/a>].<\/p>\n\n\n\n<p class=\"graf graf--p\">The net result of both of these scenarios is that early layers in the network become more challenging to train.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*0nrdLudgzkJbQ2gp\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/youtu.be\/c4e7nleyoZM?list=PLRDl2inPrWQXSDfCjPKSeEMFLwYpfytxH&amp;t=4330\" target=\"_blank\" rel=\"noopener\" 
data-href=\"https:\/\/youtu.be\/c4e7nleyoZM?list=PLRDl2inPrWQXSDfCjPKSeEMFLwYpfytxH&amp;t=4330\">Jon Krohn\u2019s Deep Learning&nbsp;Course<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">That is until a new convolutional neural network called <strong class=\"markup--strong markup--p-strong\">Residual Networks (ResNets)<\/strong> emerged, which aimed to preserve the gradient.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\" id=\"h-why-we-need-nbsp-resnet\"><strong class=\"markup--strong markup--h3-strong\">Why We Need&nbsp;ResNet<\/strong><\/h3>\n\n\n\n<p class=\"graf graf--p\">Let\u2019s imagine that we had a shallow network that was performing well.<\/p>\n\n\n\n<p class=\"graf graf--p\">If we were to copy those layers and their weights and stack them as new layers to make the model deeper, our intuition might suggest that the new deeper model would improve on the gains from the existing pre-trained model. If the new layers were to perform simple identity mapping\u200a\u2014\u200awhere all they were doing was reproducing the exact results of the earlier layers\u200a\u2014\u200athen you\u2019d expect no increase in training errors. However, that\u2019s not the case [<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/youtu.be\/c4e7nleyoZM?list=PLRDl2inPrWQXSDfCjPKSeEMFLwYpfytxH&amp;t=4465\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/youtu.be\/c4e7nleyoZM?list=PLRDl2inPrWQXSDfCjPKSeEMFLwYpfytxH&amp;t=4465\">Source<\/a>].<\/p>\n\n\n\n<p class=\"graf graf--p\">These deep networks struggle to learn these identity functions.<\/p>\n\n\n\n<p class=\"graf graf--p\">The new layers that are added either add new information or decrease error. Or, they need to add new information and increase errors. 
Beyond a certain point <em class=\"markup--em markup--p-em\">adding extra layers will contribute to an overall degradation in model performance<\/em> [<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/youtu.be\/c4e7nleyoZM?list=PLRDl2inPrWQXSDfCjPKSeEMFLwYpfytxH&amp;t=4414\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/youtu.be\/c4e7nleyoZM?list=PLRDl2inPrWQXSDfCjPKSeEMFLwYpfytxH&amp;t=4414\">Source<\/a>].<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-the-novelty-of-nbsp-resnet\"><strong class=\"markup--strong markup--h4-strong\">The Novelty of&nbsp;ResNet<\/strong><\/h4>\n\n\n\n<p class=\"graf graf--p\">ResNet tackles the vanishing gradient problem using <strong class=\"markup--strong markup--p-strong\">skip connections<\/strong>.<\/p>\n\n\n\n<p class=\"graf graf--p\">Skip connections allow you to take the activation value from an earlier layer and pass it to a much deeper layer in a network. This allows for smoother gradient flow, ensuring important features are preserved in the training process [<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/towardsdatascience.com\/residual-blocks-building-blocks-of-resnet-fd90ca15d6ec\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/towardsdatascience.com\/residual-blocks-building-blocks-of-resnet-fd90ca15d6ec\">Source<\/a>]. 
These skip connections are housed inside residual blocks.<\/p>\n\n\n\n<p class=\"graf graf--p\">Let\u2019s explore what residual blocks are and how they work [<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/youtu.be\/ZILIbUvp5lk?t=41\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/youtu.be\/ZILIbUvp5lk?t=41\">Source<\/a>]:<\/p>\n\n\n\n<ul class=\"wp-block-list postList\">\n<li>Assume you have two layers of a neural network starting with some activation X.<\/li>\n\n\n\n<li>X is passed through a weight matrix where some linear operator\u200a\u2014\u200aF\u200a\u2014\u200ais applied.<\/li>\n\n\n\n<li>The result\u200a\u2014\u200awhich we can call X\u2019 = F(X)\u200a\u2014\u200ais then passed to a non-linear ReLU activation function.<\/li>\n\n\n\n<li>You then apply a second linear transformation, with its own weights, to X\u2019, producing F(X\u2019).<\/li>\n\n\n\n<li>Instead of applying the ReLU non-linearity directly to F(X\u2019), you skip it and first add X to F(X\u2019) via the skip connection.<\/li>\n\n\n\n<li>That result is now passed to a ReLU non-linearity.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*QpX4yPlWGTl31ijW\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/www.researchgate.net\/figure\/Residual-module-with-skip-connection_fig3_356195076\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.researchgate.net\/figure\/Residual-module-with-skip-connection_fig3_356195076\">Source<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">These residual blocks, with their skip connections, provide two important benefits. The first is avoiding the problems of vanishing or exploding gradients. The second is enabling models to learn an identity function. 
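The steps above can be sketched as a minimal residual block in NumPy (the random matrices are stand-ins for the block's learned transformations; the zero-weight case shows how the block can fall back to an identity mapping):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = ReLU(F(X') + X): the input x 'skips' the two transformations
    and is added back in before the final non-linearity."""
    x_prime = relu(W1 @ x)   # first linear transformation + ReLU
    fx = W2 @ x_prime        # second linear transformation (ReLU deferred)
    return relu(fx + x)      # skip connection: add x, then apply ReLU

d = 4
x = rng.standard_normal(d)

# If the learned transformations output zero, the block reduces to
# ReLU(0 + x) = ReLU(x): identity mapping on the activations.
zeros = np.zeros((d, d))
print(np.allclose(residual_block(x, zeros, zeros), relu(x)))  # True
```

This is the point of the design: doing nothing useful costs the block nothing, because "do nothing" just means driving the residual branch toward zero.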
The modules either learn something useful and contribute to reducing the network error, or they perform identity mapping and do nothing at all [<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/youtu.be\/c4e7nleyoZM?list=PLRDl2inPrWQXSDfCjPKSeEMFLwYpfytxH&amp;t=4560\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/youtu.be\/c4e7nleyoZM?list=PLRDl2inPrWQXSDfCjPKSeEMFLwYpfytxH&amp;t=4560\">Source<\/a>]. This enables information to skip the functions located within the module.<\/p>\n\n\n\n<p class=\"graf graf--p\">Residual networks can be considered complex ensembles of many shallower networks pooled at various depths [<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/youtu.be\/c4e7nleyoZM?list=PLRDl2inPrWQXSDfCjPKSeEMFLwYpfytxH&amp;t=4560\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/youtu.be\/c4e7nleyoZM?list=PLRDl2inPrWQXSDfCjPKSeEMFLwYpfytxH&amp;t=4560\">Source<\/a>] and have allowed us to accomplish things that were not possible before.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-what-resnet-allows-you-to-nbsp-do\"><strong class=\"markup--strong markup--h4-strong\">What ResNet Allows You To&nbsp;Do<\/strong><\/h4>\n\n\n\n<p class=\"graf graf--p\">Skip connections allow you to propagate larger gradients to the earliest layers in your network by skipping some layers in between.<\/p>\n\n\n\n<p class=\"graf graf--p\">This allows those early layers to learn as fast as the final layers. Different parts of the network are trained at differing rates on various training data points based on how the error flows backward in the network [<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/towardsdatascience.com\/residual-blocks-building-blocks-of-resnet-fd90ca15d6ec\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/towardsdatascience.com\/residual-blocks-building-blocks-of-resnet-fd90ca15d6ec\">Source<\/a>]. 
Ultimately, this allows you<strong class=\"markup--strong markup--p-strong\"> to train deeper networks than was previously possible<\/strong> without any noticeable loss in performance [<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/youtu.be\/ZILIbUvp5lk\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/youtu.be\/ZILIbUvp5lk\">Source<\/a>].<\/p>\n\n\n\n<p class=\"graf graf--p\">This breakthrough has allowed for some amazing results in computer vision.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-where-resnet-works-nbsp-best\"><strong class=\"markup--strong markup--h4-strong\">Where ResNet Works&nbsp;Best<\/strong><\/h4>\n\n\n\n<p class=\"graf graf--p\">ResNet works exceptionally well for many computer vision applications, including image recognition, object detection, facial recognition, image classification, semantic segmentation, and instance segmentation.<\/p>\n\n\n\n<p class=\"graf graf--p\">This model architecture has seen many accomplishments, including:<\/p>\n\n\n\n<ul class=\"wp-block-list postList\">\n<li>First place in the ILSVRC 2015 classification competition with a top-5 error rate of 3.57%<\/li>\n\n\n\n<li>First place in the ILSVRC and COCO 2015 competitions in ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation<\/li>\n\n\n\n<li>Replaced VGG-16 with ResNet-101 in Faster R-CNN, yielding a 28% relative improvement<\/li>\n\n\n\n<li>Efficiently trained networks with upwards of 100 and even 1,000 layers<\/li>\n<\/ul>\n\n\n\n<p class=\"graf graf--p\">ResNets have also been instrumental in<strong class=\"markup--strong markup--p-strong\"> transfer learning<\/strong>, allowing you to use the model weights from pre-trained models developed for standard computer vision benchmark datasets.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\" id=\"h-common-resnet-architectures\"><strong class=\"markup--strong markup--h3-strong\">Common ResNet Architectures<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading 
graf graf--h4\" id=\"h-resnet-34-nbsp\"><strong class=\"markup--strong markup--h4-strong\">ResNet-34&nbsp;<\/strong><\/h4>\n\n\n\n<p class=\"graf graf--p\">ResNet-34 was the original Residual Network introduced in a 2015 <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/arxiv.org\/pdf\/1512.03385.pdf\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/arxiv.org\/pdf\/1512.03385.pdf\">research paper<\/a>.<\/p>\n\n\n\n<p class=\"graf graf--p\">This network inserts shortcut connections in a plain network. It had two design rules:<\/p>\n\n\n\n<ol class=\"wp-block-list postList\">\n<li>Layers have the same number of filters for the same output feature map size<\/li>\n\n\n\n<li>The number of filters doubled if the feature map size was halved to preserve the time complexity per layer.<\/li>\n<\/ol>\n\n\n\n<p class=\"graf graf--p\">The result was a network consisting of 34 weighted layers.<\/p>\n\n\n\n<p class=\"graf graf--p\"><a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/viso.ai\/deep-learning\/resnet-residual-neural-network\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/viso.ai\/deep-learning\/resnet-residual-neural-network\/\">Source<\/a><\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-resnet-50-nbsp\"><strong class=\"markup--strong markup--h4-strong\">ResNet-50&nbsp;<\/strong><\/h4>\n\n\n\n<p class=\"graf graf--p\">The ResNet-50 architecture is based on the ResNet-34 model with one important difference.<\/p>\n\n\n\n<p class=\"graf graf--p\">It used a stack of 3 layers instead of the earlier 2. The building blocks were modified to a bottleneck design because of concerns about the time it took to train the layers. 
Each of the 2-layer blocks in ResNet-34 was replaced with a 3-layer bottleneck block, forming the ResNet-50 architecture.<\/p>\n\n\n\n<p class=\"graf graf--p\">This resulted in higher accuracy than the ResNet-34 model.<\/p>\n\n\n\n<p class=\"graf graf--p\"><a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/viso.ai\/deep-learning\/resnet-residual-neural-network\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/viso.ai\/deep-learning\/resnet-residual-neural-network\/\">Source<\/a><\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-resnet-101-and-resnet-152\"><strong class=\"markup--strong markup--h4-strong\">ResNet-101 and ResNet-152<\/strong><\/h4>\n\n\n\n<p class=\"graf graf--p\">Using more 3-layer blocks builds larger Residual Networks like ResNet-101 or ResNet-152. Even with its greatly increased depth, ResNet-152 still has lower computational complexity than VGG-16\/19 networks.<\/p>\n\n\n\n<p class=\"graf graf--p\"><a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/viso.ai\/deep-learning\/resnet-residual-neural-network\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/viso.ai\/deep-learning\/resnet-residual-neural-network\/\">Source<\/a><\/p>\n\n\n\n<p class=\"graf graf--p\">ResNet has shown that we can effectively architect deeper and deeper networks by creating short paths from the early to later layers. There\u2019s no doubt that ResNet has proven powerful in a wide number of applications. However, there\u2019s a major drawback to building deeper networks: they require time. A lot of time. 
It\u2019s not uncommon for ResNets to require weeks of training.<\/p>\n\n\n\n<p class=\"graf graf--p\">This can be infeasible for real-world applications.<\/p>\n\n\n\n<p class=\"graf graf--p\">What if an architecture existed that distilled this simple pattern to provide maximum information flow between the layers of a network?<\/p>\n\n\n\n<p class=\"graf graf--p\">What if we could connect all the <strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">layers <\/em><\/strong>directly to each other?<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\" id=\"h-introduction-to-nbsp-densenet\"><strong class=\"markup--strong markup--h3-strong\">Introduction to&nbsp;DenseNet<\/strong><\/h3>\n\n\n\n<p class=\"graf graf--p\">The Densely Connected Convolutional Neural Network (DenseNet) architecture takes skip connections to the max.<\/p>\n\n\n\n<p class=\"graf graf--p\">ResNet performs an element-wise addition to pass the output to the next layer or block. DenseNet connects all layers directly to each other. 
It does this through concatenation.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote graf graf--blockquote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Crucially, in contrast to ResNets, we never combine features through summation before they are passed into a layer; instead, we combine features by concatenating them.<\/p>\n<cite>Authors of the DenseNet paper<\/cite><\/blockquote>\n\n\n\n<p class=\"graf graf--p\">With concatenation, each layer receives collective knowledge from the preceding layers.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*TfuDT8qHV9YXFe-b.gif\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/www.youtube.com\/watch?v=-W6y8xnd--U\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.youtube.com\/watch?v=-W6y8xnd--U\">Source<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">Each layer receives feature maps from the preceding layers and passes its feature map to deeper layers. The output layer now has information from every layer, all the way back to the first. 
This ensures a direct route for the information back through the network.<\/p>\n\n\n\n<p class=\"graf graf--p\">As a result, we end up with a more compact model because of this feature reuse.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*EJPFMJyKaVKmf4wl.jpeg\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/towardsdatascience.com\/paper-review-densenet-densely-connected-convolutional-networks-acf9065dfefb\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/towardsdatascience.com\/paper-review-densenet-densely-connected-convolutional-networks-acf9065dfefb\">Source<\/a><\/figcaption><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote graf graf--blockquote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The concept of dense connections has been portrayed in dense blocks. A dense block comprises <em class=\"markup--em markup--blockquote-em\">n <\/em>dense layers. These dense layers are connected using a dense circuitry such that each dense layer receives feature maps from all preceding layers and passes its feature maps to all subsequent layers. 
The dimensions of the features (width, height) stay the same in a dense block.<\/p>\n<\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote graf graf--blockquote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>[<a class=\"markup--anchor markup--blockquote-anchor\" href=\"https:\/\/towardsdatascience.com\/paper-review-densenet-densely-connected-convolutional-networks-acf9065dfefb\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/towardsdatascience.com\/paper-review-densenet-densely-connected-convolutional-networks-acf9065dfefb\">Source<\/a>]<\/p>\n<\/blockquote>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-densenet-benefits\"><strong class=\"markup--strong markup--h4-strong\">DenseNet Benefits<\/strong><\/h4>\n\n\n\n<p class=\"graf graf--p\">Because of these dense connections, the model requires fewer layers, as there is no need to learn redundant feature maps, allowing the <strong class=\"markup--strong markup--p-strong\">collective knowledge <\/strong>(features learned collectively by the network) to be reused.<\/p>\n\n\n\n<ul class=\"wp-block-list postList\">\n<li>Alleviates the <strong class=\"markup--strong markup--li-strong\">vanishing gradient problem<\/strong><\/li>\n\n\n\n<li>Stronger <strong class=\"markup--strong markup--li-strong\">feature propagation<\/strong><\/li>\n\n\n\n<li><strong class=\"markup--strong markup--li-strong\">Feature reuse<\/strong><\/li>\n\n\n\n<li><strong class=\"markup--strong markup--li-strong\">Reduced parameter<\/strong> count<\/li>\n<\/ul>\n\n\n\n<p class=\"graf graf--p\">Fewer and narrower layers mean the model has <strong class=\"markup--strong markup--p-strong\">fewer parameters<\/strong> to learn, making it easier to train.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\" id=\"h-advantages-and-disadvantages-of-these-differences\"><strong class=\"markup--strong markup--h3-strong\">Advantages and Disadvantages of These Differences<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-image 
aligncenter graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*NrGj0PSsxdNdBg3u9MfdpQ.png\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"graf graf--p\">DenseNet architecture, while demonstrating several significant advantages, is typically better suited to smaller or moderately sized networks than to very deep networks. This is primarily due to its intensive memory usage stemming from the dense connections. However, for certain applications, the benefits of DenseNet can be substantial.<\/p>\n\n\n\n<p class=\"graf graf--p\">The first major advantage of DenseNet is its performance on benchmark datasets like ImageNet. The architecture has outperformed competing architectures in terms of accuracy and efficiency. This is a testament to its robustness and capability in handling complex image recognition tasks.<\/p>\n\n\n\n<p class=\"graf graf--p\">Secondly, DenseNet\u2019s improved parameter efficiency is a key factor in its ease of training. Unlike other architectures that might require many parameters to achieve high accuracy, DenseNet achieves this with a comparatively lower parameter count. This efficiency stems from its ability to reuse features across the network, reducing the need for learning redundant feature maps. This makes the network more compact and simplifies the training process, as there are fewer parameters to adjust during the learning phase.<\/p>\n\n\n\n<p class=\"graf graf--p\">Another aspect where DenseNet shines is in its resilience to the vanishing gradient problem, thanks to its dense connections. Each layer receives gradients directly from the loss function and subsequent layers, making it easier to train deeper versions of these networks compared to traditional architectures.<\/p>\n\n\n\n<p class=\"graf graf--p\">Despite these advantages, DenseNet also has its drawbacks. One of the main challenges is computational and memory efficiency. 
Due to the dense connections and concatenations of feature maps from all preceding layers, DenseNets can become quite memory intensive, especially as the network depth increases. This can make them less feasible for deployment on devices with limited resources or for applications requiring real-time processing.<\/p>\n\n\n\n<p class=\"graf graf--p\">Additionally, while DenseNets have fewer parameters, they can still be susceptible to overfitting, especially when trained on smaller datasets. It\u2019s important to implement appropriate regularization techniques, such as dropout or data augmentation, to mitigate this risk.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\" id=\"h-densenet-vs-nbsp-resnet\">DenseNet vs&nbsp;ResNet<\/h3>\n\n\n\n<p class=\"graf graf--p\">When comparing DenseNet with ResNet, several key differences stand out:<\/p>\n\n\n\n<ol class=\"wp-block-list postList\">\n<li>Skip Connections: ResNet uses skip connections to implement identity mappings, allowing gradients to flow through the network without attenuation. DenseNet, on the other hand, uses dense connections, concatenating feature maps from all preceding layers.<\/li>\n\n\n\n<li>Memory Usage: DenseNets generally require more memory than ResNets due to the concatenation of feature maps from all layers. This can be a limiting factor in certain applications.<\/li>\n\n\n\n<li>Parameter Efficiency: DenseNet is often more parameter-efficient than ResNet. It reuses features throughout the network, reducing the need to learn redundant feature maps.<\/li>\n\n\n\n<li>Training Dynamics: DenseNets might have a smoother training process due to the continuous feature propagation throughout the network. However, this can also lead to increased training time and computational costs.<\/li>\n\n\n\n<li>Performance: Both architectures have shown exceptional performance in various tasks. ResNet is often preferred for very deep networks due to its simplicity and lower computational requirements. 
DenseNet shines in scenarios where feature reuse is critical and the additional computational cost is acceptable.<\/li>\n<\/ol>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\" id=\"h-advantages-and-disadvantages-of-these-differences-0\">Advantages and Disadvantages of These Differences<\/h4>\n\n\n\n<p class=\"graf graf--p\">The main advantage of DenseNet\u2019s architecture is its efficiency in reusing features and reduced parameter count. This can lead to more compact models that are powerful yet simpler to train. However, the downside is the increased computational and memory requirement, which might not be suitable for all applications, especially those with resource constraints.<\/p>\n\n\n\n<p class=\"graf graf--p\">ResNet\u2019s main advantage lies in its ability to facilitate the training of very deep networks through skip connections, which mitigate the vanishing gradient problem. Its architecture is more straightforward and often more computationally efficient than DenseNet. However, it might not be as efficient in feature reuse, potentially requiring more parameters to achieve similar performance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\" id=\"h-conclusion\">Conclusion<\/h3>\n\n\n\n<p class=\"graf graf--p\">In summary, both ResNet and DenseNet offer unique advantages and have their specific use cases in deep learning. The choice between the two depends on the specific requirements of the task at hand, including computational resources, network depth, and the need for parameter efficiency. 
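<\/p>\n\n\n\n<p class=\"graf graf--p\">A quick back-of-the-envelope calculation shows both sides of that trade-off. The growth rate and block size below match the published DenseNet-121 configuration (growth rate k = 32, a 6-layer first dense block starting from 64 channels); the helper functions themselves are just illustrative:<\/p>\n\n\n\n

```python
def dense_block_channels(k0, growth_rate, num_layers):
    """Channels visible to each layer when every earlier output is concatenated."""
    return [k0 + i * growth_rate for i in range(num_layers + 1)]

def residual_stage_channels(k0, num_layers):
    """With element-wise addition, the channel count stays constant."""
    return [k0] * (num_layers + 1)

# DenseNet: each layer adds only k channels, but the concatenated input keeps widening.
print(dense_block_channels(64, 32, 6))   # [64, 96, 128, 160, 192, 224, 256]

# ResNet: every layer in the stage sees the same number of channels.
print(residual_stage_channels(64, 6))    # [64, 64, 64, 64, 64, 64, 64]
```

\n\n\n\n<p class=\"graf graf--p\">Each dense layer adds only 32 channels, which is why DenseNet layers can stay narrow and parameter-light; yet every widening feature map must be kept around for later concatenations, which is exactly where the memory cost comes from.<\/p>\n\n\n\n<p class=\"graf graf--p\">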
Understanding these architectures and their differences is crucial for any deep learning practitioner looking to leverage the latest advancements in CNNs for their applications.<\/p>\n\n\n\n<section class=\"section section--body\">\n<div class=\"section-divider\">\n<hr class=\"section-divider\">\n<\/div>\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\"><\/div>\n<\/div>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>With a Focus on ResNet and DenseNet Photo by Ion Fet on&nbsp;Unsplash This comprehensive guide aims to demystify CNNs, providing insights into their structure, functionality, and why they are so effective for image-related tasks. We delve into the intricacies of Residual Networks (ResNet), a groundbreaking architecture in CNNs. Understanding why ResNet is essential, its innovative [&hellip;]<\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[65,7],"tags":[70,71,52,31,34],"coauthors":[166],"class_list":["post-8283","post","type-post","status-publish","format-standard","hentry","category-llmops","category-tutorials","tag-langchain","tag-language-models","tag-llm","tag-llmops","tag-prompt-engineering"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>An Intuitive Guide to Convolutional Neural\u00a0Networks - Comet<\/title>\n<meta name=\"description\" content=\"This guide aims to demystify CNNs, providing insights into their structure, functionality + why they are so effective for image-related tasks\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"An Intuitive Guide to Convolutional Neural\u00a0Networks\" \/>\n<meta property=\"og:description\" content=\"This guide aims to demystify CNNs, providing insights into their structure, functionality + why they are so effective for image-related tasks\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-30T15:06:18+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-17T20:45:43+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*xNcoatSR3XQ4QSsk\" \/>\n<meta name=\"author\" content=\"Harpreet Sahota\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Harpreet Sahota\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"17 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"An Intuitive Guide to Convolutional Neural\u00a0Networks - Comet","description":"This guide aims to demystify CNNs, providing insights into their structure, functionality + why they are so effective for image-related tasks","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/","og_locale":"en_US","og_type":"article","og_title":"An Intuitive Guide to Convolutional Neural\u00a0Networks","og_description":"This guide aims to demystify CNNs, providing insights into their structure, functionality + why they are so effective for image-related tasks","og_url":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-11-30T15:06:18+00:00","article_modified_time":"2025-11-17T20:45:43+00:00","og_image":[{"url":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*xNcoatSR3XQ4QSsk","type":"","width":"","height":""}],"author":"Harpreet Sahota","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Harpreet Sahota","Est. 
reading time":"17 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/"},"author":{"name":"Harpreet Sahota","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/46036ab474aa916e2873daece26a28d6"},"headline":"An Intuitive Guide to Convolutional Neural\u00a0Networks","datePublished":"2023-11-30T15:06:18+00:00","dateModified":"2025-11-17T20:45:43+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/"},"wordCount":3182,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/#primaryimage"},"thumbnailUrl":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*xNcoatSR3XQ4QSsk","keywords":["LangChain","Language Models","LLM","LLMOps","Prompt Engineering"],"articleSection":["LLMOps","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/","url":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/","name":"An Intuitive Guide to Convolutional Neural\u00a0Networks - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/#primaryimage"},"thumbnailUrl":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*xNcoatSR3XQ4QSsk","datePublished":"2023-11-30T15:06:18+00:00","dateModified":"2025-11-17T20:45:43+00:00","description":"This guide aims to demystify CNNs, providing insights 
into their structure, functionality + why they are so effective for image-related tasks","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/#primaryimage","url":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*xNcoatSR3XQ4QSsk","contentUrl":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*xNcoatSR3XQ4QSsk"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/an-intuitive-guide-to-convolutional-neural-networks\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"An Intuitive Guide to Convolutional Neural\u00a0Networks"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, 
Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/46036ab474aa916e2873daece26a28d6","name":"Harpreet Sahota","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/2d21512be19ba7e19a71a803309e2a88","url":"https:\/\/secure.gravatar.com\/avatar\/a6ca5a533fc9f143a0a7428037ff652aa0633d66bf27e76ae89b955ae72a0f2d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a6ca5a533fc9f143a0a7428037ff652aa0633d66bf27e76ae89b955ae72a0f2d?s=96&d=mm&r=g","caption":"Harpreet 
Sahota"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/theartistsofdatasciencegmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8283","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=8283"}],"version-history":[{"count":2,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8283\/revisions"}],"predecessor-version":[{"id":18470,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8283\/revisions\/18470"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=8283"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=8283"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=8283"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=8283"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}