{"id":7860,"date":"2023-10-06T14:35:37","date_gmt":"2023-10-06T22:35:37","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=7860"},"modified":"2025-04-24T17:05:48","modified_gmt":"2025-04-24T17:05:48","slug":"resnet-how-one-paper-changed-deep-learning-forever","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/","title":{"rendered":"ResNet: How One Paper Changed Deep Learning Forever"},"content":{"rendered":"\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\">\n\n\n\n<div class=\"fi fj fk fl fm\">\n<div class=\"ab ca\">\n<div class=\"ch bg eu ev ew ex\">\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp ec mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Hb4l_paTxs7mFuLENeR3HQ.png\" alt=\"\" width=\"700\" height=\"305\"><\/figure><div class=\"mf mg mh\"><picture><\/picture><\/div>\n<\/div><figcaption class=\"mu mv mw mf mg mx my be b bf z dw\" data-selectable-paragraph=\"\">The ResNet Paper<\/figcaption><\/figure>\n<p id=\"60cd\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">In December of 2015 a paper was published that rocked the deep learning world.<\/p>\n<p id=\"47fc\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Widely regarded as one of the most influential papers in modern deep learning, it has been cited over 110,000 times. 
The name of this paper will go down in the annals of deep learning history: <a class=\"af nu\" href=\"https:\/\/arxiv.org\/abs\/1512.03385\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be nv\">Deep Residual Learning for Image Recognition (aka, the ResNet paper).<\/strong><\/a><\/p>\n<p id=\"2657\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">This paper showed the deep learning community that it was possible to construct <strong class=\"be nv\"><em class=\"nw\">increasingly deeper network architectures<\/em><\/strong> that perform at least as well as, if not better than, shallower networks.<\/p>\n<p id=\"1b5d\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">After AlexNet hit the scene in 2012, prevailing wisdom suggested that adding more layers to neural networks would lead to better results. This was evidenced by breakthroughs coming from VGGNet, GoogLeNet, and others.<\/p>\n<p id=\"d0da\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">This set the deep learning community on a quest to go deeper.<\/p>\n<p id=\"4daf\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">It turns out, though, that learning better networks is not as easy as stacking more and more layers. Researchers observed that the accuracy of deep networks would increase up to a saturation point before leveling off. 
Additionally, an unusual phenomenon was observed:<\/p>\n<p id=\"2fe8\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\"><em class=\"nw\">Training error would increase as you add layers to an already deep network.<\/em><\/p>\n<p id=\"8440\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">This was primarily due to two problems:<\/p>\n<p id=\"f44e\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">1) Vanishing\/exploding gradients<\/p>\n<p id=\"f5cb\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">2) The degradation problem<\/p>\n<h2 id=\"611d\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Vanishing\/exploding gradients<\/h2>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:480\/1*UsnL3rZnKwEPyPVMkBha1w.gif\" alt=\"\" width=\"480\" height=\"270\"><\/figure><div class=\"mf mg os\"><picture><\/picture><\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dw\" data-selectable-paragraph=\"\">Google search<\/figcaption>\n<\/figure>\n<p id=\"93f4\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" 
data-selectable-paragraph=\"\">The vanishing\/exploding gradients problem is a by-product of <a class=\"af nu\" href=\"https:\/\/www.youtube.com\/watch?v=zFOD3NR5I4Q\" target=\"_blank\" rel=\"noopener ugc nofollow\">the chain rule<\/a>. During backpropagation, the chain rule multiplies together the gradients of each layer along the path from the loss back to a given weight. Multiplying lots of values that are less than one results in smaller and smaller values, so by the time the error gradients reach the earlier layers of a network, their values tend toward zero.<\/p>\n<p id=\"8769\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\"><strong class=\"be nv\">This results in smaller and smaller updates to earlier layers (not much learning happening).<\/strong><\/p>\n<p id=\"e11c\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">The inverse is the exploding gradient problem: large error gradients accumulate during training, resulting in massive updates to model weights in the earlier layers (since multiplying lots of values that are bigger than 1 results in larger and larger values).<\/p>\n<p id=\"08d9\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\"><strong class=\"be nv\">The reason for this issue?<\/strong><\/p>\n<p id=\"6291\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Parameters in earlier layers of the network are far away from the cost function, which is the source of the gradient that is propagated backward through the network. As the error is backpropagated through an increasingly deep network, a larger number of parameters contribute to the error. 
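To make the compounding effect concrete, here is a toy sketch (an illustration, not code from the article): treat the gradient reaching an early layer as a product of per-layer factors.

```python
# Toy sketch of the chain rule's compounding effect (illustrative only):
# the gradient reaching an early layer is a product of per-layer factors.

def gradient_after(num_layers: int, per_layer_factor: float) -> float:
    """Multiply identical per-layer gradient factors, as the chain rule does."""
    grad = 1.0
    for _ in range(num_layers):
        grad *= per_layer_factor
    return grad

# Factors < 1 shrink the signal toward zero (vanishing gradients)...
print(gradient_after(30, 0.5))  # ~9.3e-10: early layers barely update
# ...while factors > 1 blow it up (exploding gradients).
print(gradient_after(30, 1.5))  # ~1.9e+05: huge, unstable updates
```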
The net result of both of these scenarios is that early layers in the network become more difficult to train.<\/p>\n<blockquote class=\"ot ou ov\"><p id=\"223e\" class=\"mz na nw be b gn nb nc nd gq ne nf ng ow ni nj nk ox nm nn no oy nq nr ns nt fi bj\" data-selectable-paragraph=\"\">But there\u2019s another, more curious problem\u2026<\/p><\/blockquote>\n<h2 id=\"223f\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">The degradation problem<\/h2>\n<p id=\"50d3\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">Adding more and more layers to these deep models leads to higher training errors, ultimately degrading the expressive power of the network.<\/p>\n<p id=\"fc27\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">The degradation problem is unexpected, because it is not caused by overfitting. Researchers were finding that as networks got deeper, the training loss would decrease at first, but then shoot back up as more layers were added to the networks. 
This is counterintuitive, because you\u2019d expect your training error to decrease, converge, and plateau as the number of layers in your network increases.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp ec mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*-RrXOXOqTiOp0HLCmjFKnw.png\" alt=\"\" width=\"700\" height=\"240\"><\/figure><div class=\"mf mg pe\"><picture><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dw\" data-selectable-paragraph=\"\">Author<\/figcaption>\n<\/figure>\n<p id=\"1b42\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Let\u2019s imagine that you had a shallow network that was performing well.<\/p>\n<p id=\"178d\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">If you take a \u201cshallow\u201d network and stack on more layers to create a deeper network, the performance of the deeper network should be at least as good as that of the shallow network. Why? Because, in theory, the deeper network could learn to reproduce the shallow network: the shallow network\u2019s function is a subset of what the deeper network can represent. 
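This thought experiment can be sketched in PyTorch (a hypothetical illustration, not code from the article): deepening a network with layers that compute the identity function leaves its outputs, and therefore its training error, unchanged.

```python
# Hypothetical sketch: "deepening" a network with identity layers
# cannot change its output, so the deeper network should do no worse.
import torch
import torch.nn as nn

torch.manual_seed(0)
shallow = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# Stack extra layers that compute the identity function.
deeper = nn.Sequential(shallow, nn.Identity(), nn.Identity())

x = torch.randn(2, 8)
assert torch.equal(shallow(x), deeper(x))  # outputs are identical
```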
But this doesn\u2019t happen in practice!<\/p>\n<p id=\"2dcd\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">You could even set the new stacked layers to be identity layers, and still find your training error getting worse when you stack more layers on top of a shallower model. <strong class=\"be nv\">Deeper networks were leading to higher training error!<\/strong><\/p>\n<p id=\"a45c\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Both of these issues \u2014 the vanishing\/exploding gradients and degradation problems \u2014 threatened to halt the progress of deep neural networks, until the ResNet paper came out.<\/p>\n<p id=\"cabf\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">The ResNet paper introduced a novel solution to these two pesky problems that plagued the architects of deep neural networks.<\/p>\n<h2 id=\"f6bf\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">The skip connection<\/h2>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp ec mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*43sSf2OsoBI4E7yf34xtCw.png\" alt=\"\" width=\"700\" height=\"247\"><\/figure><div class=\"mf mg pf\"><picture><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dw\" data-selectable-paragraph=\"\">Image Source: Sachin Joglekar<\/figcaption>\n<\/figure>\n<p id=\"ad14\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Skip connections, which are housed in residual blocks, allow you to take the activation value from an earlier layer and pass it to a deeper layer in a network. Skip connections enable deep networks to learn the identity function, which allows a deeper layer to perform as well as an earlier layer, or at the very least no worse. The result is smoother gradient flow, ensuring that important features are preserved during training.<\/p>\n<p id=\"19dd\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">The invention of the skip connection has given us the ability to build deeper and deeper networks while avoiding the problems of vanishing\/exploding gradients and degradation.<\/p>\n<p id=\"65e2\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Here\u2019s how it works\u2026<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp ec mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*9sBHvSmGWA7knDJf.gif\" alt=\"\" width=\"700\" height=\"394\"><\/figure><div class=\"mf mg pf\"><picture><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dw\" data-selectable-paragraph=\"\">Demonstrating information flow in a plain network. Source: EasyLearn.AI<\/figcaption>\n<\/figure>\n<p id=\"f7e9\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Instead of the previous layer\u2019s output being passed directly to the next block, a copy of that output is made, and that copy is passed through a residual block. The residual block processes the copied output matrix X with a 3&#215;3 convolution, followed by batch norm and ReLU, to yield a matrix Z.<\/p>\n<p id=\"bbba\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Then X and Z are added together, element by element, to yield the input to the next layer\/block.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp ec mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*p2-45VuVWBJI-L_C.gif\" alt=\"\" width=\"700\" height=\"394\"><\/figure><div class=\"mf mg pf\"><picture><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dw\" data-selectable-paragraph=\"\">Demonstrating information flow in a residual network. Source: EasyLearn.AI<\/figcaption>\n<\/figure>\n<p id=\"8203\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Doing this helps us ensure that any added layers in a neural network are useful for learning. 
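A minimal residual block following this description might look like the sketch below (an illustration of the idea, assuming PyTorch; it is not the exact block from the ResNet paper):

```python
# Sketch of a residual block as described above: a 3x3 convolution,
# batch norm, and ReLU produce Z, and the skip connection adds X back.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.relu(self.bn(self.conv(x)))  # the residual branch, Z
        return x + z                          # skip connection: X + Z

block = ResidualBlock(channels=16)
x = torch.randn(1, 16, 32, 32)
print(block(x).shape)  # torch.Size([1, 16, 32, 32])
```

For reference, the blocks in the actual ResNet architecture stack two or three convolutions in the residual branch and apply the final ReLU after the addition; this sketch keeps only the structure the text describes.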
In the worst case, the residual block outputs all zeros, and X+Z still ends up being X, since X+Z=X when Z is the zero matrix.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fi fj fk fl fm\">\n<div class=\"ab ca\">\n<div class=\"ch bg eu ev ew ex\">\n<h1 id=\"7d0f\" class=\"py ny fp be nz pz qa gp od qb qc gs oh qd qe qf qg qh qi qj qk ql qm qn qo qp bj\" data-selectable-paragraph=\"\">ResNet in action!<\/h1>\n<p id=\"8696\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">Now it\u2019s time to see ResNet in action.<\/p>\n<p id=\"ea27\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">You could try to implement ResNet from scratch, train it on ImageNet, and search for the optimal training parameters yourself\u2026 but why do that when you can use something out of the box? That\u2019s what <a class=\"af nu\" href=\"http:\/\/bit.ly\/sg-harp-medium-post\" target=\"_blank\" rel=\"noopener ugc nofollow\">SuperGradients<\/a> gives you: a pre-trained ResNet model with a robust set of training parameters, ready for you to use with minimal configuration!<\/p>\n<p id=\"ac7a\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">You\u2019ll learn a bit about transfer learning and how to use the <a class=\"af nu\" href=\"http:\/\/bit.ly\/sg-harp-medium-post\" target=\"_blank\" rel=\"noopener ugc nofollow\">SuperGradients<\/a> training library on the MiniPlaces dataset to perform image classification. 
<a class=\"af nu\" href=\"http:\/\/bit.ly\/sg-harp-medium-post\" target=\"_blank\" rel=\"noopener ugc nofollow\">SuperGradients<\/a> is a PyTorch based training library that has pre-trained models for classification, detection, and segmentation tasks.<\/p>\n<p id=\"bb6b\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">You can follow along here, or open up this notebook on Google Colab to get hands on:<\/p>\n<h2 id=\"b2f3\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Install dependencies<\/h2>\n<p id=\"08de\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">You\u2019ll need to install the following dependencies.<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"3f62\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\">!pip install super_gradients==<span class=\"hljs-number\">3.0<\/span><span class=\"hljs-number\">.0<\/span> gwpy &amp;&gt; \/dev\/null\n!pip install matplotlib==<span class=\"hljs-number\">3.1<\/span><span class=\"hljs-number\">.3<\/span> &amp;&gt; \/dev\/null\n!pip install torchinfo &amp;&gt; \/dev\/null<\/span><\/pre>\n<h2 id=\"1d84\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Import packages<\/h2>\n<p id=\"404f\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">Now you can import all the packages necessary for this tutorial.<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"f050\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">import<\/span> os\n<span class=\"hljs-keyword\">import<\/span> 
requests\n<span class=\"hljs-keyword\">import<\/span> zipfile\n<span class=\"hljs-keyword\">import<\/span> random\n<span class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np\n<span class=\"hljs-keyword\">import<\/span> torchvision\n<span class=\"hljs-keyword\">import<\/span> pprint\n<span class=\"hljs-keyword\">import<\/span> torch\n<span class=\"hljs-keyword\">import<\/span> pathlib\n\n<span class=\"hljs-keyword\">from<\/span> matplotlib <span class=\"hljs-keyword\">import<\/span> pyplot <span class=\"hljs-keyword\">as<\/span> plt\n<span class=\"hljs-keyword\">from<\/span> torchinfo <span class=\"hljs-keyword\">import<\/span> summary\n<span class=\"hljs-keyword\">from<\/span> pathlib <span class=\"hljs-keyword\">import<\/span> Path, PurePath\n<span class=\"hljs-keyword\">from<\/span> torchvision <span class=\"hljs-keyword\">import<\/span> transforms\n<span class=\"hljs-keyword\">from<\/span> torchvision <span class=\"hljs-keyword\">import<\/span> datasets\n<span class=\"hljs-keyword\">from<\/span> torch.utils.data <span class=\"hljs-keyword\">import<\/span> DataLoader\n<span class=\"hljs-keyword\">from<\/span> PIL <span class=\"hljs-keyword\">import<\/span> Image\n<span class=\"hljs-keyword\">from<\/span> typing <span class=\"hljs-keyword\">import<\/span> <span class=\"hljs-type\">List<\/span>, <span class=\"hljs-type\">Tuple<\/span>\n<span class=\"hljs-keyword\">import<\/span> super_gradients\n<span class=\"hljs-keyword\">from<\/span> super_gradients.training <span class=\"hljs-keyword\">import<\/span> models\n<span class=\"hljs-keyword\">from<\/span> super_gradients.training <span class=\"hljs-keyword\">import<\/span> dataloaders\n<span class=\"hljs-keyword\">from<\/span> super_gradients.training <span class=\"hljs-keyword\">import<\/span> Trainer\n<span class=\"hljs-keyword\">from<\/span> super_gradients.training <span class=\"hljs-keyword\">import<\/span> training_hyperparams<\/span><\/pre>\n<h2 id=\"4292\" class=\"nx ny fp be 
nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Download data<\/h2>\n<p id=\"6df1\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">You\u2019ll use the <strong class=\"be nv\">miniplaces<\/strong> dataset. You can learn more about the dataset <a class=\"af nu\" href=\"https:\/\/github.com\/CSAILVision\/miniplaces\" target=\"_blank\" rel=\"noopener ugc nofollow\">here<\/a>.<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"1f12\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\">torchvision.datasets.utils.download_and_extract_archive(<span class=\"hljs-string\">'https:\/\/dissect.csail.mit.edu\/datasets\/miniplaces.zip'<\/span>,<span class=\"hljs-string\">'datasets'<\/span>)<\/span><\/pre>\n<h2 id=\"c0c8\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Configurations<\/h2>\n<p id=\"2404\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">The following class houses configuration values that will be helpful as you progress through the tutorial.<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"745a\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">class<\/span> <span class=\"hljs-title.class\">config<\/span>:\n    EXPERIMENT_NAME = <span class=\"hljs-string\">'resnet_in_action'<\/span>\n    MODEL_NAME = <span class=\"hljs-string\">'resnet50'<\/span>\n    CHECKPOINT_DIR = <span class=\"hljs-string\">'checkpoints'<\/span>\n\n    <span class=\"hljs-comment\"># specify the paths to the training and validation sets <\/span>\n    TRAIN_DIR = <span class=\"hljs-string\">'datasets\/miniplaces\/train'<\/span>\n    VAL_DIR = <span 
class=\"hljs-string\">'datasets\/miniplaces\/val'<\/span>\n\n    <span class=\"hljs-comment\"># set the input height and width<\/span>\n    INPUT_HEIGHT = <span class=\"hljs-number\">224<\/span>\n    INPUT_WIDTH = <span class=\"hljs-number\">224<\/span>\n\n    <span class=\"hljs-comment\"># specify the ImageNet mean and standard deviation<\/span>\n    IMAGENET_MEAN = [<span class=\"hljs-number\">0.485<\/span>, <span class=\"hljs-number\">0.456<\/span>, <span class=\"hljs-number\">0.406<\/span>]\n    IMAGENET_STD = [<span class=\"hljs-number\">0.229<\/span>, <span class=\"hljs-number\">0.224<\/span>, <span class=\"hljs-number\">0.225<\/span>]\n\n    NUM_WORKERS = os.cpu_count()\n\n    DEVICE = <span class=\"hljs-string\">'cuda'<\/span> <span class=\"hljs-keyword\">if<\/span> torch.cuda.is_available() <span class=\"hljs-keyword\">else<\/span> <span class=\"hljs-string\">'cpu'<\/span>\n\n    FLIP_PROB = <span class=\"hljs-number\">0.25<\/span>\n    ROTATION_DEG = <span class=\"hljs-number\">15<\/span>\n    JITTER_PARAM = <span class=\"hljs-number\">0.25<\/span>\n    BATCH_SIZE = <span class=\"hljs-number\">64<\/span><\/span><\/pre>\n<h2 id=\"b8fe\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Experiment setup<\/h2>\n<p id=\"a193\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">When working with SuperGradients, the first thing you have to do is initialize your <strong class=\"be nv\">trainer<\/strong>.<\/p>\n<p id=\"3c48\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">The <strong class=\"be nv\">trainer<\/strong> is in charge of pretty much everything: training, evaluation, saving checkpoints, plotting, and more. 
The <strong class=\"be nv\">experiment name<\/strong> argument is important, as all checkpoints, logs, and TensorBoard files will be saved in a directory with the same name.<\/p>\n<p id=\"7c02\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">This directory will be created as a sub-directory of <strong class=\"be nv\">ckpt_root_dir<\/strong> as follows:<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"e68a\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\">ckpt_root_dir\n|\u2500\u2500\u2500 experiment_name_1\n\u2502       ckpt_best.pth                     # Model checkpoint on best epoch\n\u2502       ckpt_latest.pth                   # Model checkpoint on last epoch\n\u2502       average_model.pth                 # Model checkpoint averaged over epochs\n\u2502       events.out.tfevents.1659878383... # TensorBoard artifacts of a specific run\n\u2502       log_Aug07_11_52_48.txt            # Trainer logs of a specific run\n\u2514\u2500\u2500\u2500 experiment_name_2\n        ...<\/span><\/pre>\n<pre class=\"qz qq qr qs bo qt ba bj\"><span id=\"276a\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\">trainer = Trainer(experiment_name=config.EXPERIMENT_NAME, ckpt_root_dir=config.CHECKPOINT_DIR, device=config.DEVICE)<\/span><\/pre>\n<h2 id=\"42e9\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Create dataloaders<\/h2>\n<p id=\"5de6\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">The SG trainer is compatible with PyTorch dataloaders, so you can use your own dataloaders.<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"7c7d\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\"><span 
class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title.function\">create_dataloaders<\/span>(<span class=\"hljs-params\">\n    train_dir: <span class=\"hljs-built_in\">str<\/span>,\n    val_dir: <span class=\"hljs-built_in\">str<\/span>,\n    train_transform: transforms.Compose,\n    val_transform: transforms.Compose,\n    batch_size: <span class=\"hljs-built_in\">int<\/span>,\n    num_workers: <span class=\"hljs-built_in\">int<\/span>=config.NUM_WORKERS\n<\/span>):\n  <span class=\"hljs-string\">\"\"\"Creates training and validation DataLoaders.\n  Args:\n    train_dir: Path to training data.\n    val_dir: Path to validation data.\n    train_transform: Transformation pipeline for the training set.\n    val_transform: Transformation pipeline for the validation set.\n    batch_size: Number of samples per batch in each of the DataLoaders.\n    num_workers: An integer for number of workers per DataLoader.\n  Returns:\n    A tuple of (train_dataloader, val_dataloader, class_names).\n  \"\"\"<\/span>\n  <span class=\"hljs-comment\"># Use ImageFolder to create dataset<\/span>\n  train_data = datasets.ImageFolder(train_dir, transform=train_transform)\n  val_data = datasets.ImageFolder(val_dir, transform=val_transform)\n\n  <span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">f\"[INFO] training dataset contains <span class=\"hljs-subst\">{<span class=\"hljs-built_in\">len<\/span>(train_data)}<\/span> samples...\"<\/span>)\n  <span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">f\"[INFO] validation dataset contains <span class=\"hljs-subst\">{<span class=\"hljs-built_in\">len<\/span>(val_data)}<\/span> samples...\"<\/span>)\n\n  <span class=\"hljs-comment\"># Get class names<\/span>\n  class_names = train_data.classes\n  <span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">f\"[INFO] dataset contains <span class=\"hljs-subst\">{<span class=\"hljs-built_in\">len<\/span>(class_names)}<\/span> labels...\"<\/span>)\n\n  <span class=\"hljs-comment\"># Turn images into data loaders<\/span>\n  <span 
class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">\"[INFO] creating training and validation set dataloaders...\"<\/span>)\n  train_dataloader = DataLoader(\n      train_data,\n      batch_size=batch_size,\n      shuffle=<span class=\"hljs-literal\">True<\/span>,\n      num_workers=num_workers,\n      pin_memory=<span class=\"hljs-literal\">True<\/span>,\n  )\n  val_dataloader = DataLoader(\n      val_data,\n      batch_size=batch_size,\n      shuffle=<span class=\"hljs-literal\">False<\/span>,\n      num_workers=num_workers,\n      pin_memory=<span class=\"hljs-literal\">True<\/span>,\n  )\n\n  <span class=\"hljs-keyword\">return<\/span> train_dataloader, val_dataloader, class_names<\/span><\/pre>\n<h2 id=\"5459\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Transforms<\/h2>\n<p id=\"7266\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">This next code block will instantiate a transformation pipeline for both training and validation.<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"e196\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\"># initialize our data augmentation functions<\/span>\nresize = transforms.Resize(size=(config.INPUT_HEIGHT,config.INPUT_WIDTH))\n\nhorizontal_flip = transforms.RandomHorizontalFlip(p=config.FLIP_PROB)\n\nvertical_flip = transforms.RandomVerticalFlip(p=config.FLIP_PROB)\n\nrotate = transforms.RandomRotation(degrees=config.ROTATION_DEG)\n\nnorm = transforms.Normalize(mean=config.IMAGENET_MEAN, std=config.IMAGENET_STD)\n\nmake_tensor = transforms.ToTensor()\n\n<span class=\"hljs-comment\"># initialize our training and validation set data augmentation pipeline<\/span>\ntrain_transforms = transforms.Compose([resize, horizontal_flip, vertical_flip, rotate, make_tensor, 
norm])\nval_transforms = transforms.Compose([resize, make_tensor, norm])<\/span><\/pre>\n<h2 id=\"aa7c\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Instantiate dataloaders<\/h2>\n<p id=\"1af7\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">Now you can instantiate your dataloaders.<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"79db\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\">train_dataloader, valid_dataloader, class_names = create_dataloaders(train_dir=config.TRAIN_DIR,\n                                                                     val_dir=config.VAL_DIR,\n                                                                     train_transform=train_transforms,\n                                                                     val_transform=val_transforms,\n                                                                     batch_size=config.BATCH_SIZE)\n\nNUM_CLASSES = <span class=\"hljs-built_in\">len<\/span>(class_names)<\/span><\/pre>\n<h2 id=\"ff34\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Some heuristics for transfer learning<\/h2>\n<p id=\"1acd\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\"><strong class=\"be nv\">1) Your dataset is small and similar to the dataset the model was pre-trained on.<\/strong><\/p>\n<p id=\"9e45\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">When your images are similar its likely that low-level features (like edges) and high-level features (like shapes) will be similar.<\/p>\n<p id=\"b8c8\" 
class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">What to do: Freeze the weights up to the last layer, replace the fully connected layer, and retrain. Why? With less data, you risk overfitting if you retrain the entire network.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fi fj fk fl fm\">\n<div class=\"ab ca\">\n<div class=\"ch bg eu ev ew ex\">\n<blockquote class=\"po\"><p id=\"3952\" class=\"pp pq fp be pr ps pt pu pv pw px nt dw\" data-selectable-paragraph=\"\">Comet Artifacts lets you track and reproduce complex multi-experiment scenarios, reuse data points, and easily iterate on datasets. <a class=\"af nu\" href=\"https:\/\/www.comet.com\/site\/blog\/announcing-comet-artifacts\/?utm_source=heartbeat&amp;utm_medium=referral&amp;utm_campaign=AMS_US_EN_AWA_heartbeat_CTA\" target=\"_blank\" rel=\"noopener ugc nofollow\">Read this quick overview of Artifacts<\/a> to explore all that it can do.<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fi fj fk fl fm\">\n<div class=\"ab ca\">\n<div class=\"ch bg eu ev ew ex\">\n<p id=\"aa9e\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\"><strong class=\"be nv\">2) Your dataset is large and similar to the dataset the model was pre-trained on.<\/strong><\/p>\n<p id=\"ba3c\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">What to do: Freeze the earlier layer weights. 
Then retrain the later layers along with a new fully connected layer.<\/p>\n<p id=\"ec96\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\"><strong class=\"be nv\">3) Your dataset is small and different from the dataset the model was pre-trained on.<\/strong><\/p>\n<p id=\"1f8e\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">This is the most difficult situation to deal with. The pre-trained network is already finely tuned at each layer. Its high-level features won\u2019t transfer to your different data, <strong class=\"be nv\"><em class=\"nw\">and<\/em><\/strong> you <strong class=\"be nv\">can\u2019t<\/strong> afford to retrain the whole network because you run the risk of overfitting.<\/p>\n<p id=\"241d\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">What to do: Remove the fully connected layers and convolutional layers closer to the output. 
Retrain the convolutional layers closer to the input.<\/p>\n<p id=\"6a93\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\"><strong class=\"be nv\">4) Your dataset is large and different from the dataset the model was pre-trained on.<\/strong><\/p>\n<p id=\"8584\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">You can still initialize with the pre-trained model weights to speed up training (lots of the low-level convolutions will have similar weights).<\/p>\n<p id=\"1c81\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">What to do: Retrain the entire network, making sure to replace the fully connected output layer.<\/p>\n<p id=\"febb\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">To train ResNet from scratch in SG, all you need to do is omit the <code class=\"cw ra rb rc qr b\">pretrained_weights=\"imagenet\"<\/code> argument from the <code class=\"cw ra rb rc qr b\">models.get<\/code> method.<\/p>\n<p id=\"c436\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\"><strong class=\"be nv\">For this example, we will go with option 2.<\/strong><\/p>\n<h2 id=\"65bd\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Instantiate your ResNet model<\/h2>\n<p id=\"8477\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">Using SuperGradients makes changing the classification head of your model simple. 
All you have to do is pass in the number of classes for your use case to the <code class=\"cw ra rb rc qr b\">num_classes<\/code> argument and the classification head will automatically be changed for you.<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"6977\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\">resnet50_imagenet_model = models.get(model_name=config.MODEL_NAME, num_classes=NUM_CLASSES, pretrained_weights=<span class=\"hljs-string\">\"imagenet\"<\/span>)<\/span><\/pre>\n<p id=\"e522\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">This next block of code will freeze the early layers and batch norm layers of the instantiated model.<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"edbc\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">for<\/span> param <span class=\"hljs-keyword\">in<\/span> resnet50_imagenet_model.conv1.parameters():\n    param.requires_grad = <span class=\"hljs-literal\">False<\/span>\n\n<span class=\"hljs-keyword\">for<\/span> param <span class=\"hljs-keyword\">in<\/span> resnet50_imagenet_model.bn1.parameters():\n    param.requires_grad = <span class=\"hljs-literal\">False<\/span>\n\n<span class=\"hljs-keyword\">for<\/span> param <span class=\"hljs-keyword\">in<\/span> resnet50_imagenet_model.layer1.parameters():\n    param.requires_grad = <span class=\"hljs-literal\">False<\/span>\n\n<span class=\"hljs-keyword\">for<\/span> param <span class=\"hljs-keyword\">in<\/span> resnet50_imagenet_model.layer2.parameters():\n    param.requires_grad = <span class=\"hljs-literal\">False<\/span><\/span><\/pre>\n<h2 id=\"365d\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Training setup<\/h2>\n<p id=\"8788\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc 
nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">The training parameters in SuperGradients were optimized per dataset and architecture, with consideration for the type of training used (from scratch or transfer learning).<\/p>\n<p id=\"06d5\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">For more recommended training parameters you can have a look at our recipes <a class=\"af nu\" href=\"https:\/\/github.com\/Deci-AI\/super-gradients\/tree\/master\/src\/super_gradients\/recipes\" target=\"_blank\" rel=\"noopener ugc nofollow\">here<\/a>.<\/p>\n<p id=\"180f\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">You won\u2019t need to tune hyperparameters in this example, but you are more than welcome to play around with some!<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"2501\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\">training_params = training_hyperparams.get(<span class=\"hljs-string\">\"training_hyperparams\/imagenet_resnet50_train_params\"<\/span>)\n\n<span class=\"hljs-comment\"># override the number of epochs to train for<\/span>\ntraining_params[<span class=\"hljs-string\">\"max_epochs\"<\/span>] = <span class=\"hljs-number\">3<\/span><\/span><\/pre>\n<h2 id=\"893c\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">Training and evaluation<\/h2>\n<p id=\"78fc\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">The results of the training epochs are saved in the checkpoint directory you defined earlier. 
Let\u2019s go ahead and train the model.<\/p>\n<p id=\"5ab5\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">You\u2019ll notice a few metrics displayed for each epoch:<\/p>\n<ol class=\"\">\n<li id=\"2848\" class=\"mz na fp be b gn nb nc nd gq ne nf ng nh rd nj nk nl re nn no np rf nr ns nt rg rh ri bj\" data-selectable-paragraph=\"\"><code class=\"cw ra rb rc qr b\">Accuracy<\/code><\/li>\n<li id=\"5343\" class=\"mz na fp be b gn rj nc nd gq rk nf ng nh rl nj nk nl rm nn no np rn nr ns nt rg rh ri bj\" data-selectable-paragraph=\"\"><code class=\"cw ra rb rc qr b\">LabelSmoothingCrossEntropyLoss<\/code><\/li>\n<li id=\"6ab8\" class=\"mz na fp be b gn rj nc nd gq rk nf ng nh rl nj nk nl rm nn no np rn nr ns nt rg rh ri bj\" data-selectable-paragraph=\"\"><code class=\"cw ra rb rc qr b\">Top5<\/code><\/li>\n<\/ol>\n<p id=\"45ee\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Let\u2019s define the two that you may not be familiar with.<\/p>\n<h2 id=\"45d3\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\">LabelSmoothingCrossEntropyLoss<\/h2>\n<p id=\"3708\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">In classification problems, our model sometimes learns to predict the training examples extremely confidently. This is not good for generalization. Label smoothing is a regularization technique that prevents a model from predicting the training examples too confidently. 
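The smoothing step itself is simple to sketch in pure Python. The epsilon value and the four-class one-hot target below are illustrative choices, not SG's internal implementation:

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Blend a one-hot target toward the uniform distribution.

    The true class ends up with probability (1 - epsilon) + epsilon/K and
    every other class gets epsilon/K, so the model is never pushed toward
    a fully confident 0/1 target.
    """
    k = len(one_hot)
    return [(1 - epsilon) * y + epsilon / k for y in one_hot]

# A hard one-hot target over 4 classes becomes a softened distribution:
# the true class sits near 0.925 and the rest share the remaining mass.
print(smooth_labels([0.0, 1.0, 0.0, 0.0]))
```

The smoothed target still sums to 1, so it remains a valid probability distribution for the cross-entropy loss.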
Label smoothing is used to increase robustness for classification problems.<\/p>\n<h2 id=\"42d9\" class=\"nx ny fp be nz oa ob oc od oe of og oh nh oi oj ok nl ol om on np oo op oq or bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Top5<\/strong><\/h2>\n<p id=\"d43a\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">Top-5 accuracy means one of the model\u2019s top 5 highest-probability predictions matches the ground truth. If it does, you count it as a correct prediction.<\/p>\n<p id=\"a892\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">To train your model, all you have to do is run the following code:<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"8f14\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\">trainer.train(model=resnet50_imagenet_model,\n              training_params=training_params,\n              train_loader=train_dataloader,\n              valid_loader=valid_dataloader)\n\n<span class=\"hljs-comment\"># Load the best model that we trained<\/span>\nbest_model = models.get(config.MODEL_NAME,\n                        num_classes=NUM_CLASSES,\n                        checkpoint_path=os.path.join(trainer.checkpoints_dir_path,<span class=\"hljs-string\">\"ckpt_best.pth\"<\/span>))<\/span><\/pre>\n<h1 id=\"2c26\" class=\"py ny fp be nz pz ro gp od qb rp gs oh qd rq qf qg qh rr qj qk ql rs qn qo qp bj\" data-selectable-paragraph=\"\">Let\u2019s see how well our model predicts on the validation data<\/h1>\n<p id=\"560a\" class=\"pw-post-body-paragraph mz na fp be b gn oz nc nd gq pa nf ng nh pb nj nk nl pc nn no np pd nr ns nt fi bj\" data-selectable-paragraph=\"\">Now that the model is trained, you can examine how well it predicts on validation data. 
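Before looking at individual predictions, it may help to see what the Top5 metric reported during training actually checks. Here is a pure-Python sketch of a single-sample top-5 check; the logits are made up for illustration, and SG computes the metric over whole batches of tensors:

```python
def in_top_k(logits, true_label, k=5):
    """Return True if true_label is among the k highest-scoring classes."""
    # Rank class indices by score, highest first, then test membership.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return true_label in ranked[:k]

# Made-up logits over 8 classes; the top-5 indices are 5, 1, 3, 7, 2.
logits = [0.1, 2.3, 0.7, 1.9, 0.2, 3.1, 0.05, 1.1]
print(in_top_k(logits, true_label=3))  # True: class 3 ranks 3rd
print(in_top_k(logits, true_label=6))  # False: class 6 has the lowest score
```

Averaging this check over a dataset gives the Top-5 accuracy shown in the training logs.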
This next block of code will predict on images from the validation set.<\/p>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"0c7a\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\"># 1. Take in a trained model, class names, image path, image size, a transform and target device<\/span>\n<span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title.function\">pred_and_plot_image<\/span>(<span class=\"hljs-params\">model: torch.nn.Module,\n                        image_path: <span class=\"hljs-built_in\">str<\/span>,\n                        class_names: <span class=\"hljs-type\">List<\/span>[<span class=\"hljs-built_in\">str<\/span>],\n                        image_size: <span class=\"hljs-type\">Tuple<\/span>[<span class=\"hljs-built_in\">int<\/span>, <span class=\"hljs-built_in\">int<\/span>] = (config.INPUT_HEIGHT, config.INPUT_WIDTH),\n                        transform: torchvision.transforms = <span class=\"hljs-literal\">None<\/span>,\n                        device: torch.device=config.DEVICE<\/span>):\n\n\n    <span class=\"hljs-comment\"># 2. Open image<\/span>\n    <span class=\"hljs-keyword\">if<\/span> <span class=\"hljs-built_in\">isinstance<\/span>(image_path, pathlib.PosixPath):\n      img = Image.<span class=\"hljs-built_in\">open<\/span>(image_path)\n    <span class=\"hljs-keyword\">else<\/span>:\n      img = Image.<span class=\"hljs-built_in\">open<\/span>(requests.get(image_path, stream=<span class=\"hljs-literal\">True<\/span>).raw)\n\n    <span class=\"hljs-comment\"># 3. 
Create transformation for image (if one doesn't exist)<\/span>\n    <span class=\"hljs-keyword\">if<\/span> transform <span class=\"hljs-keyword\">is<\/span> <span class=\"hljs-keyword\">not<\/span> <span class=\"hljs-literal\">None<\/span>:\n        image_transform = transform\n    <span class=\"hljs-keyword\">else<\/span>:\n        image_transform = transforms.Compose([\n            transforms.Resize(image_size),\n            transforms.ToTensor(),\n            transforms.Normalize(mean=config.IMAGENET_MEAN,\n                                 std=config.IMAGENET_STD),\n        ])\n\n    <span class=\"hljs-comment\">### Predict on image ### <\/span>\n\n    <span class=\"hljs-comment\"># 4. Make sure the model is on the target device<\/span>\n    model.to(device)\n\n    <span class=\"hljs-comment\"># 5. Turn on model evaluation mode and inference mode<\/span>\n    model.<span class=\"hljs-built_in\">eval<\/span>()\n    <span class=\"hljs-keyword\">with<\/span> torch.inference_mode():\n      <span class=\"hljs-comment\"># 6. Transform and add an extra dimension to image (model requires samples in [batch_size, color_channels, height, width])<\/span>\n      transformed_image = image_transform(img).unsqueeze(dim=<span class=\"hljs-number\">0<\/span>)\n\n      <span class=\"hljs-comment\"># 7. Make a prediction on image with an extra dimension and send it to the target device<\/span>\n      target_image_pred = model(transformed_image.to(device))\n\n    <span class=\"hljs-comment\"># 8. Convert logits -&gt; prediction probabilities (using torch.softmax() for multi-class classification)<\/span>\n    target_image_pred_probs = torch.softmax(target_image_pred, dim=<span class=\"hljs-number\">1<\/span>)\n\n    <span class=\"hljs-comment\"># 9. 
Convert prediction probabilities -&gt; prediction labels<\/span>\n    target_image_pred_label = torch.argmax(target_image_pred_probs, dim=<span class=\"hljs-number\">1<\/span>)\n\n    <span class=\"hljs-comment\">#actual label<\/span>\n    ground_truth = PurePath(image_path).parent.name\n\n    <span class=\"hljs-comment\"># 10. Plot image with predicted label and probability <\/span>\n    plt.figure()\n    plt.imshow(img)\n    <span class=\"hljs-keyword\">if<\/span> <span class=\"hljs-built_in\">isinstance<\/span>(image_path, pathlib.PosixPath):\n      plt.title(<span class=\"hljs-string\">f\"Ground Truth: <span class=\"hljs-subst\">{ground_truth}<\/span> | Pred: <span class=\"hljs-subst\">{class_names[target_image_pred_label]}<\/span> | Prob: <span class=\"hljs-subst\">{target_image_pred_probs.<span class=\"hljs-built_in\">max<\/span>():<span class=\"hljs-number\">.3<\/span>f}<\/span>\"<\/span>)\n    <span class=\"hljs-keyword\">else<\/span>:\n      plt.title(<span class=\"hljs-string\">f\"Pred: <span class=\"hljs-subst\">{class_names[target_image_pred_label]}<\/span> | Prob: <span class=\"hljs-subst\">{target_image_pred_probs.<span class=\"hljs-built_in\">max<\/span>():<span class=\"hljs-number\">.3<\/span>f}<\/span>\"<\/span>)\n    plt.axis(<span class=\"hljs-literal\">False<\/span>);\n\n<span class=\"hljs-comment\"># Get a random list of image paths from test set<\/span>\nnum_images_to_plot = <span class=\"hljs-number\">30<\/span>\ntest_image_path_list = <span class=\"hljs-built_in\">list<\/span>(Path(config.VAL_DIR).glob(<span class=\"hljs-string\">\"*\/*.jpg\"<\/span>)) <span class=\"hljs-comment\"># get list all image paths from test data <\/span>\ntest_image_path_sample = random.sample(population=test_image_path_list, <span class=\"hljs-comment\"># go through all of the test image paths<\/span>\n                                       k=num_images_to_plot) <span class=\"hljs-comment\"># randomly select 'k' image paths to pred and plot<\/span>\n\n<span 
class=\"hljs-comment\"># Make predictions on and plot the images<\/span>\n<span class=\"hljs-keyword\">for<\/span> image_path <span class=\"hljs-keyword\">in<\/span> test_image_path_sample:\n    pred_and_plot_image(model=best_model,\n                        image_path=image_path,\n                        class_names=class_names,\n                        image_size=(config.INPUT_HEIGHT, config.INPUT_WIDTH))<\/span><\/pre>\n<p id=\"ead6\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Now that the training is complete you can use the trained model to predict on an unseen image!<\/p>\n<p id=\"aa21\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">Let\u2019s see what place the model thinks this image is:<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:633\/1*-5G8di6uomOhjo38GwMVdg.jpeg\" alt=\"\" width=\"633\" height=\"356\"><\/figure><div class=\"mf mg rt\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*-5G8di6uomOhjo38GwMVdg.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*-5G8di6uomOhjo38GwMVdg.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*-5G8di6uomOhjo38GwMVdg.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*-5G8di6uomOhjo38GwMVdg.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*-5G8di6uomOhjo38GwMVdg.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*-5G8di6uomOhjo38GwMVdg.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1266\/format:webp\/1*-5G8di6uomOhjo38GwMVdg.jpeg 1266w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 
700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 633px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*-5G8di6uomOhjo38GwMVdg.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*-5G8di6uomOhjo38GwMVdg.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*-5G8di6uomOhjo38GwMVdg.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*-5G8di6uomOhjo38GwMVdg.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*-5G8di6uomOhjo38GwMVdg.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*-5G8di6uomOhjo38GwMVdg.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1266\/1*-5G8di6uomOhjo38GwMVdg.jpeg 1266w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 633px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<pre class=\"mi mj mk ml mm qq qr qs bo qt ba bj\"><span id=\"eec3\" class=\"qu ny fp qr b bf qv qw l qx qy\" data-selectable-paragraph=\"\">pred_and_plot_image(model=best_model,\n                    image_path=<span class=\"hljs-string\">\"https:\/\/cdn.pastemagazine.com\/www\/articles\/2021\/05\/18\/the-office-NEW.jpg\"<\/span>,\n                    class_names=train_dataloader.dataset.classes,\n     
               image_size=(config.INPUT_HEIGHT, config.INPUT_WIDTH))<\/span><\/pre>\n<p id=\"d13f\" class=\"pw-post-body-paragraph mz na fp be b gn nb nc nd gq ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt fi bj\" data-selectable-paragraph=\"\">If you enjoyed this tutorial and think <a class=\"af nu\" href=\"http:\/\/bit.ly\/sg-harp-medium-post\" target=\"_blank\" rel=\"noopener ugc nofollow\">SuperGradients<\/a> is a cool tool to use, then consider giving it a <a class=\"af nu\" href=\"http:\/\/bit.ly\/sg-harp-medium-post\" target=\"_blank\" rel=\"noopener ugc nofollow\">star on GitHub<\/a>. Looking for a community of deep learning practitioners? Then hang out at the <a class=\"af nu\" href=\"https:\/\/www.deeplearningdaily.community\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Deep Learning Daily<\/a> community, where deep learning practitioners come to learn new skills and solve their most difficult problems.<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The ResNet Paper In December of 2015 a paper was published that rocked the deep learning world. Widely regarded as one of the most influential papers in modern deep learning, it has been cited over 110,000 times. 
The name of this paper will go down in the annals of deep learning history: Deep Residual Learning [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[135],"class_list":["post-7860","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>ResNet: How One Paper Changed Deep Learning Forever - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"ResNet: How One Paper Changed Deep Learning Forever\" \/>\n<meta property=\"og:description\" content=\"The ResNet Paper In December of 2015 a paper was published that rocked the deep learning world. Widely regarded as one of the most influential papers in modern deep learning, it has been cited over 110,000 times. 
The name of this paper will go down in the annals of deep learning history: Deep Residual Learning [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-10-06T22:35:37+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:05:48+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Hb4l_paTxs7mFuLENeR3HQ.png\" \/>\n<meta name=\"author\" content=\"Harpreet Sahota\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Harpreet Sahota\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"ResNet: How One Paper Changed Deep Learning Forever - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/","og_locale":"en_US","og_type":"article","og_title":"ResNet: How One Paper Changed Deep Learning Forever","og_description":"The ResNet Paper In December of 2015 a paper was published that rocked the deep learning world. Widely regarded as one of the most influential papers in modern deep learning, it has been cited over 110,000 times. 
The name of this paper will go down in the annals of deep learning history: Deep Residual Learning [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-10-06T22:35:37+00:00","article_modified_time":"2025-04-24T17:05:48+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Hb4l_paTxs7mFuLENeR3HQ.png","type":"","width":"","height":""}],"author":"Harpreet Sahota","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Harpreet Sahota","Est. reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/"},"author":{"name":"Team Comet Digital","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf"},"headline":"ResNet: How One Paper Changed Deep Learning Forever","datePublished":"2023-10-06T22:35:37+00:00","dateModified":"2025-04-24T17:05:48+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/"},"wordCount":1895,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Hb4l_paTxs7mFuLENeR3HQ.png","articleSection":["Machine 
Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/","url":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/","name":"ResNet: How One Paper Changed Deep Learning Forever - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Hb4l_paTxs7mFuLENeR3HQ.png","datePublished":"2023-10-06T22:35:37+00:00","dateModified":"2025-04-24T17:05:48+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Hb4l_paTxs7mFuLENeR3HQ.png","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Hb4l_paTxs7mFuLENeR3HQ.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/resnet-how-one-paper-changed-deep-learning-forever\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"ResNet: How One Paper Changed Deep Learning Forever"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf","name":"Team Comet Digital","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/4f0c0a8cc7c0e87c636ff6a420a6647c","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","caption":"Team Comet 
Digital"},"sameAs":["https:\/\/www.comet.ml\/"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/teamcometdigital\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7860","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=7860"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7860\/revisions"}],"predecessor-version":[{"id":15509,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7860\/revisions\/15509"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=7860"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=7860"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=7860"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=7860"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}