{"id":4595,"date":"2022-11-10T17:47:33","date_gmt":"2022-11-11T01:47:33","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=4595"},"modified":"2025-04-24T17:16:41","modified_gmt":"2025-04-24T17:16:41","slug":"4-techniques-to-tackle-overfitting-in-deep-neural-networks","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/","title":{"rendered":"4 Techniques To Tackle Overfitting In Deep Neural Networks"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/700\/1*JTZZjEAQin8fNEDxyYK2Jg.png\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">Image Created By Author Using <a class=\"au lc\" href=\"http:\/\/canva.com\/\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"bm ld\">Canva<\/strong><\/a><\/p>\n\n\n\n<div class=\"ir is it iu iv\">\n<p id=\"9e7e\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">A neural network is a combination of neurons organized into layers and connected by weights and biases. One of the earliest neural networks, the&nbsp;<a class=\"au lc\" href=\"https:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.335.3398&amp;rep=rep1&amp;type=pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"bm ld\"><em class=\"lz\">perceptron<\/em><\/strong><\/a>, was introduced in 1957. It resembled modern neural networks but had only a single layer. 
Since then, neural networks have become widely used for making predictions and informing business decisions.<\/p>\n<p id=\"2625\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Neural networks became famous because they can outperform traditional machine learning algorithms by a wide margin on many tasks. Since the perceptron, they have evolved to the point where modern networks have many layers and billions of parameters.<\/p>\n<p id=\"0dc8\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">With frameworks like Keras, TensorFlow, and PyTorch, it has become easy to design a neural network that achieves good accuracy on many kinds and sizes of data. Because their architectures are so powerful, neural networks that are not designed and trained carefully can easily overfit.<\/p>\n<p id=\"1a3f\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Overfitting occurs when a model performs significantly better on its training data than it does on new, unseen data. 
In this blog, we will look at four techniques that help tackle overfitting in neural networks.<\/p>\n<h1 id=\"5a9e\" class=\"ma mb iy bm mc md me mf mg mh mi mj mk ke ml kf mm kh mn ki mo kk mp kl mq mr ga\" data-selectable-paragraph=\"\">Data Augmentation<\/h1>\n<p id=\"b6af\" class=\"pw-post-body-paragraph le lf iy bm b lg ms jz li lj mt kc ll lm mu lo lp lq mv ls lt lu mw lw lx ly ir ga\" data-selectable-paragraph=\"\">One of the simplest ways to reduce overfitting is&nbsp;<a class=\"au lc\" href=\"https:\/\/heartbeat.comet.ml\/research-guide-data-augmentation-for-deep-learning-7f141fcc191c\" target=\"_blank\" rel=\"noopener ugc nofollow\">data augmentation<\/a>. Data augmentation is the process of generating new training instances from existing ones, which exposes the model to more variation and makes it harder for it to memorize the training set. It is commonly used in computer vision to create additional training images for convolutional neural networks.<\/p>\n<p id=\"6d88\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\"><a class=\"au lc\" href=\"https:\/\/heartbeat.comet.ml\/image-augmentations-with-albumentations-c1ca8fc78db7\" target=\"_blank\" rel=\"noopener ugc nofollow\">Image augmentation<\/a>&nbsp;is a type of data augmentation in which we apply transformations to our images to produce multiple modified copies of each original image, for example with a different position, size, or color.<\/p>\n<p id=\"4350\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Position and color transformations are the two most common kinds of image augmentation. Position augmentation changes where pixels appear in an image. 
Common techniques include scaling, cropping, flipping, padding, rotation, and translation (shifting the image by different amounts).<\/p>\n<pre class=\"ko kp kq kr gx mx bs my mz dz na\"><span id=\"37a0\" class=\"ga nb mb iy na b dm nc nd l ne nf\" data-selectable-paragraph=\"\">from PIL import Image\nimport matplotlib.pyplot as plt<\/span><span id=\"227d\" class=\"ga nb mb iy na b dm ng nd l ne nf\" data-selectable-paragraph=\"\">img = Image.open(\"\/content\/drive\/MyDrive\/cat.jpg\")\nflipped_img = img.transpose(Image.FLIP_LEFT_RIGHT)  ## flipping (mirror left to right)\nrotated_img = img.transpose(Image.ROTATE_90)        ## rotating 90 degrees counter-clockwise\nscaled_img = img.resize((400, 400))                 ## scaling to 400x400 pixels<\/span><span id=\"d415\" class=\"ga nb mb iy na b dm ng nd l ne nf\" data-selectable-paragraph=\"\">## cropping: the box is (left, upper, right, lower)\ncropped_img = img.crop((100, 50, 400, 200))<\/span><span id=\"32ac\" class=\"ga nb mb iy na b dm ng nd l ne nf\" data-selectable-paragraph=\"\">## padding: paste the image onto a larger blue canvas, adding 10 px on each side\nwidth, height = img.size\npad_pixel = 20\ncanvas = Image.new(img.mode, (width+pad_pixel, height+pad_pixel), 'blue')\ncanvas.paste(img, (pad_pixel\/\/2, pad_pixel\/\/2))<\/span><\/pre>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*v8tvQcAPy9VHSN2gE-Xo5A.png\" alt=\"\" width=\"700\" height=\"494\"><\/figure><div class=\"gl gm nh\" style=\"text-align: center;\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*v8tvQcAPy9VHSN2gE-Xo5A.png 640w, https:\/\/miro.medium.com\/max\/720\/1*v8tvQcAPy9VHSN2gE-Xo5A.png 720w, https:\/\/miro.medium.com\/max\/750\/1*v8tvQcAPy9VHSN2gE-Xo5A.png 750w, https:\/\/miro.medium.com\/max\/786\/1*v8tvQcAPy9VHSN2gE-Xo5A.png 786w, https:\/\/miro.medium.com\/max\/828\/1*v8tvQcAPy9VHSN2gE-Xo5A.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*v8tvQcAPy9VHSN2gE-Xo5A.png 1100w, 
https:\/\/miro.medium.com\/max\/1400\/1*v8tvQcAPy9VHSN2gE-Xo5A.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\">Position Augmentation<\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"d6c0\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Color augmentation alters the color properties of an image by changing its pixel values. Common techniques include changing brightness, contrast, saturation, and hue, or converting the image to grayscale.<\/p>\n<pre class=\"ko kp kq kr gx mx bs my mz dz na\"><span id=\"2260\" class=\"ga nb mb iy na b dm nc nd l ne nf\" data-selectable-paragraph=\"\">from PIL import Image, ImageEnhance\nimport matplotlib.pyplot as plt\nimg = Image.open(\"\/content\/drive\/MyDrive\/cat.jpg\")<\/span><span id=\"51f0\" class=\"ga nb mb iy na b dm ng nd l ne nf\" data-selectable-paragraph=\"\">enhancer = ImageEnhance.Brightness(img)\nimg2 = enhancer.enhance(1.5)  ## brightens image\nimg3 = enhancer.enhance(0.5)  ## darkens image<\/span><span id=\"b9d5\" class=\"ga nb mb iy na b dm ng nd l ne nf\" data-selectable-paragraph=\"\">enhancer = ImageEnhance.Contrast(img)\nimg4 = enhancer.enhance(1.5) ## increase contrast\nimg5 = enhancer.enhance(0.5) ## decrease contrast<\/span><span id=\"2d69\" class=\"ga nb mb iy na b dm ng nd l ne nf\" data-selectable-paragraph=\"\">enhancer = ImageEnhance.Sharpness(img)\nimg6 = enhancer.enhance(5) ## increase sharpness<\/span><\/pre>\n<figure 
class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*eu_H2jcDvsJrI4fnO893RQ.png\" alt=\"\" width=\"700\" height=\"415\"><\/figure><div class=\"gl gm ni\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*eu_H2jcDvsJrI4fnO893RQ.png 640w, https:\/\/miro.medium.com\/max\/720\/1*eu_H2jcDvsJrI4fnO893RQ.png 720w, https:\/\/miro.medium.com\/max\/750\/1*eu_H2jcDvsJrI4fnO893RQ.png 750w, https:\/\/miro.medium.com\/max\/786\/1*eu_H2jcDvsJrI4fnO893RQ.png 786w, https:\/\/miro.medium.com\/max\/828\/1*eu_H2jcDvsJrI4fnO893RQ.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*eu_H2jcDvsJrI4fnO893RQ.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*eu_H2jcDvsJrI4fnO893RQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"1daa\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Although it is possible to perform image augmentation manually with image processing libraries such as Pillow and OpenCV, a simpler and less time-consuming approach is to use the&nbsp;<a class=\"au lc\" href=\"https:\/\/keras.io\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Keras<\/a>&nbsp;API.<\/p>\n<p id=\"912b\" class=\"pw-post-body-paragraph le lf iy bm 
b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Keras is a deep learning API written in Python that runs on top of the machine learning platform TensorFlow. Keras has many built-in methods and classes that speed up experimentation. Its image preprocessing module provides the&nbsp;<a class=\"au lc\" href=\"https:\/\/keras.io\/ja\/preprocessing\/image\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">ImageDataGenerator<\/a>&nbsp;class, which offers multiple options for image augmentation. (In recent TensorFlow releases, ImageDataGenerator is no longer recommended for new code; the documentation suggests Keras preprocessing layers instead.)<\/p>\n<pre class=\"ko kp kq kr gx mx bs my mz dz na\"><span id=\"fd4c\" class=\"ga nb mb iy na b dm nc nd l ne nf\" data-selectable-paragraph=\"\">keras.preprocessing.image.<strong class=\"na iz\">ImageDataGenerator<\/strong>()<\/span><\/pre>\n<p id=\"692b\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\"><a class=\"au lc\" href=\"https:\/\/keras.io\/ja\/preprocessing\/image\/#:~:text=number%20of%20batches.-,argument,-featurewise_center%20%3A%20Truth%20value\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"bm ld\">Arguments<\/strong><\/a>:<br>\n1.&nbsp;<code class=\"fp nj nk nl na b\">featurewise_center<\/code>: Boolean. Set the input mean to 0 over the dataset, feature-wise.<br>\n2.&nbsp;<code class=\"fp nj nk nl na b\">samplewise_center<\/code>: Boolean. Set each sample's mean to 0.<br>\n3.&nbsp;<code class=\"fp nj nk nl na b\">featurewise_std_normalization<\/code>: Boolean. Divide inputs by the standard deviation of the dataset, feature-wise.<br>\n4.&nbsp;<code class=\"fp nj nk nl na b\">samplewise_std_normalization<\/code>: Boolean. Divide each input by its own standard deviation.<br>\n5.&nbsp;<code class=\"fp nj nk nl na b\">zca_epsilon<\/code>: Epsilon for ZCA whitening. Defaults to 1e-6.<br>\n6.&nbsp;<code class=\"fp nj nk nl na b\">zca_whitening<\/code>: Boolean. 
Apply ZCA whitening.<br>\n7.&nbsp;<code class=\"fp nj nk nl na b\">rotation_range<\/code>: Integer. Degree range for random rotations.<br>\n8.&nbsp;<code class=\"fp nj nk nl na b\">width_shift_range<\/code>: Float (fraction of total width). Range for random horizontal shifts.<br>\n9.&nbsp;<code class=\"fp nj nk nl na b\">height_shift_range<\/code>: Float (fraction of total height). Range for random vertical shifts.<br>\n10.&nbsp;<code class=\"fp nj nk nl na b\">shear_range<\/code>: Float. Shear intensity (shear angle in the counter-clockwise direction, in degrees).<br>\n11.&nbsp;<code class=\"fp nj nk nl na b\">zoom_range<\/code>: Float or [lower, upper]. Range for random zoom. If a float is given,&nbsp;<code class=\"fp nj nk nl na b\">[lower, upper] = [1-zoom_range, 1+zoom_range]<\/code>.<br>\n12.&nbsp;<code class=\"fp nj nk nl na b\">channel_shift_range<\/code>: Float. Range for random channel shifts.<br>\n13.&nbsp;<code class=\"fp nj nk nl na b\">horizontal_flip<\/code>: Boolean. Randomly flip inputs horizontally.<br>\n14.&nbsp;<code class=\"fp nj nk nl na b\">vertical_flip<\/code>: Boolean. Randomly flip inputs vertically.<br>\n15.&nbsp;<code class=\"fp nj nk nl na b\">rescale<\/code>: Rescaling factor for pixel values. Defaults to None. If None or 0, no rescaling is applied; otherwise the data is multiplied by the value provided.<br>\n16.&nbsp;<code class=\"fp nj nk nl na b\">preprocessing_function<\/code>: Function applied to each input. It runs after the image has been resized and augmented.<br>\n17.&nbsp;<code class=\"fp nj nk nl na b\">validation_split<\/code>: Float. Fraction of images reserved for validation (strictly between 0 and 1).<br>\n18.&nbsp;<code class=\"fp nj nk nl na b\">fill_mode<\/code>: One of {\u201cconstant\u201d, \u201cnearest\u201d, \u201creflect\u201d, \u201cwrap\u201d}. Defaults to \u2018nearest\u2019. 
Points outside the boundaries of the input are filled according to the given mode.<br>\n19.&nbsp;<code class=\"fp nj nk nl na b\">cval<\/code>: Float or integer. Value used for points outside the boundaries when&nbsp;<code class=\"fp nj nk nl na b\">fill_mode = \"constant\"<\/code>.<\/p>\n<h2 id=\"8b2d\" class=\"nb mb iy bm mc nm nn no mg np nq nr mk lm ns nt mm lq nu nv mo lu nw nx mq ny ga\" data-selectable-paragraph=\"\">Performing Data Augmentation Using TensorFlow<\/h2>\n<pre>import os\nfrom tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img\n\ndatagen = ImageDataGenerator(\n        rotation_range=40,\n        width_shift_range=0.2,\n        height_shift_range=0.2,\n        brightness_range=[0.5, 1.5],\n        rescale=1.\/255,\n        shear_range=0.2,\n        zoom_range=0.4,\n        horizontal_flip=True,\n        fill_mode='nearest')\n\npath = '\/content\/drive\/MyDrive\/cat.jpg'  ## image path\nimg = load_img(path)\nx = img_to_array(img)            ## shape: (height, width, channels)\nx = x.reshape((1,) + x.shape)    ## add a batch dimension: (1, height, width, channels)\n\nsave_dir = \"\/content\/drive\/MyDrive\/aug_imgs\"\nos.makedirs(save_dir, exist_ok=True)  ## make sure the output directory exists\n\n### Create 25 augmented images and save them in the `aug_imgs` directory\ni = 0\nfor batch in datagen.flow(x, batch_size=1,\n                          save_to_dir=save_dir, save_prefix='img', save_format='jpeg'):\n    i += 1\n    if i &gt;= 25:   ## stop after 25 augmented images\n        break<\/pre>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*YeeocFC1XrC2PXdcZlBE5Q.png\" alt=\"\" width=\"700\" height=\"492\"><\/figure><div class=\"gl gm nh\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*YeeocFC1XrC2PXdcZlBE5Q.png 640w, https:\/\/miro.medium.com\/max\/720\/1*YeeocFC1XrC2PXdcZlBE5Q.png 720w, https:\/\/miro.medium.com\/max\/750\/1*YeeocFC1XrC2PXdcZlBE5Q.png 750w, 
https:\/\/miro.medium.com\/max\/786\/1*YeeocFC1XrC2PXdcZlBE5Q.png 786w, https:\/\/miro.medium.com\/max\/828\/1*YeeocFC1XrC2PXdcZlBE5Q.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*YeeocFC1XrC2PXdcZlBE5Q.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*YeeocFC1XrC2PXdcZlBE5Q.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"3779\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">There are also more advanced methods that use&nbsp;<a class=\"au lc\" href=\"https:\/\/medium.com\/cometheartbeat\/introduction-to-generative-adversarial-networks-gans-35ef44f21193\" rel=\"noopener\">Generative Adversarial Networks (GANs)<\/a>&nbsp;to perform data augmentation. To learn more about them, see the following research papers:<\/p>\n<pre class=\"ko kp kq kr gx mx bs my mz dz na\"><span id=\"1b5e\" class=\"ga nb mb iy na b dm nc nd l ne nf\" data-selectable-paragraph=\"\">1. <a class=\"au lc\" href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S2352914821002501\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"lz\">Data augmentation using Generative Adversarial Networks (GANs) for GAN-based detection of Pneumonia and COVID-19 in chest X-ray images<\/em><\/a>\n2. 
<a class=\"au lc\" href=\"https:\/\/arxiv.org\/abs\/1711.04340\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"lz\">Data Augmentation Generative Adversarial Networks<\/em><\/a><\/span><\/pre>\n<\/div>\n\n\n\n<div class=\"o dx ob oc id od\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<h1 id=\"a5e2\" class=\"ma mb iy bm mc md oi mf mg mh oj mj mk ke ok kf mm kh ol ki mo kk om kl mq mr ga\" data-selectable-paragraph=\"\">Adding Dropout Layers<\/h1>\n<p id=\"b7db\" class=\"pw-post-body-paragraph le lf iy bm b lg ms jz li lj mt kc ll lm mu lo lp lq mv ls lt lu mw lw lx ly ir ga\" data-selectable-paragraph=\"\">Dropout layers are one of the most common methods for tackling overfitting in deep neural networks. They reduce the chance of overfitting by randomly modifying the network during training.<\/p>\n<p id=\"3ec6\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Dropout layers&nbsp;<strong class=\"bm ld\">randomly<\/strong>&nbsp;set input units to 0 with a frequency of&nbsp;<em class=\"lz\">rate<\/em>&nbsp;at each step during the training phase. A new random subset of units is dropped at every training step, so the network cannot rely too heavily on any single unit. Inputs not set to 0 are scaled up by 1\/(1 - rate) so that the expected sum over all inputs is unchanged. Dropout is only applied during training; at inference time, all units are kept.<\/p>\n<p id=\"40d5\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm ld\">Rate&nbsp;<\/strong>is the main parameter and ranges from 0 to 1. It specifies the fraction of the input units to drop. 
For example, a rate of 0.5 means that, on average, 50% of the layer's input units (neurons) are dropped at each training step.<\/p>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*UjgkmPF_O5-dYkgrdfxQSw.png\" alt=\"\" width=\"700\" height=\"394\"><\/figure><div class=\"gl gm kn\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*UjgkmPF_O5-dYkgrdfxQSw.png 640w, https:\/\/miro.medium.com\/max\/720\/1*UjgkmPF_O5-dYkgrdfxQSw.png 720w, https:\/\/miro.medium.com\/max\/750\/1*UjgkmPF_O5-dYkgrdfxQSw.png 750w, https:\/\/miro.medium.com\/max\/786\/1*UjgkmPF_O5-dYkgrdfxQSw.png 786w, https:\/\/miro.medium.com\/max\/828\/1*UjgkmPF_O5-dYkgrdfxQSw.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*UjgkmPF_O5-dYkgrdfxQSw.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*UjgkmPF_O5-dYkgrdfxQSw.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<pre class=\"ko kp kq kr gx mx bs my mz dz na\"><span id=\"3bcb\" class=\"ga nb mb iy na b dm nc nd l ne nf\" data-selectable-paragraph=\"\">tf.keras.layers.Dropout(<strong class=\"na iz\">rate<\/strong>, noise_shape=None, seed=None)<\/span><\/pre>\n<p id=\"d1ae\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\"><a class=\"au lc\" 
href=\"https:\/\/keras.io\/api\/layers\/regularization_layers\/dropout\/\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"bm ld\">Arguments<\/strong><\/a>:<br>\n1.&nbsp;<code class=\"fp nj nk nl na b\">rate<\/code>: Float between 0 and 1. Fraction of the input units to drop.<br>\n2.&nbsp;<code class=\"fp nj nk nl na b\">noise_shape<\/code>: 1D integer tensor representing the shape of the binary dropout mask that will be multiplied with the input.<br>\n3.&nbsp;<code class=\"fp nj nk nl na b\">seed<\/code>: Integer to use as a random seed.<\/p>\n<h2 id=\"7814\" class=\"nb mb iy bm mc nm nn no mg np nq nr mk lm ns nt mm lq nu nv mo lu nw nx mq ny ga\" data-selectable-paragraph=\"\">Adding Dropout Layers Using TensorFlow<\/h2>\n<pre>import tensorflow as tf\nfrom tensorflow.keras.models import Sequential\nfrom tensorflow.keras.layers import Dense, Dropout\n\ndef create_model():\n  model = Sequential()\n  model.add(Dense(60, input_shape=(60,), activation='relu'))\n  model.add(Dropout(0.2))  # randomly drop 20% of the previous layer's outputs during training\n  model.add(Dense(30, activation='relu'))\n  model.add(Dropout(0.2))\n  model.add(Dense(1, activation='sigmoid'))\n  return model\n\n# Create the model first, then compile it\nmodel = create_model()\nadam = tf.keras.optimizers.Adam()\nmodel.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])\nmodel.summary()<\/pre>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/443\/1*HQUCoIkt2U_xqzeirc25fg.png\" alt=\"\" width=\"443\" height=\"282\"><\/figure><div class=\"gl gm on\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*HQUCoIkt2U_xqzeirc25fg.png 640w, https:\/\/miro.medium.com\/max\/720\/1*HQUCoIkt2U_xqzeirc25fg.png 720w, 
https:\/\/miro.medium.com\/max\/750\/1*HQUCoIkt2U_xqzeirc25fg.png 750w, https:\/\/miro.medium.com\/max\/786\/1*HQUCoIkt2U_xqzeirc25fg.png 786w, https:\/\/miro.medium.com\/max\/828\/1*HQUCoIkt2U_xqzeirc25fg.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*HQUCoIkt2U_xqzeirc25fg.png 1100w, https:\/\/miro.medium.com\/max\/886\/1*HQUCoIkt2U_xqzeirc25fg.png 886w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 443px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"a3dd\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Here are a few tips to keep in mind when adding dropout layers to your neural network:<\/p>\n<p id=\"b4e0\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">1. Use a moderate dropout rate, typically between 20% and 50%. A much larger rate can hurt model performance because too much information is discarded, while a very small rate has little effect on the network.<\/p>\n<p id=\"d97e\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">2. Dropout usually helps most in larger networks, which have more capacity to overfit.<\/p>\n<p id=\"d735\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">3. 
You can apply dropout to both the input (visible) layer and the hidden layers; it can work well in both cases.<\/p>\n<\/div>\n\n\n\n<div class=\"o dx ob oc id od\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<blockquote class=\"oo\"><p id=\"07df\" class=\"op oq iy bm or os ot ou ov ow ox ly cn\" data-selectable-paragraph=\"\">Want to get the most up-to-date news on all things Deep Learning?&nbsp;<a class=\"au lc\" href=\"https:\/\/www.deeplearningweekly.com\/about\" target=\"_blank\" rel=\"noopener ugc nofollow\">Subscribe to Deep Learning Weekly<\/a>&nbsp;for the latest research, resources, and industry news, delivered to your inbox.<\/p><\/blockquote>\n<\/div>\n\n\n\n<div class=\"o dx ob oc id od\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<h1 id=\"8b04\" class=\"ma mb iy bm mc md oi mf mg mh oj mj mk ke ok kf mm kh ol ki mo kk om kl mq mr ga\" data-selectable-paragraph=\"\">L1 and L2 Regularization<\/h1>\n<p id=\"3657\" class=\"pw-post-body-paragraph le lf iy bm b lg ms jz li lj mt kc ll lm mu lo lp lq mv ls lt lu mw lw lx ly ir ga\" data-selectable-paragraph=\"\">Regularization reduces the complexity of a network by adding a penalty term to the loss function. This term punishes the model for being too complex or, in simple words, for having large values in its weight matrices.<\/p>\n<p id=\"34c8\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">L1 regularization tends to drive the weights of less important features to exactly zero, so that only the most important features contribute to the model. In this sense, it works as an automatic feature selector for neural networks. 
In linear models, L1 regularization is known as&nbsp;<em class=\"lz\">Lasso<\/em>&nbsp;regression. Its penalty is \u03bb times the sum of the absolute values of the weights, where \u03bb controls the strength of the regularization.<\/p>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*Pe0sjd-HfU-QiIRzR_7rHQ.png\" alt=\"\" width=\"700\" height=\"111\"><\/figure><div class=\"gl gm oy\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*Pe0sjd-HfU-QiIRzR_7rHQ.png 640w, https:\/\/miro.medium.com\/max\/720\/1*Pe0sjd-HfU-QiIRzR_7rHQ.png 720w, https:\/\/miro.medium.com\/max\/750\/1*Pe0sjd-HfU-QiIRzR_7rHQ.png 750w, https:\/\/miro.medium.com\/max\/786\/1*Pe0sjd-HfU-QiIRzR_7rHQ.png 786w, https:\/\/miro.medium.com\/max\/828\/1*Pe0sjd-HfU-QiIRzR_7rHQ.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*Pe0sjd-HfU-QiIRzR_7rHQ.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*Pe0sjd-HfU-QiIRzR_7rHQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"d900\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">L2 regularization pushes the weights toward zero but, unlike L1, rarely makes them exactly zero. 
Its penalty is \u03bb times the sum of the squared weights. With plain gradient descent, this shrinks every weight by a small fraction at each update (which is why L2 regularization is also called weight decay), keeping the network simpler and less likely to overfit the data. In linear models, L2 regularization is known as Ridge regression.<\/p>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*0OWgmNB1ePTCCpo-BidJYQ.png\" alt=\"\" width=\"700\" height=\"115\"><\/figure><div class=\"gl gm oz\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*0OWgmNB1ePTCCpo-BidJYQ.png 640w, https:\/\/miro.medium.com\/max\/720\/1*0OWgmNB1ePTCCpo-BidJYQ.png 720w, https:\/\/miro.medium.com\/max\/750\/1*0OWgmNB1ePTCCpo-BidJYQ.png 750w, https:\/\/miro.medium.com\/max\/786\/1*0OWgmNB1ePTCCpo-BidJYQ.png 786w, https:\/\/miro.medium.com\/max\/828\/1*0OWgmNB1ePTCCpo-BidJYQ.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*0OWgmNB1ePTCCpo-BidJYQ.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*0OWgmNB1ePTCCpo-BidJYQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<h2 id=\"a2de\" class=\"nb mb iy bm mc nm nn no mg np nq nr mk lm ns nt mm lq nu nv mo lu nw nx mq ny ga\" data-selectable-paragraph=\"\">Performing Regularization Using TensorFlow<\/h2>\n<pre>import tensorflow as tf\nfrom tensorflow.keras.models import Sequential\nfrom 
tensorflow.keras.layers import Dense, Dropout\n\ndef create_model():\n  # create model\n  model = Sequential()\n  # L1 penalty (strength 0.01) on this layer's weights\n  model.add(Dense(60, input_shape=(60,), activation='relu', kernel_regularizer=tf.keras.regularizers.l1(0.01)))\n  model.add(Dropout(0.2))\n  # L2 penalty (strength 0.001) on this layer's weights\n  model.add(Dense(30, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.001)))\n  model.add(Dropout(0.2))\n  model.add(Dense(1, activation='sigmoid'))\n  return model\n\n# Create the model first, then compile it\nmodel = create_model()\nadam = tf.keras.optimizers.Adam()\nmodel.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])\nmodel.summary()<\/pre>\n<p id=\"fd2a\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">To learn more about L1 and L2 regularization, see the following research papers:<\/p>\n<pre class=\"ko kp kq kr gx mx bs my mz dz na\"><span id=\"772d\" class=\"ga nb mb iy na b dm nc nd l ne nf\" data-selectable-paragraph=\"\">1. <a class=\"au lc\" href=\"https:\/\/ai.stanford.edu\/~ang\/papers\/icml04-l1l2.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"lz\">Feature selection, L1 vs. L2 regularization, and rotational invariance<\/em><\/a>\n2. <a class=\"au lc\" href=\"https:\/\/www.diva-portal.org\/smash\/get\/diva2:1389238\/FULLTEXT01.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"lz\">Regularization Methods in Neural Networks<\/em><\/a>\n3. 
<a class=\"au lc\" href=\"https:\/\/www.researchgate.net\/publication\/329150256_A_Comparison_of_Regularization_Techniques_in_Deep_Neural_Networks\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"lz\">A Comparison of Regularization Techniques in Deep Neural Networks<\/em><\/a><\/span><\/pre>\n<\/div>\n\n\n\n<div class=\"o dx ob oc id od\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<h1 id=\"46bd\" class=\"ma mb iy bm mc md oi mf mg mh oj mj mk ke ok kf mm kh ol ki mo kk om kl mq mr ga\" data-selectable-paragraph=\"\">Early Stopping<\/h1>\n<p id=\"eddc\" class=\"pw-post-body-paragraph le lf iy bm b lg ms jz li lj mt kc ll lm mu lo lp lq mv ls lt lu mw lw lx ly ir ga\" data-selectable-paragraph=\"\">One epoch is one complete pass of the training data through the neural network. During each epoch, the network updates its weights, so the more epochs you choose, the longer training takes. Additionally, choosing too many epochs can lead to overfitting. 
On the other hand, choosing too few epochs can cause underfitting.<\/p>\n<p id=\"9ab9\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Early stopping is a form of regularization that halts training once the model's performance on the validation set stops improving, which significantly reduces the risk of overfitting.<\/p>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/0*LWCGZ-Ppd5TPipOH\" alt=\"\" width=\"700\" height=\"280\"><\/figure><div class=\"gl gm pa\" style=\"text-align: center;\">Early Stopping [<strong class=\"bm ld\">Must Read<\/strong>]:&nbsp;<a class=\"au lc\" 
href=\"https:\/\/stanford.edu\/~shervine\/teaching\/cs-230\/cheatsheet-deep-learning-tips-and-tricks\" target=\"_blank\" rel=\"noopener ugc nofollow\">SOURCE<\/a><\/div>\n<\/div>\n<\/figure>\n<p id=\"f7f7\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Keras provides an <code class=\"fp nj nk nl na b\">EarlyStopping<\/code> callback that stops training once the monitored metric is no longer improving significantly. The snippet below creates the callback with its default argument values.<\/p>\n<pre class=\"ko kp kq kr gx mx bs my mz dz na\"><span id=\"fb9c\" class=\"ga nb mb iy na b dm nc nd l ne nf\" data-selectable-paragraph=\"\">from tensorflow.keras.callbacks import EarlyStopping\nearly_stopping = EarlyStopping(\n    monitor='val_loss',\n    min_delta=0,\n    patience=0,\n    verbose=0,\n    mode='auto',\n    baseline=None,\n    restore_best_weights=False\n)<\/span><\/pre>\n<p id=\"c432\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\"><a class=\"au lc\" href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/callbacks\/EarlyStopping\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"bm ld\">Arguments<\/strong><\/a><strong class=\"bm ld\"><br>\nmonitor<\/strong>: the quantity to monitor (for example, validation loss) when deciding whether to stop training.<br>\n<strong class=\"bm ld\">min_delta<\/strong>: minimum change in the monitored quantity to qualify as an improvement; an absolute change smaller than min_delta counts as no improvement.<br>\n<strong class=\"bm ld\">patience<\/strong>: number of epochs with no improvement after which training is stopped.<br>\n<strong class=\"bm ld\">verbose<\/strong>: Verbosity mode, 0 or 1. 
Mode 0 is silent, and mode 1 displays callback messages.<br>\n<strong class=\"bm ld\">mode<\/strong>: {\u201cauto\u201d, \u201cmin\u201d, \u201cmax\u201d}. In&nbsp;<code class=\"fp nj nk nl na b\">min<\/code> mode, training stops when the monitored quantity stops decreasing; in&nbsp;<code class=\"fp nj nk nl na b\">max<\/code> mode, training stops when it stops increasing; in&nbsp;<code class=\"fp nj nk nl na b\">\"auto\"<\/code>&nbsp;mode, the direction is automatically inferred from the name of the monitored quantity.<br>\n<strong class=\"bm ld\">baseline<\/strong>: baseline value for the monitored quantity; training stops if the model does not show improvement over the baseline.<br>\n<strong class=\"bm ld\">restore_best_weights<\/strong>: whether to restore the model weights from the epoch with the best value of the monitored quantity. If False, the weights obtained at the last step of training are used.<\/p>\n<pre class=\"ko kp kq kr gx mx bs my mz dz na\"><span id=\"3899\" class=\"ga nb mb iy na b dm nc nd l ne nf\" data-selectable-paragraph=\"\">from tensorflow.keras.callbacks import EarlyStopping\n\n<\/span><span id=\"9805\" class=\"ga nb mb iy na b dm ng nd l ne nf\" data-selectable-paragraph=\"\"># Stop training when the validation loss has not improved for 2 consecutive epochs\nearly_stopping = EarlyStopping(monitor='val_loss', patience=2)\n\n# X_train and y_train are assumed to hold the training features and labels;\n# validation_split holds out 20% of them to compute val_loss\nhistory = model.fit(\n    X_train,\n    y_train,\n    epochs=100,\n    validation_split=0.20,\n    batch_size=50,\n    verbose=\"auto\",\n    callbacks=[early_stopping]\n)<\/span><\/pre>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/0*i3F54zynASiSxZCG.png\" alt=\"\" width=\"700\" height=\"134\"><\/figure>\n<\/div>\n<\/figure>\n<p id=\"5821\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">Early stopping halts training once the monitored metric has not improved for the specified number of epochs (the patience), which also reduces the time the network takes to train.<\/p>\n<p id=\"f196\" class=\"pw-post-body-paragraph le lf iy bm b lg lh jz li lj lk kc ll lm ln lo lp lq lr ls lt lu lv lw lx ly ir ga\" data-selectable-paragraph=\"\">These methods only work when you apply them correctly. 
Not every network needs every technique; some networks can be improved by applying just one of them.<\/p>\n<h2 id=\"cfb0\" class=\"nb mb iy bm mc nm nn no mg np nq nr mk lm ns nt mm lq nu nv mo lu nw nx mq ny ga\" data-selectable-paragraph=\"\">Conclusion<\/h2>\n<p id=\"a4eb\" class=\"pw-post-body-paragraph le lf iy bm b lg ms jz li lj mt kc ll lm mu lo lp lq mv ls lt lu mw lw lx ly ir ga\" data-selectable-paragraph=\"\">As a quick recap: data augmentation increases the size of the dataset by applying different transformations to the images, and dropout layers reduce the network's complexity by randomly dropping some neurons during training. Regularization techniques such as L1 and L2 penalize the network for having large weights, and early stopping halts training once the model stops improving on the validation set.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Image Created By Author Using Canva A neural network is a combination of different neurons, layers, weights, and biases. The first neural network was created in 1957 and named&nbsp;perceptron. It is similar to modern-day neural networks but it only had one layer. 
Since then, neural networks have become widely used for making predictions and business [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6,7],"tags":[],"coauthors":[140],"class_list":["post-4595","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>4 Techniques To Tackle Overfitting In Deep Neural Networks - Comet<\/title>\n<meta name=\"description\" content=\"Overfitting is a condition that occurs when a model performs significantly better for training data than it does for new data. In this blog, we will see some of the techniques that are helpful for tackling overfitting in neural networks.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"4 Techniques To Tackle Overfitting In Deep Neural Networks\" \/>\n<meta property=\"og:description\" content=\"Overfitting is a condition that occurs when a model performs significantly better for training data than it does for new data. 
In this blog, we will see some of the techniques that are helpful for tackling overfitting in neural networks.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2022-11-11T01:47:33+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:16:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/max\/700\/1*JTZZjEAQin8fNEDxyYK2Jg.png\" \/>\n<meta name=\"author\" content=\"Abhay Parashar\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Abhay Parashar\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"4 Techniques To Tackle Overfitting In Deep Neural Networks - Comet","description":"Overfitting is a condition that occurs when a model performs significantly better for training data than it does for new data. 
In this blog, we will see some of the techniques that are helpful for tackling overfitting in neural networks.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/","og_locale":"en_US","og_type":"article","og_title":"4 Techniques To Tackle Overfitting In Deep Neural Networks","og_description":"Overfitting is a condition that occurs when a model performs significantly better for training data than it does for new data. In this blog, we will see some of the techniques that are helpful for tackling overfitting in neural networks.","og_url":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2022-11-11T01:47:33+00:00","article_modified_time":"2025-04-24T17:16:41+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/max\/700\/1*JTZZjEAQin8fNEDxyYK2Jg.png","type":"","width":"","height":""}],"author":"Abhay Parashar","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Abhay Parashar","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/"},"author":{"name":"Team Comet Digital","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf"},"headline":"4 Techniques To Tackle Overfitting In Deep Neural Networks","datePublished":"2022-11-11T01:47:33+00:00","dateModified":"2025-04-24T17:16:41+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/"},"wordCount":1534,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/max\/700\/1*JTZZjEAQin8fNEDxyYK2Jg.png","articleSection":["Machine Learning","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/","url":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/","name":"4 Techniques To Tackle Overfitting In Deep Neural Networks - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/max\/700\/1*JTZZjEAQin8fNEDxyYK2Jg.png","datePublished":"2022-11-11T01:47:33+00:00","dateModified":"2025-04-24T17:16:41+00:00","description":"Overfitting is a condition that occurs when a model 
performs significantly better for training data than it does for new data. In this blog, we will see some of the techniques that are helpful for tackling overfitting in neural networks.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/#primaryimage","url":"https:\/\/miro.medium.com\/max\/700\/1*JTZZjEAQin8fNEDxyYK2Jg.png","contentUrl":"https:\/\/miro.medium.com\/max\/700\/1*JTZZjEAQin8fNEDxyYK2Jg.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/4-techniques-to-tackle-overfitting-in-deep-neural-networks\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"4 Techniques To Tackle Overfitting In Deep Neural Networks"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, 
Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf","name":"Team Comet Digital","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/4f0c0a8cc7c0e87c636ff6a420a6647c","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","caption":"Team Comet 
Digital"},"sameAs":["https:\/\/www.comet.ml\/"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/teamcometdigital\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4595","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=4595"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4595\/revisions"}],"predecessor-version":[{"id":15656,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4595\/revisions\/15656"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=4595"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=4595"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=4595"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=4595"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}