{"id":4600,"date":"2022-11-10T17:46:24","date_gmt":"2022-11-11T01:46:24","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=4600"},"modified":"2025-04-24T17:16:44","modified_gmt":"2025-04-24T17:16:44","slug":"dropout-regularization-with-tensorflow-keras","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/dropout-regularization-with-tensorflow-keras\/","title":{"rendered":"Dropout Regularization With Tensorflow Keras"},"content":{"rendered":"\n<div class=\"ir is it iu iv\">\n<figure class=\"kn ko kp kq gx kr gl gm paragraph-image\">\n<div class=\"ks kt do ku ce kv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kw kx c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*EZqf_OOeEMj85WOhH5FwEA.png\" alt=\"\" width=\"700\" height=\"374\"><\/figure><div class=\"gl gm gn\" style=\"text-align: center;\"><picture><\/picture><strong class=\"bm lb\">Image By Author<\/strong><\/div>\n<\/div>\n<\/figure>\n<p id=\"65f5\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">Deep neural networks are complex models which makes them much more prone to overfitting \u2014 especially when the dataset has few examples. Left unhandled, an overfit model would fail to generalize well to unseen instances. One solution to combat this occurrence is to apply regularization.<\/p>\n<p id=\"adfd\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">The technique we are going to be focusing on here is called&nbsp;<em class=\"lx\">Dropout<\/em>. 
We will use different methods to implement it in Tensorflow Keras and evaluate how it improves our model.<\/p>\n<p id=\"cbd0\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\"><em class=\"lx\">\u201c<\/em><strong class=\"bm lb\"><em class=\"lx\">Dilution<\/em><\/strong><em class=\"lx\">&nbsp;(also called&nbsp;<\/em><strong class=\"bm lb\"><em class=\"lx\">Dropout<\/em><\/strong><em class=\"lx\">&nbsp;or&nbsp;<\/em><strong class=\"bm lb\"><em class=\"lx\">DropConnect<\/em><\/strong><em class=\"lx\">) is a regularization technique for reducing overfitting in artificial neural networks by preventing complex co-adaptations on training data. It is an efficient way of performing model averaging with neural networks.\u201d<\/em><br>\n\u2014&nbsp;<a class=\"au ly\" href=\"https:\/\/en.wikipedia.org\/wiki\/Dilution_(neural_networks)\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"bm lb\">Wikipedia<\/strong><\/a><\/p>\n<\/div>\n\n\n\n<div class=\"o dx lz ma id mb\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<h1 id=\"95de\" class=\"mg mh iy bm mi mj mk ml mm mn mo mp mq ke mr kf ms kh mt ki mu kk mv kl mw mx ga\" data-selectable-paragraph=\"\">The Mechanics of Dropout<\/h1>\n<p id=\"7b1c\" class=\"pw-post-body-paragraph lc ld iy bm b le my jz lg lh mz kc lj lk na lm ln lo nb lq lr ls nc lu lv lw ir ga\" data-selectable-paragraph=\"\">Dropout is a computationally cheap and effective technique used to reduce overfitting in neural networks. The technique works by randomly dropping out selected neurons during the training phase. 
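These mechanics are easy to observe directly. The minimal sketch below (assuming TensorFlow is installed; the layer is used standalone here rather than inside a model) passes a vector of ones through a Dropout layer: at inference the inputs pass through unchanged, while during training each unit is zeroed with probability 0.5 and the survivors are scaled by 1 / (1 - rate):

```python
import numpy as np
import tensorflow as tf

# A standalone Dropout layer with a 50% drop rate.
layer = tf.keras.layers.Dropout(0.5, seed=0)
x = tf.ones((1, 10))

# Inference (training=False, the default): inputs pass through unchanged.
inference_out = layer(x, training=False).numpy()

# Training (training=True): each unit is zeroed with probability 0.5 and
# the survivors are scaled by 1 / (1 - 0.5) = 2.0 (inverted dropout).
training_out = layer(x, training=True).numpy()

print(inference_out)  # all ones
print(training_out)   # entries are either 0.0 or 2.0
```

The scaling keeps the expected sum of activations the same with and without dropout, which is why nothing special needs to happen at inference time.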
Neurons in later layers receive no contribution from dropped-out neurons during forward propagation, nor will updates be made to the dropped-out neurons during backpropagation.<\/p>\n<p id=\"ec35\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">\u201c<em class=\"lx\">By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections\u201d<br>\n\u2014&nbsp;<\/em><strong class=\"bm lb\">Srivastava, et al.<\/strong>&nbsp;<strong class=\"bm lb\">2014<\/strong>.&nbsp;<a class=\"au ly\" href=\"http:\/\/jmlr.org\/papers\/v15\/srivastava14a.html\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"bm lb\">Dropout: A Simple Way to Prevent Neural Networks from Overfitting<\/strong><\/a><\/p>\n<p id=\"976f\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">You can think of the dropout procedure as an ensemble method that trains several varying neural network architectures in parallel. 
The effect of dropping out neurons at random is that other neurons must compensate for the absent neurons \u2014 this results in slightly different models being seen during each forward pass, which makes the network less sensitive to the specific weights of individual neurons.<\/p>\n<\/div>\n\n\n\n<div class=\"o dx lz ma id mb\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<h1 id=\"bd93\" class=\"mg mh iy bm mi mj mk ml mm mn mo mp mq ke mr kf ms kh mt ki mu kk mv kl mw mx ga\" data-selectable-paragraph=\"\">Applying Dropout with Tensorflow Keras<\/h1>\n<p id=\"8fde\" class=\"pw-post-body-paragraph lc ld iy bm b le my jz lg lh mz kc lj lk na lm ln lo nb lq lr ls nc lu lv lw ir ga\" data-selectable-paragraph=\"\">Dropout is used during the training phase of model building \u2014 no values are dropped during inference. We simply provide a rate that sets the frequency at which input units are randomly set to 0 (dropped out). Next, we will explore various ways to use dropout with Tensorflow Keras.<\/p>\n<p id=\"e17a\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">I\u2019ll be using the&nbsp;<code class=\"fp nd ne nf ng b\">make_classification<\/code>&nbsp;function from&nbsp;<a class=\"au ly\" href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\" target=\"_blank\" rel=\"noopener ugc nofollow\">scikit-learn<\/a>&nbsp;to generate a random binary classification dataset. The training dataset I\u2019ll create will have 1000 instances and 20 features with 13 of them being informative. 
The testing dataset will contain 200 instances and 20 features with 13 of them being informative.<\/p>\n<p id=\"51fa\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">To define a baseline, we will build a three layer neural network (two hidden layers and an output layer): The first hidden layer will have 10 neurons, the second will have 10 neurons, and the output will have 1 neuron to classify the classes.&nbsp;<a class=\"au ly\" href=\"https:\/\/medium.com\/fritzheartbeat\/7-optimization-methods-used-in-deep-learning-dd0a57fe6b1\" rel=\"noopener\">Adam optimization<\/a>&nbsp;will be used to optimize the model with a learning rate of&nbsp;<code class=\"fp nd ne nf ng b\">0.001<\/code>.<\/p>\n<p id=\"ec98\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">See the code to create the baseline below:<\/p>\n<pre>import numpy as np\nimport pandas as pd\nimport tensorflow as tf\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import make_classification\nfrom mlxtend.plotting import plot_decision_regions\n\n# create training data\nX_train, y_train = make_classification(\n    n_samples=1000,\n    n_informative=13,\n    random_state=2022\n)\n\n# create testing data\nX_test, y_test = make_classification(\n    n_samples=200,\n    n_informative=13,\n    random_state=2022\n)\n\n# build ANN\nmodel = tf.keras.Sequential([\n                             tf.keras.layers.Dense(10, activation=\"sigmoid\",\n                                                   input_shape=(X_train.shape[1], ),\n                                                   kernel_initializer=\"glorot_normal\"),\n                             tf.keras.layers.Dense(10, activation=\"sigmoid\"),\n                             tf.keras.layers.Dense(1, activation=\"sigmoid\")\n])\n\nmodel.compile(optimizer=\"adam\",\n         
      loss=\"binary_crossentropy\",\n               metrics=[\"accuracy\"])\n\nhistory = model.fit(X_train,\n                    y_train,\n                    epochs=100,\n                    validation_data=[X_test, y_test],\n                    verbose=0)\n\n\n# plot the model\nfig, ax = plt.subplots(figsize=(12, 6))\nplt.plot(history.history[\"loss\"], label=\"loss\")\nplt.plot(history.history[\"val_loss\"], label=\"val_loss\")\nplt.title(\"Learning Rate = 0.001\")\nplt.xlabel(\"Epochs\")\nplt.ylabel(\"Cost\")\nplt.legend()\nplt.show()<\/pre>\n<p id=\"ef8b\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">The output of this code is:<\/p>\n<figure class=\"kn ko kp kq gx kr gl gm paragraph-image\">\n<div class=\"ks kt do ku ce kv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kw kx c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*A2MtT87mOv1e-NZIFp8b4Q.png\" alt=\"\" width=\"700\" height=\"375\"><\/figure><div class=\"gl gm nj\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*A2MtT87mOv1e-NZIFp8b4Q.png 640w, https:\/\/miro.medium.com\/max\/720\/1*A2MtT87mOv1e-NZIFp8b4Q.png 720w, https:\/\/miro.medium.com\/max\/750\/1*A2MtT87mOv1e-NZIFp8b4Q.png 750w, https:\/\/miro.medium.com\/max\/786\/1*A2MtT87mOv1e-NZIFp8b4Q.png 786w, https:\/\/miro.medium.com\/max\/828\/1*A2MtT87mOv1e-NZIFp8b4Q.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*A2MtT87mOv1e-NZIFp8b4Q.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*A2MtT87mOv1e-NZIFp8b4Q.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and 
(max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<h2 id=\"9284\" class=\"nk mh iy bm mi nl nm nn mm no np nq mq lk nr ns ms lo nt nu mu ls nv nw mw nx ga\" data-selectable-paragraph=\"\">Applying Dropout to the Input Layer<\/h2>\n<p id=\"c5ec\" class=\"pw-post-body-paragraph lc ld iy bm b le my jz lg lh mz kc lj lk na lm ln lo nb lq lr ls nc lu lv lw ir ga\" data-selectable-paragraph=\"\">Srivastava et al. recommend a 20% dropout rate for the input layer. We will implement this in the example below, which means that on average four of our 20 inputs will be randomly dropped during each update cycle, while the retained inputs are scaled up by 1 \/ (1 - <code class=\"fp nd ne nf ng b\">rate<\/code>) so that their expected sum is unchanged.<\/p>\n<p id=\"bf60\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">\u201c<em class=\"lx\">[\u2026] we can use max-norm regularization. This constrains the norm of the vector of incoming weights at each hidden unit to be bound by a constant c. 
Typical values of c range from 3 to 4.\u201d<br>\n\u2014&nbsp;<\/em><strong class=\"bm lb\">Srivastava, et al.<\/strong>&nbsp;<strong class=\"bm lb\">2014<\/strong>.&nbsp;<a class=\"au ly\" href=\"http:\/\/jmlr.org\/papers\/v15\/srivastava14a.html\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"bm lb\">Dropout: A Simple Way to Prevent Neural Networks from Overfitting<\/strong><\/a><\/p>\n<\/div>\n\n\n\n<div class=\"o dx lz ma id mb\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<p id=\"11ed\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">It\u2019s also recommended to impose a constraint on the weights for each hidden layer by ensuring the maximum norm of the weights does not exceed three. 
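In isolation, the constraint behaves as follows. This is a small sketch with illustrative numbers, not part of the original walkthrough: MaxNorm(3) rescales any column of incoming weights whose L2 norm exceeds 3 back down to the bound, preserving its direction:

```python
import numpy as np
import tensorflow as tf

# MaxNorm(3) clips each column of the weight matrix so its L2 norm is at
# most 3; columns already within the bound are left untouched.
constraint = tf.keras.constraints.MaxNorm(max_value=3)

w = tf.constant([[6.0], [8.0]])      # one column with norm sqrt(36 + 64) = 10
w_clipped = constraint(w).numpy()

print(w_clipped.ravel())             # approximately [1.8, 2.4]
print(np.linalg.norm(w_clipped))     # approximately 3.0
```

Applied through `kernel_constraint`, this rescaling is performed after each gradient update.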
We do this by setting the value in&nbsp;<code class=\"fp nd ne nf ng b\">kernel_constraint<\/code>.<\/p>\n<p id=\"2e24\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">Here\u2019s how we build on our last model:<\/p>\n<pre>model = tf.keras.Sequential(\n    [\n     tf.keras.layers.Dropout(0.2, input_shape=(X_train.shape[1], )),\n     tf.keras.layers.Dense(10,\n                           activation=\"sigmoid\",\n                           kernel_initializer=\"glorot_normal\",\n                           kernel_constraint=tf.keras.constraints.MaxNorm(3)),\n     tf.keras.layers.Dense(10,\n                           activation=\"sigmoid\",\n                           kernel_constraint=tf.keras.constraints.MaxNorm(3)),\n     tf.keras.layers.Dense(1, activation=\"sigmoid\")\n])\n\nmodel.compile(optimizer=\"adam\",\n               loss=\"binary_crossentropy\",\n               metrics=[\"accuracy\"])\n\nhistory = model.fit(X_train,\n                    y_train,\n                    epochs=100,\n                    validation_data=[X_test, y_test],\n                    verbose=0)\n\n\n# plot the model\nfig, ax = plt.subplots(figsize=(12, 6))\nplt.plot(history.history[\"loss\"], label=\"loss\")\nplt.plot(history.history[\"val_loss\"], label=\"val_loss\")\nplt.title(\"Learning Rate = 0.001\")\nplt.xlabel(\"Epochs\")\nplt.ylabel(\"Cost\")\nplt.legend()\nplt.show()<\/pre>\n<p id=\"b725\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">The output from the code above is as follows:<\/p>\n<figure class=\"kn ko kp kq gx kr gl gm paragraph-image\">\n<div class=\"ks kt do ku ce kv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kw kx c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*s0VyVH2AXcAW94Zmp2MjoA.png\" alt=\"\" 
width=\"700\" height=\"377\"><\/figure><div class=\"gl gm oi\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*s0VyVH2AXcAW94Zmp2MjoA.png 640w, https:\/\/miro.medium.com\/max\/720\/1*s0VyVH2AXcAW94Zmp2MjoA.png 720w, https:\/\/miro.medium.com\/max\/750\/1*s0VyVH2AXcAW94Zmp2MjoA.png 750w, https:\/\/miro.medium.com\/max\/786\/1*s0VyVH2AXcAW94Zmp2MjoA.png 786w, https:\/\/miro.medium.com\/max\/828\/1*s0VyVH2AXcAW94Zmp2MjoA.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*s0VyVH2AXcAW94Zmp2MjoA.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*s0VyVH2AXcAW94Zmp2MjoA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"7120\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">The plot above shows we\u2019ve managed to reduce overfitting (notice the reduction of values on the Y-axis), but our model is still overfitting. Now, let\u2019s apply dropout to the hidden layers.<\/p>\n<h2 id=\"8268\" class=\"nk mh iy bm mi nl nm nn mm no np nq mq lk nr ns ms lo nt nu mu ls nv nw mw nx ga\" data-selectable-paragraph=\"\">Applying Dropout to the Hidden Layers<\/h2>\n<p id=\"d4a1\" class=\"pw-post-body-paragraph lc ld iy bm b le my jz lg lh mz kc lj lk na lm ln lo nb lq lr ls nc lu lv lw ir ga\" data-selectable-paragraph=\"\">We may also decide to apply dropout to our hidden layers. 
Before we apply it to both input and hidden layers, we will take a look at the effects of applying it to the hidden layers.<\/p>\n<p id=\"0554\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">\u201c<em class=\"lx\">In the simplest case, each unit is retained with a fixed probability p independent of other units, where p can be chosen using a validation set or can simply be set at 0.5, which seems to be close to optimal for a wide range of networks and tasks.\u201d<br>\n\u2014&nbsp;<\/em><strong class=\"bm lb\">Srivastava, et al.<\/strong>&nbsp;<strong class=\"bm lb\">2014<\/strong>.&nbsp;<a class=\"au ly\" href=\"http:\/\/jmlr.org\/papers\/v15\/srivastava14a.html\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"bm lb\">Dropout: A Simple Way to Prevent Neural Networks from Overfitting<\/strong><\/a><\/p>\n<p id=\"1871\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">The example below extends our baseline model by adding dropout layers between layers 1\u20132 and 2\u20133. 
The dropout rate used is 0.5 with the same kernel constraint as seen in the example above.<\/p>\n<pre># build ANN\nmodel = tf.keras.Sequential(\n    [\n     tf.keras.layers.Dense(10,\n                           activation=\"sigmoid\",\n                           input_shape=(X_train.shape[1], ),\n                           kernel_initializer=\"glorot_normal\",\n                           kernel_constraint=tf.keras.constraints.MaxNorm(3)),\n     tf.keras.layers.Dropout(0.5),\n     tf.keras.layers.Dense(10,\n                           activation=\"sigmoid\",\n                           kernel_constraint=tf.keras.constraints.MaxNorm(3)),\n     tf.keras.layers.Dropout(0.5),\n     tf.keras.layers.Dense(1, activation=\"sigmoid\")\n])\n\nmodel.compile(optimizer=\"adam\",\n               loss=\"binary_crossentropy\",\n               metrics=[\"accuracy\"])\n\nhistory = model.fit(X_train,\n                    y_train,\n                    epochs=100,\n                    validation_data=[X_test, y_test],\n                    verbose=0)\n\n\n# plot the model\nfig, ax = plt.subplots(figsize=(12, 6))\nplt.plot(history.history[\"loss\"], label=\"loss\")\nplt.plot(history.history[\"val_loss\"], label=\"val_loss\")\nplt.title(\"Learning Rate = 0.001\")\nplt.xlabel(\"Epochs\")\nplt.ylabel(\"Cost\")\nplt.legend()\nplt.show()<\/pre>\n<p id=\"a466\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">The output of this code is as follows:<\/p>\n<figure class=\"kn ko kp kq gx kr gl gm paragraph-image\">\n<div class=\"ks kt do ku ce kv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kw kx c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*2wQhtmxcvVhkRMMjdI8G4w.png\" alt=\"\" width=\"700\" height=\"373\"><\/figure><div class=\"gl gm oj\"><picture><source 
srcset=\"https:\/\/miro.medium.com\/max\/640\/1*2wQhtmxcvVhkRMMjdI8G4w.png 640w, https:\/\/miro.medium.com\/max\/720\/1*2wQhtmxcvVhkRMMjdI8G4w.png 720w, https:\/\/miro.medium.com\/max\/750\/1*2wQhtmxcvVhkRMMjdI8G4w.png 750w, https:\/\/miro.medium.com\/max\/786\/1*2wQhtmxcvVhkRMMjdI8G4w.png 786w, https:\/\/miro.medium.com\/max\/828\/1*2wQhtmxcvVhkRMMjdI8G4w.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*2wQhtmxcvVhkRMMjdI8G4w.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*2wQhtmxcvVhkRMMjdI8G4w.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"d35c\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">Once again, we were able to reduce overfitting in comparison to the baseline, but we are still overfitting the training data.<\/p>\n<h2 id=\"f363\" class=\"nk mh iy bm mi nl nm nn mm no np nq mq lk nr ns ms lo nt nu mu ls nv nw mw nx ga\" data-selectable-paragraph=\"\">Applying Dropout to Input and Hidden Layers<\/h2>\n<p id=\"25fa\" class=\"pw-post-body-paragraph lc ld iy bm b le my jz lg lh mz kc lj lk na lm ln lo nb lq lr ls nc lu lv lw ir ga\" data-selectable-paragraph=\"\">In this section, we are going to apply dropout to both the input and hidden layers as follows:<\/p>\n<pre># build ANN\nmodel = tf.keras.Sequential(\n    [\n     tf.keras.layers.Dropout(0.2,\n                             input_shape=(X_train.shape[1], )),\n     
tf.keras.layers.Dense(10,\n                           activation=\"sigmoid\",\n                           kernel_initializer=\"glorot_normal\",\n                           kernel_constraint=tf.keras.constraints.MaxNorm(3)),\n     tf.keras.layers.Dropout(0.5),\n     tf.keras.layers.Dense(10,\n                           activation=\"sigmoid\",\n                           kernel_constraint=tf.keras.constraints.MaxNorm(3)),\n     tf.keras.layers.Dropout(0.5),\n     tf.keras.layers.Dense(1, activation=\"sigmoid\")\n])\n\nmodel.compile(optimizer=\"adam\",\n               loss=\"binary_crossentropy\",\n               metrics=[\"accuracy\"])\n\nhistory = model.fit(X_train,\n                    y_train,\n                    epochs=100,\n                    validation_data=[X_test, y_test],\n                    verbose=0)\n\n\n# plot the model\nfig, ax = plt.subplots(figsize=(12, 6))\nplt.plot(history.history[\"loss\"], label=\"loss\")\nplt.plot(history.history[\"val_loss\"], label=\"val_loss\")\nplt.title(\"Learning Rate = 0.001\")\nplt.xlabel(\"Epochs\")\nplt.ylabel(\"Cost\")\nplt.legend()\nplt.show()<\/pre>\n<p id=\"88d4\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">Which outputs:<\/p>\n<figure class=\"kn ko kp kq gx kr gl gm paragraph-image\">\n<div class=\"ks kt do ku ce kv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kw kx c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*l6khqFlDmaovhojcvNvmNQ.png\" alt=\"\" width=\"700\" height=\"369\"><\/figure><div class=\"gl gm nj\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*l6khqFlDmaovhojcvNvmNQ.png 640w, https:\/\/miro.medium.com\/max\/720\/1*l6khqFlDmaovhojcvNvmNQ.png 720w, https:\/\/miro.medium.com\/max\/750\/1*l6khqFlDmaovhojcvNvmNQ.png 750w, https:\/\/miro.medium.com\/max\/786\/1*l6khqFlDmaovhojcvNvmNQ.png 786w, 
https:\/\/miro.medium.com\/max\/828\/1*l6khqFlDmaovhojcvNvmNQ.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*l6khqFlDmaovhojcvNvmNQ.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*l6khqFlDmaovhojcvNvmNQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"f83a\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">Overfitting has been reduced, but our model is still not performing as well as we would like. As a task for the reader, try to improve the loss of our model so it performs better on the validation data while reducing the amount of overfitting \u2014 the code can be found on&nbsp;<a class=\"au ly\" href=\"https:\/\/github.com\/kurtispykes\/deep-learning-examples\/blob\/main\/Dropout.ipynb\" target=\"_blank\" rel=\"noopener ugc nofollow\">GitHub<\/a>.<\/p>\n<p id=\"9f01\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\">Dropout is a powerful, yet computationally cheap regularization technique. 
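For that reader task, it helps to have a number to drive down rather than eyeballing the plots. The helper below is a hypothetical sketch (the name `overfit_gap` is our own, and the small `History` class merely mimics the object returned by `model.fit`): it reports the gap between the final validation and training losses, which shrinks as overfitting is reduced:

```python
class History:
    """Minimal stand-in for the History object returned by model.fit."""
    def __init__(self, history):
        self.history = history

def overfit_gap(history):
    """Final validation loss minus final training loss; smaller means less overfitting."""
    return history.history["val_loss"][-1] - history.history["loss"][-1]

# Example: training loss keeps falling while validation loss stalls.
h = History({"loss": [0.60, 0.35, 0.20], "val_loss": [0.62, 0.50, 0.48]})
print(overfit_gap(h))  # approximately 0.28
```

Comparing this gap across the baseline and the three dropout variants gives a single figure of merit for each experiment.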
In this article, you discovered the mechanics behind dropout, how to implement it on your input layers, and how to implement it on your hidden layers.<\/p>\n<p id=\"fdd1\" class=\"pw-post-body-paragraph lc ld iy bm b le lf jz lg lh li kc lj lk ll lm ln lo lp lq lr ls lt lu lv lw ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm lb\">Recommended reads:<\/strong><br>\n\u2192&nbsp;<a class=\"au ly\" href=\"https:\/\/towardsdatascience.com\/deep-learning-tips-and-tricks-1ef708ec5f53\" target=\"_blank\" rel=\"noopener\">Deep Learning Tips &amp; Tricks<\/a><br>\n\u2192&nbsp;<a class=\"au ly\" href=\"https:\/\/machinelearningmastery.com\/dropout-for-regularizing-deep-neural-networks\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">A Gentle Introduction to Dropout for Regularizing Deep Neural Networks<\/a><br>\n\u2192&nbsp;<a class=\"au ly\" href=\"https:\/\/arxiv.org\/abs\/1207.0580\" target=\"_blank\" rel=\"noopener ugc nofollow\">Improving Neural Networks by Preventing Co-adaptation of Feature Detectors<\/a><br>\n\u2192&nbsp;<a class=\"au ly\" href=\"http:\/\/jmlr.org\/papers\/v15\/srivastava14a.html\" target=\"_blank\" rel=\"noopener ugc nofollow\">Dropout: A Simple Way to Prevent Neural Networks from Overfitting<\/a><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Image By Author Deep neural networks are complex models which makes them much more prone to overfitting \u2014 especially when the dataset has few examples. Left unhandled, an overfit model would fail to generalize well to unseen instances. One solution to combat this occurrence is to apply regularization. 
The technique we are going to be [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[138],"class_list":["post-4600","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Dropout Regularization With Tensorflow Keras - Comet<\/title>\n<meta name=\"description\" content=\"Deep neural networks are complex models which makes them much more prone to overfitting \u2014 especially when the dataset has few examples. Left unhandled, an overfit model would fail to generalize well to unseen instances.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/dropout-regularization-with-tensorflow-keras\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Dropout Regularization With Tensorflow Keras\" \/>\n<meta property=\"og:description\" content=\"Deep neural networks are complex models which makes them much more prone to overfitting \u2014 especially when the dataset has few examples. 
Left unhandled, an overfit model would fail to generalize well to unseen instances.\" \/>\n<meta name=\"author\" content=\"Kurtis Pykes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Dropout Regularization With Tensorflow Keras - Comet","author":"Kurtis Pykes","twitter_misc":{"Written by":"Kurtis Pykes","Est. reading time":"7 minutes"}}}