{"id":4597,"date":"2022-11-10T17:47:27","date_gmt":"2022-11-11T01:47:27","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=4597"},"modified":"2025-04-24T17:16:43","modified_gmt":"2025-04-24T17:16:43","slug":"how-to-train-your-deep-learning-models-faster","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/how-to-train-your-deep-learning-models-faster\/","title":{"rendered":"How To Train Your Deep Learning Models Faster"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/700\/0*RPTgooN5_raho8DO\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-center\">Photo by <a class=\"au lc\" href=\"https:\/\/unsplash.com\/@marcojodoin?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Marc-Olivier Jodoin<\/a>&nbsp;on&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/p>\n\n\n\n<div class=\"ir is it iu iv\">\n<p id=\"c018\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">\n<\/p><p class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Deep learning is a subset of machine learning that utilizes neural networks in \u201cdeep\u201d architectures, or multiple layers, to extract information from data. Early neural networks employed relatively simple (or \u201cshallow\u201d) architectures, but today\u2019s deep learning neural networks can be incredibly complex.<\/p>\n<p id=\"4d1e\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Neural networks are designed to work much like human brains and are comprised of individual neurons that resemble the structure of a biological neuron. A neuron receives input from other neurons, performs some algorithmic processing on it, and generates an output (which may or may not be fed into yet another neuron as input). Neural networks are therefore a combination of a series of algorithms that ultimately recognize underlying relationships and<strong class=\"bm ly\">&nbsp;<\/strong>patterns in the data. In short, a neural network is an interconnected arrangement of mathematical functions.<\/p>\n<p id=\"0306\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">In 1957, the first neural network, called a Perceptron, was developed by Frank Rosenblatt. It was similar to modern-day neural networks, except in that it only had one hidden layer, as well as configurable weights and biases. Now, many decades later, multi-layer neural networks are widely used for solving incredibly complex problems.<\/p>\n<p id=\"45f4\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Training a multilayer neural network entails much more than just creating connections between neurons, however. 
In 1957, Frank Rosenblatt developed the first neural network, called the Perceptron. It resembled modern-day neural networks, with configurable weights and biases, except that it had only a single layer of trainable weights rather than multiple hidden layers. Now, many decades later, multi-layer neural networks are widely used for solving incredibly complex problems.

Training a multilayer neural network entails much more than just creating connections between neurons, however. It can be a prohibitively lengthy, iterative process, and it often requires more computational power than many of us have access to on our home computers.

![Multilayer Neural Network](https://miro.medium.com/max/700/1*CRj9eRATxCr7Jm1PA_Cb6w.png)

Multilayer Neural Network (image created by the author using [Canva.com](http://canva.com/))

Training a deep learning neural network can take days, weeks, or even longer. However, the use of transfer learning, the right optimizer, early stopping, and GPUs can speed up this process significantly, and we'll discuss each of these methods in this article.

## Selecting the right optimizer

Optimization algorithms are responsible for reducing the loss, and improving the evaluation metric (often, accuracy), of your deep learning model by controlling the **learning rate**.
Training time depends largely on how quickly your model can learn good weights for your network, so selecting the right optimization method can reduce training time dramatically.

There are many optimizers to choose from, including Gradient Descent, Mini-Batch Gradient Descent, Momentum, Nesterov Accelerated Gradient, AdaGrad, RMSProp, and more. Many of these optimizers are mathematically complex, and even many experts in the field don't know every detail of every optimization algorithm. One pragmatic approach is to choose a versatile algorithm and use it for most problems, and for this, the **Adam optimizer** is an excellent go-to option. It performs exceedingly well on a wide range of applications and compares favorably to other stochastic optimization methods.

The Adam optimizer *adjusts the learning rate as it performs gradient descent* to ensure reasonable values throughout the weight-optimization process. This flexibility allows Adam to increase the learning rate where appropriate (speeding up the training process and avoiding local minima) and to decrease the learning rate when approaching the global minimum.

```python
# Initializing the Adam optimizer with its default arguments
import tensorflow as tf

adam = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
    name='Adam'
)
```

- **learning_rate**: a floating-point value or a [tf.keras.optimizers.schedules.LearningRateSchedule](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/LearningRateSchedule) instance; defaults to `0.001`.
- **beta_1**: the exponential decay rate for the 1st-moment estimates; defaults to `0.9`.
- **beta_2**: the exponential decay rate for the 2nd-moment estimates; defaults to `0.999`.
- **epsilon**: a small constant for numerical stability; defaults to `1e-7`.
- **amsgrad**: boolean; whether to apply the AMSGrad variant of this algorithm; defaults to `False`.
- **name**: optional name for the operations created when applying gradients; defaults to `Adam`.
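Since `learning_rate` also accepts a schedule, one common pattern is to let the learning rate decay automatically as training progresses. Here's a minimal sketch using `tf.keras.optimizers.schedules.ExponentialDecay`; the decay values are illustrative, not a recommendation:

```python
import tensorflow as tf

# Decay the learning rate by a factor of 0.9 every 10,000 steps,
# starting from 0.001 (illustrative values)
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,
    decay_steps=10_000,
    decay_rate=0.9
)

adam = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```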
You can learn about different optimizers in more detail [here](https://towardsdatascience.com/how-to-train-neural-network-faster-with-optimizers-d297730b3713).

```python
# Configuring the model for training (assumes `model` is an already-built Keras model)
import tensorflow as tf

# BinaryAccuracy matches the binary cross-entropy loss used here;
# plain tf.keras.metrics.Accuracy() would compare raw predictions to labels directly
accuracy = tf.keras.metrics.BinaryAccuracy()
adam = tf.keras.optimizers.Adam()

model.compile(
    loss='binary_crossentropy',
    optimizer=adam,
    metrics=[accuracy]
)
```

## Making Use of Transfer Learning

The process of transferring the knowledge gained by one model to another model is called transfer learning.

In transfer learning, we make use of a pre-trained model with pre-trained weights and freeze all the layers of that model except the final output layer.
Then we replace the original output layer with a custom final layer that has the number of output classes we need our new model to identify.

![](https://miro.medium.com/max/700/1*8Fhx7mWQMU-3P11aEhJD4w.png)

There are many pre-trained models available in the [Keras library](https://keras.io/api/applications/). These models have been trained extensively, for long periods of time, on massive labeled datasets. By freezing all but the output layer of one of these models, we can reuse the patterns learned in those massive training sessions and apply them to our own use cases.
Using one is as simple as a single line of code, which downloads the pre-trained model from Keras.

```python
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.applications.vgg16 import VGG16

# Input image size expected by VGG16
IMAGE_SIZE = [224, 224]

# Pretrained model, downloaded without its original 1,000-class top layer
vgg = VGG16(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)

# Freeze the pre-trained layers so their weights are not updated during training
for layer in vgg.layers:
    layer.trainable = False

# Add a custom output layer for our three classes
x = Flatten()(vgg.output)
prediction = Dense(3, activation='softmax')(x)
model_1 = Model(inputs=vgg.input, outputs=prediction)

# Generate the model summary
model_1.summary()
```

![](https://miro.medium.com/max/504/0*gqxdFoX4j7YnHhBQ.png)

You can see that we have attached a new output layer that classifies only three classes instead of the original 1,000.

Transfer learning can save a lot of time and resources because we don't have to train the whole model from scratch.
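From here, we would compile and train `model_1` just as we would any other Keras model; because the VGG16 layers are frozen, only the new head's weights are updated, which is a large part of the speedup. A minimal sketch, assuming hypothetical preprocessed arrays `train_images` and `train_labels`:

```python
# Compile and train the transfer model; only the new head's weights are trainable.
# `train_images` and `train_labels` are hypothetical preprocessed arrays
# (images resized to 224x224, labels one-hot encoded across the 3 classes).
model_1.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

history = model_1.fit(
    train_images,
    train_labels,
    epochs=5,
    validation_split=0.2,
    batch_size=32
)
```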
role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<blockquote class=\"nt\"><p id=\"f1ff\" class=\"nu nv iy bm nw nx ny nz oa ob oc lx cn\" data-selectable-paragraph=\"\">Big teams rely on big ideas.&nbsp;<a class=\"au lc\" href=\"https:\/\/info.comet.ml\/roundtable-developing-ml-at-enterprise-scale\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Learn how experts at Uber, WorkFusion, and The RealReal use Comet<\/a>&nbsp;to scale out their ML models and ensure visibility and collaboration company-wide.<\/p><\/blockquote>\n<\/div>\n\n\n\n<div class=\"o dx nm nn id no\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<h1 id=\"88fe\" class=\"ma mb iy bm mc md od mf mg mh oe mj mk ke of kf mm kh og ki mo kk oh kl mq mr ga\" data-selectable-paragraph=\"\">Early Stopping<\/h1>\n<p id=\"f0e5\" class=\"pw-post-body-paragraph ld le iy bm b lf ms jz lh li mt kc lk ll mu ln lo lp mv lr ls lt mw lv lw lx ir ga\" data-selectable-paragraph=\"\">One epoch is one complete pass of training data through the neural network. During each epoch, each neuron has the opportunity to update its weights, so the more epochs you choose, the longer your training will be. Additionally, choosing too many epochs can lead to overfitting. On the other hand, choosing too few epochs can cause underfitting.<\/p>\n<p id=\"7f8c\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Early stopping is a form of regularization that stops the training process once model performance stops improving on the validation set. It allows significantly decreases the likelihood of overfitting the model.<\/p>\n<p id=\"912e\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Keras has a callback function designed to stop training early, once it has detected that the model is no longer making significant improvements<\/p>\n<pre class=\"ko kp kq kr gx my bs mz na dz nb\"><span id=\"6774\" class=\"ga nc mb iy nb b dm nd ne l nf ng\" data-selectable-paragraph=\"\">from tensorflow.keras.callbacks import EarlyStopping<\/span><span id=\"447d\" class=\"ga nc mb iy nb b dm nh ne l nf ng\" data-selectable-paragraph=\"\">early_stopping = EarlyStopping(\n    monitor='val_loss',\n    min_delta=0,\n    patience=0,\n    verbose=0,\n    mode='auto',\n    baseline=None,\n    restore_best_weights=False\n)<\/span><\/pre>\n<p id=\"2366\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">We won\u2019t be utilizing all of these parameters for our example, but let\u2019s take a look at two we will use:<\/p>\n<p id=\"ba15\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\"><strong class=\"bm ly\">monitor:&nbsp;<\/strong>metric to be used as a measure for terminating the training.<br>\n<strong class=\"bm ly\">patience<\/strong>: number of epochs with no improvement after which training gets terminated.<\/p>\n<p id=\"19ee\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">In the example below, we have set out model to stop training when the loss stops improving for two consecutive epochs.<\/p>\n<pre class=\"ko kp kq kr gx my bs mz na dz nb\"><span id=\"af5a\" 
class=\"ga nc mb iy nb b dm nd ne l nf ng\" data-selectable-paragraph=\"\">from tensorflow.keras.callbacks import EarlyStopping<\/span><span id=\"07dd\" class=\"ga nc mb iy nb b dm nh ne l nf ng\" data-selectable-paragraph=\"\">early_stopping = EarlyStopping(monitor='loss', patience=2)<\/span><span id=\"3378\" class=\"ga nc mb iy nb b dm nh ne l nf ng\" data-selectable-paragraph=\"\">history = model.fit(\n    X_train,\n    y_train,\n    epochs= 100,\n    validation_split= 0.20,\n    batch_size= 50,\n    verbose= \"auto\",\n    callbacks= [early_stopping]\n)<\/span><\/pre>\n<figure class=\"ko kp kq kr gx ks gl gm paragraph-image\">\n<div class=\"kt ku do kv ce kw\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"ce kx ky c aligncenter\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*5_faai1PzWkuCCRfwTw4tA.png\" alt=\"\" width=\"700\" height=\"134\"><\/figure><div class=\"gl gm oi\"><picture><source srcset=\"https:\/\/miro.medium.com\/max\/640\/1*5_faai1PzWkuCCRfwTw4tA.png 640w, https:\/\/miro.medium.com\/max\/720\/1*5_faai1PzWkuCCRfwTw4tA.png 720w, https:\/\/miro.medium.com\/max\/750\/1*5_faai1PzWkuCCRfwTw4tA.png 750w, https:\/\/miro.medium.com\/max\/786\/1*5_faai1PzWkuCCRfwTw4tA.png 786w, https:\/\/miro.medium.com\/max\/828\/1*5_faai1PzWkuCCRfwTw4tA.png 828w, https:\/\/miro.medium.com\/max\/1100\/1*5_faai1PzWkuCCRfwTw4tA.png 1100w, https:\/\/miro.medium.com\/max\/1400\/1*5_faai1PzWkuCCRfwTw4tA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"95db\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Early stopping will stop the neural network when it stops improving for the specified number of epochs, thus reducing the training time taken by the network.<\/p>\n<h1 id=\"47fb\" class=\"ma mb iy bm mc md me mf mg mh mi mj mk ke ml kf mm kh mn ki mo kk mp kl mq mr ga\" data-selectable-paragraph=\"\">GPUs For Training<\/h1>\n<p id=\"a8b5\" class=\"pw-post-body-paragraph ld le iy bm b lf ms jz lh li mt kc lk ll mu ln lo lp mv lr ls lt mw lv lw lx ir ga\" data-selectable-paragraph=\"\">Ultimately, no matter how much you optimize your deep learning model, if you are training it on a CPU it will take you exponentially longer that training it on a GPU. GPUs can improve the overall training process by performing multiple computations simultaneously.<\/p>\n<p id=\"24fa\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">There are two options through which you can access GPU power for your model. 
There are two options through which you can access GPU power for your model. The first, and cheapest, option is to make use of cloud GPUs provided by big tech companies.

[Google Colab](https://research.google.com/colaboratory/) is a leading GPU provider that gives you the opportunity to easily upload your Python notebook and train a model on its virtual machines. Once all the files are uploaded, you can even walk away from your computer and track the training process on your mobile phone or tablet.

**Kaggle** is another GPU provider, offering 30 hours of free GPU time per week.

If you're willing to spend some big bucks on training your model, there are other platforms that will provide you with even more access hours on their cloud GPUs. One example is Good Cloud GPU ($300 for 850 hours).

The other, more expensive, option is to build a high-end computer system with at least 16 GB of RAM and a high-end graphics card with around 8 GB of memory, such as an Nvidia GeForce RTX. A system like this will take a serious bite out of your pocket, though. If you are ready to invest the money, the CUDA Toolkit is definitely worth considering.

CUDA creates a path between your computer's hardware and your deep learning model. The NVIDIA CUDA Toolkit provides a development environment for creating high-performance GPU-accelerated applications, such as computer games and deep learning models. The NVIDIA CUDA Deep Neural Network library (cuDNN) is specially designed for training and building deep neural networks. You can read about the complete step-by-step installation process [here](https://medium.com/pythoneers/cuda-installation-in-windows-2020-638b008b4639).
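Once the toolkit is installed, you can confirm that your TensorFlow build was compiled against CUDA and see which CUDA and cuDNN versions it expects. A short sketch; note that `get_build_info()` only reports these fields on GPU-enabled builds:

```python
import tensorflow as tf

# Check whether this TensorFlow build was compiled with CUDA support
print("Built with CUDA:", tf.test.is_built_with_cuda())

# Inspect the CUDA/cuDNN versions the build expects (GPU builds only)
build = tf.sysconfig.get_build_info()
print("CUDA version:", build.get('cuda_version'))
print("cuDNN version:", build.get('cudnn_version'))
```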
GPUs can drastically increase the training speed of your deep learning models, but access to GPUs is limited unless you have deep pockets.

## Conclusion

Although training a neural network takes a lot of time, with the help of optimizers, transfer learning, early stopping, and access to advanced hardware, we can reduce the training time significantly while still creating a top-notch model.