{"id":4560,"date":"2022-11-10T17:53:10","date_gmt":"2022-11-11T01:53:10","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=4560"},"modified":"2025-04-24T17:16:27","modified_gmt":"2025-04-24T17:16:27","slug":"weight-initialization-in-deep-neural-networks","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/weight-initialization-in-deep-neural-networks\/","title":{"rendered":"Weight Initialization In Deep Neural Networks"},"content":{"rendered":"\n<div class=\"ir is it iu iv\">\n<figure><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/miro.medium.com\/max\/700\/0*lpWKW-y0hZo6fJ09\"><\/figure><p data-selectable-paragraph=\"\"><\/p>\n<p style=\"text-align: center;\" data-selectable-paragraph=\"\">Photo by&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/@graphicnode?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Graphic Node<\/a>&nbsp;on&nbsp;<a class=\"au lc\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/p>\n<p id=\"a887\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">Very deep neural networks can suffer from either&nbsp;<a class=\"au lc\" href=\"https:\/\/towardsdatascience.com\/the-vanishing-exploding-gradient-problem-in-deep-neural-networks-191358470c11\" target=\"_blank\" rel=\"noopener\">vanishing or exploding gradients<\/a>. This is because the main operation used to compute the derivatives as we propagate through a neural network model is matrix multiplication \u2014 thus, a network of&nbsp;<em class=\"ly\">n<\/em>&nbsp;hidden layers will multiply&nbsp;<em class=\"ly\">n<\/em>&nbsp;derivatives together.<\/p>\n<p id=\"e38f\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">When the derivatives are large, the gradient will increase exponentially as we propagate through the model until it eventually explodes. In contrast, when the derivatives are small, the gradient will decrease exponentially as we propagate through the model until it eventually vanishes. Both scenarios lead to the same end (in different ways): the network failing to learn well.<\/p>\n<p id=\"3c7c\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">A partial solution to this problem is to be more careful when randomly initializing the weights of the network. This procedure is called&nbsp;<strong class=\"bm lz\">weight initialization<\/strong>. The end goal of weight initialization is to prevent gradients from exploding or vanishing, thus, permitting a slightly more effective optimization process. 
A partial solution to this problem is to be more careful when randomly initializing the weights of the network. This procedure is called **weight initialization**. Its goal is to keep gradients from exploding or vanishing, permitting a somewhat more effective optimization process. It doesn't entirely solve the vanishing/exploding gradient problem, but it is an important design choice and helps a lot when building deep neural networks.

Let's consider the `make_circles` dataset from scikit-learn:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

import tensorflow as tf
from mlxtend.plotting import plot_decision_regions

import warnings
warnings.filterwarnings("ignore")

# load data
X_train, y_train = make_circles(n_samples=10000, noise=0.05)
X_test, y_test = make_circles(n_samples=100, noise=0.05)

# visualize data
plt.subplots(figsize=(8, 5))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.Spectral)
plt.show()
```

*Scatter plot of the `make_circles` training data: two noisy concentric circles.*
We are going to build a four-layer neural network and test how it performs with different weight initializations. The full code for this notebook can be found on [GitHub](https://github.com/kurtispykes/deep-learning-examples/blob/main/neural_network_initializations.ipynb).

# Zero Initialization

Zero initialization is exactly what it sounds like: all weights are initialized to 0. In that scenario, the neurons in each layer learn exactly the same thing, and the network fails to break symmetry as a consequence.
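The zero-initialized run can be reproduced with Keras's built-in `zeros` initializer. The sketch below is illustrative (the original post doesn't show this cell) and assumes the same four-layer architecture used later in the article, reusing `X_train` from the cell above:

```python
import tensorflow as tf

# Every kernel starts at exactly 0, so every hidden unit in a layer computes
# the same output and receives the same gradient: symmetry is never broken.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(5, activation="tanh", input_shape=(X_train.shape[1],),
                          kernel_initializer="zeros"),
    tf.keras.layers.Dense(10, activation="tanh", kernel_initializer="zeros"),
    tf.keras.layers.Dense(2, activation="tanh", kernel_initializer="zeros"),
    tf.keras.layers.Dense(1, activation="sigmoid", kernel_initializer="zeros")
])
```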
We can take this a step further: whenever a constant value is used to initialize a model's weights, you can expect it to perform poorly. The outputs of all hidden units in the model have the same influence on the cost, resulting in identical gradients for every unit.

Here are the results of our model when we initialized the weights to zero:

*Visualizing cost per epoch and decision boundary for a model with weights initialized at zero.*
role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<blockquote class=\"nt\"><p id=\"1343\" class=\"nu nv iy bm nw nx ny nz oa ob oc lx cn\" data-selectable-paragraph=\"\">Tips, tricks, and innovation \u2014 all in your inbox each week. Subscribe to the&nbsp;<a class=\"au lc\" href=\"https:\/\/info.comet.ml\/newsletter-signup\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet newsletter<\/a>&nbsp;for the latest industry news gathered by our team of experts.<\/p><\/blockquote>\n<\/div>\n\n\n\n<div class=\"o dx nm nn id no\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ir is it iu iv\" style=\"text-align: left;\">\n<h1 id=\"4d8b\" class=\"mp mi iy bm mq mr od mt mu mv oe mx my ke of kf na kh og ki nc kk oh kl ne nf ga\" data-selectable-paragraph=\"\">Random Initialization<\/h1>\n<p id=\"2505\" class=\"pw-post-body-paragraph ld le iy bm b lf ng jz lh li nh kc lk ll ni ln lo lp nj lr ls lt nk lv lw lx ir ga\" data-selectable-paragraph=\"\">Assigning weights random values breaks the symmetry. It\u2019s better than zero but it\u2019s not without some issues. Weights must not be initialized too high or too low because:<\/p>\n<ul class=\"\">\n<li id=\"540d\" class=\"oi oj iy bm b lf lg li lj ll ok lp ol lt om lx on oo op oq ga\" data-selectable-paragraph=\"\">If weights are initialized with large values then each matrix multiplication will result in a significantly larger value. Thus, applying a sigmoid activation function to the linear equation would result in a value close to 1 which slows down the rate of learning.<\/li>\n<li id=\"14cb\" class=\"oi oj iy bm b lf or li os ll ot lp ou lt ov lx on oo op oq ga\" data-selectable-paragraph=\"\">If weights are initialized with small values then each matrix multiplication will result in significantly smaller values. Thus, applying a sigmoid activation function to the linear equation would result in a value close to 0 which slows down the rate of learning.<\/li>\n<\/ul>\n<p id=\"030c\" class=\"pw-post-body-paragraph ld le iy bm b lf lg jz lh li lj kc lk ll lm ln lo lp lq lr ls lt lu lv lw lx ir ga\" data-selectable-paragraph=\"\">When initialized poorly, random initialization can lead to vanishing\/exploding gradients. 
Let's set the weights of our model to large values and see what happens:

```python
def random_normal_init(shape, dtype=None):
    # Draw from a standard normal, then scale up to deliberately large values.
    return tf.random.normal(shape) * 1000

model = tf.keras.Sequential([
    tf.keras.layers.Dense(5, activation="tanh", input_shape=(X_train.shape[1],),
                          kernel_initializer=random_normal_init),
    tf.keras.layers.Dense(10, activation="tanh"),
    tf.keras.layers.Dense(2, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="mse", metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_test, y_test))
```
*Visualizing cost per epoch and decision boundary for a model with weights initialized from a random normal distribution.*

If we trained the model for longer the results might have improved, but initializing weights with large random values clearly slows down optimization.

# Xavier Initialization

Xavier initialization, also referred to as Glorot initialization, is a heuristic for initializing weights. It has become the standard way to initialize weights when the nodes use tanh or sigmoid activation functions. It was first proposed by [Xavier Glorot and Yoshua Bengio in 2010](http://proceedings.mlr.press/v9/glorot10a.html). The goal of Xavier initialization is to set the weights such that the variance of the activations stays the same across every layer of the network, which helps prevent the gradients from exploding or vanishing.
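Concretely, Keras's Glorot uniform initializer draws each weight from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). Here is a minimal sketch of the same architecture using the built-in initializer (illustrative, not the notebook's exact cell; `X_train` comes from the earlier code):

```python
import tensorflow as tf

# Glorot/Xavier uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out)),
# chosen so activation variance stays roughly constant from layer to layer.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(5, activation="tanh", input_shape=(X_train.shape[1],),
                          kernel_initializer="glorot_uniform"),
    tf.keras.layers.Dense(10, activation="tanh", kernel_initializer="glorot_uniform"),
    tf.keras.layers.Dense(2, activation="tanh", kernel_initializer="glorot_uniform"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
```

Note that `glorot_uniform` is the default `kernel_initializer` for Keras `Dense` layers, so this is also what you get when you don't specify an initializer at all.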
Here's how our algorithm performed when we initialized the weights using Xavier uniform initialization:

*Visualizing cost per epoch and decision boundary for a model with weights initialized with the Xavier/Glorot uniform distribution.*

# He Initialization

He (or Kaiming) initialization is another heuristic used to initialize weights.
It takes into account the non-linearity of activation functions such as ReLU [Source: [Papers with Code](https://paperswithcode.com/method/he-initialization)], which makes it the recommended weight initialization when using ReLU activations. The technique was first presented in a 2015 paper by [He et al.](https://arxiv.org/abs/1502.01852) and is very similar to Xavier initialization, except that it uses a different scaling factor for the weights.
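In the uniform variant, the limit becomes sqrt(6 / fan_in); the larger scale compensates for ReLU zeroing out roughly half of its inputs. A minimal sketch with Keras's built-in initializer and ReLU hidden layers (illustrative, reusing `X_train` from the earlier code):

```python
import tensorflow as tf

# He/Kaiming uniform: U(-limit, limit), limit = sqrt(6 / fan_in);
# the extra scale offsets ReLU setting about half the activations to zero.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(5, activation="relu", input_shape=(X_train.shape[1],),
                          kernel_initializer="he_uniform"),
    tf.keras.layers.Dense(10, activation="relu", kernel_initializer="he_uniform"),
    tf.keras.layers.Dense(2, activation="relu", kernel_initializer="he_uniform"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
```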
Here's how our model performed with He uniform initialization and ReLU activations:

*Visualizing cost per epoch and decision boundary for a model with weights initialized with the He/Kaiming uniform distribution.*

In this article we covered weight initialization and why it matters. We used the `make_circles` dataset from scikit-learn and built a four-layer neural network to demonstrate the impact of initializing it with zeros, random values, Xavier initialization, and He initialization. I highly suggest taking a moment to read through the linked research papers for a more in-depth look at the inner workings behind He and Xavier initialization.

*Thanks for Reading!*