{"id":5945,"date":"2023-06-14T08:02:48","date_gmt":"2023-06-14T16:02:48","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=5945"},"modified":"2025-04-24T17:15:30","modified_gmt":"2025-04-24T17:15:30","slug":"building-a-neural-network-from-scratch-using-python-part-1","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/","title":{"rendered":"Building a Neural Network From Scratch Using Python (Part 1)"},"content":{"rendered":"\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\">\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-5946\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-1024x614.webp\" alt=\"\" width=\"953\" height=\"571\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-1024x614.webp 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-300x180.webp 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-768x461.webp 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-1536x922.webp 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw.webp 1920w\" sizes=\"auto, (max-width: 953px) 100vw, 953px\" \/><\/figure><p data-selectable-paragraph=\"\"><\/p>\n<p id=\"5ee2\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Artificial intelligence (AI) is a buzzword you see pretty much everywhere around you, even when you\u2019re not looking. 
It has completely dominated tech media and newsrooms, and is even credited with the success of many modern applications.<\/p>\n<p id=\"fbe3\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">But does it really work, or is it just hype? The truth is, it does work. While there might be some hype around its capabilities, AI has been shown, in both research and industry, to work well for a variety of tasks and use cases.<\/p>\n<p id=\"f6ed\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">There are many techniques for making computers learn, but neural networks are among the most popular and effective, most notably for complex tasks like image recognition, language translation, and audio transcription.<\/p>\n<p id=\"1bbd\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In this two-part series, I\u2019ll walk you through building a neural network from scratch. While you\u2019ll rarely build one from scratch in a real-world setting, it is advisable to work through this process at least once in your career as an AI engineer. 
This can really help you better understand how neural networks work.<\/p>\n<p id=\"dba4\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In this first article, you\u2019ll learn:<\/p>\n<p id=\"37e3\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">\u2014What Artificial Intelligence is<\/p>\n<p id=\"2861\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">\u2014What Deep Learning is<\/p>\n<p id=\"1c55\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">\u2014What a Neural Network Entails<\/p>\n<p id=\"ea68\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">\u2014Why Neural Networks are Popular<\/p>\n<p id=\"ed2e\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">\u2014Build a Neural Network From Scratch<\/p>\n<ul class=\"\">\n<li id=\"c32b\" class=\"mv mw fo be b gm mx my mz gp na nb nc nq ne nf ng nr ni nj nk ns nm nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Why use Python for AI?<\/li>\n<li id=\"ba55\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Understanding the Problem<\/li>\n<li id=\"4e90\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">The Layers of a Neural Network<\/li>\n<li id=\"a84a\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" 
data-selectable-paragraph=\"\">The Weights and Biases<\/li>\n<li id=\"5b4b\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">The Activation Function<\/li>\n<li id=\"f0bb\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">The Loss Function<\/li>\n<li id=\"e74e\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Going Forward: Forward Propagation<\/li>\n<li id=\"b9cc\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">A Step Backward: Backpropagation<\/li>\n<li id=\"8365\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Optimization and Training of the Neural Network<\/li>\n<li id=\"0619\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Making Predictions<\/li>\n<li id=\"f7c3\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Putting It All Together<\/li>\n<\/ul>\n<p id=\"a873\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">And once you\u2019ve had a chance to work through this tutorial, head on over to part 2, where we actually train and test the network we build.<\/p>\n<div class=\"ob oc od oe of og\">\n<div class=\"oh ab ik\">\n<div class=\"op l\">\n<div class=\"oq l or os ot op ou ml og\"><span style=\"color: var(--wpex-text-2); font-family: var(--wpex-body-font-family, var(--wpex-font-sans)); font-size: var(--wpex-body-font-size, 13px);\">\ud83d\udc49\ud83c\udffd Full code in Google Colab <\/span><a 
class=\"af mu\" style=\"font-family: var(--wpex-body-font-family, var(--wpex-font-sans)); font-size: var(--wpex-body-font-size, 13px);\" href=\"https:\/\/colab.research.google.com\/drive\/14cDKLDqt24sWWM4fw_qnkIp6wVKlyMBG?usp=sharing\" target=\"_blank\" rel=\"noopener ugc nofollow\">here<\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h1 id=\"208a\" class=\"pf pg fo be ph pi pj go pk pl pm gr pn po pp pq pr ps pt pu pv pw px py pz qa bj\" data-selectable-paragraph=\"\">What Is AI<\/h1>\n<p id=\"8a36\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Artificial intelligence (AI) is an umbrella term used to describe the intelligence shown by machines (computers), including their ability to mimic humans in areas such as learning and problem-solving. This means that with AI, you can automate parts of how humans think, reason, and make decisions. As such, you can teach a computer to do what humans do, without explicitly programming it.<\/p>\n<p id=\"9024\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Despite the simple explanation above, building such systems isn\u2019t as easy as it sounds. 
And while many scientists and researchers have been able to teach machines to act like humans in areas like computer vision and natural language processing, there\u2019s still serious work to be done before we can have efficient and fully functioning AI systems.<\/p>\n<p id=\"8c60\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">I\u2019m guessing that\u2019s the reason you\u2019re here: to learn how neural networks and AI work in general, and how you can use them to automate your own processes, build customized user experiences, and more.<\/p>\n<p id=\"0fe6\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">AI is broad and has numerous subfields, of which machine learning is one. Machine learning itself has numerous techniques, of which neural networks are one (albeit a very successful one).<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:680\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg\" alt=\"\" width=\"680\" height=\"361\"><\/figure><div class=\"mq mr cg\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 1100w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1360\/format:webp\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 1360w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 680px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1360\/1*Rk7gQpY_4HY5v6cmO58HhQ.jpeg 1360w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 680px\" data-testid=\"og\"><\/picture><\/div><figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\">Difference between AI, ML, and DL (<a class=\"af mu\" href=\"https:\/\/rapidminer.com\/glossary\/machine-learning\/\" target=\"_blank\" rel=\"noopener ugc 
nofollow\">Image source<\/a>)<\/figcaption><\/figure>\n<h1 id=\"57aa\" class=\"pf pg fo be ph pi pj go pk pl pm gr pn po qg pq pr ps qh pu pv pw qi py pz qa bj\" data-selectable-paragraph=\"\">What Is Deep Learning<\/h1>\n<p id=\"5ce5\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Deep learning, a machine learning technique, is an efficient way of learning from large amounts of data, in which the features that help a machine map an input to an output are automatically extracted by layers of \u201cneurons\u201d.<\/p>\n<p id=\"df41\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Deep learning is the main technology behind:<\/p>\n<ul class=\"\">\n<li id=\"4430\" class=\"mv mw fo be b gm mx my mz gp na nb nc nq ne nf ng nr ni nj nk ns nm nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Driverless cars<\/li>\n<li id=\"93e5\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Large-scale recommendation engines like those of Spotify, YouTube, and Amazon<\/li>\n<li id=\"0cdd\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Language translation services like Google Translate<\/li>\n<li id=\"f3f9\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Chatbots like Siri and Google Assistant<\/li>\n<\/ul>\n<p id=\"5f97\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Neural networks are the models that deep learning is built on, and they\u2019re our primary focus in this tutorial. 
Some specific architectures for deep neural networks include convolutional neural networks (CNN) for computer vision use cases, recurrent neural networks (RNN) for language and time series modeling, and others like generative adversarial networks (GANs) for generative computer vision use cases.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h1 id=\"b8d7\" class=\"pf pg fo be ph pi pj go pk pl pm gr pn po qg pq pr ps qh pu pv pw qi py pz qa bj\" data-selectable-paragraph=\"\">What Is a Neural Network<\/h1>\n<p id=\"17d6\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Neural networks are composed of simple building blocks called neurons. 
While many people try to draw correlations between a neural network neuron and biological neurons, I will simply state the obvious here: \u201cA neuron is a mathematical function that takes data as input, performs a transformation on them, and produces an output\u201d.<\/p>\n<p id=\"e8ea\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">This means that neurons can represent any mathematical function; however, in neural networks, we typically use non-linear functions.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*NZc0TcMCzpgVZXvUdEkqvA.png\" alt=\"\" width=\"700\" height=\"600\"><\/figure><div class=\"mq mr qr\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*NZc0TcMCzpgVZXvUdEkqvA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*NZc0TcMCzpgVZXvUdEkqvA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*NZc0TcMCzpgVZXvUdEkqvA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*NZc0TcMCzpgVZXvUdEkqvA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*NZc0TcMCzpgVZXvUdEkqvA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*NZc0TcMCzpgVZXvUdEkqvA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*NZc0TcMCzpgVZXvUdEkqvA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 
2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*NZc0TcMCzpgVZXvUdEkqvA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*NZc0TcMCzpgVZXvUdEkqvA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*NZc0TcMCzpgVZXvUdEkqvA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*NZc0TcMCzpgVZXvUdEkqvA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*NZc0TcMCzpgVZXvUdEkqvA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*NZc0TcMCzpgVZXvUdEkqvA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*NZc0TcMCzpgVZXvUdEkqvA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\">A single neuron in a network<\/figcaption>\n<\/figure>\n<p id=\"9a82\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Looking at the neuron above, you can see that it\u2019s composed of two main parts: the summation and the activation function. 
A neuron takes data (x\u2081, x\u2082, x\u2083) as input, multiplies each value by its own weight (w\u2081, w\u2082, w\u2083), sums the weighted inputs, and then passes that sum to a nonlinear function called the activation function to produce an output.<\/p>\n<p id=\"8807\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">A neural network combines multiple neurons by stacking them vertically and horizontally to create a network of neurons, hence the name \u201cneural network\u201d. A network with just one neuron is called a perceptron, and it is the simplest neural network there is.<\/p>\n<p id=\"2bc6\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Another important concept I\u2019ll explain in later sections of this tutorial is how a neural network actually learns the weights it assigns to each input feature. In neural nets, the weights are everything. If you know the correct weights, you can easily output correct predictions.<\/p>\n<p id=\"0040\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In summary, what machine learning and deep learning really boil down to is finding the right weights: ones that generalize to new inputs.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h1 id=\"b98a\" class=\"pf pg fo be ph pi rb go pk pl rc gr pn po rd pq pr ps re pu pv pw rf py pz qa bj\" data-selectable-paragraph=\"\">Why Are Neural Networks Popular?<\/h1>\n<p id=\"4b62\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">In the previous section, I introduced neural networks and briefly explained the building blocks. 
Now we\u2019ll explore why neural networks are popular today.<\/p>\n<p id=\"c297\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Neural networks have been around for a really long time, but a few major problems kept people from using them widely until recently:<\/p>\n<ol class=\"\">\n<li id=\"de10\" class=\"mv mw fo be b gm mx my mz gp na nb nc nq ne nf ng nr ni nj nk ns nm nn no np rg nu nv bj\" data-selectable-paragraph=\"\">They were notoriously difficult to train: it was hard to find weights that generalize to new inputs.<\/li>\n<li id=\"1600\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np rg nu nv bj\" data-selectable-paragraph=\"\">They needed huge amounts of data.<\/li>\n<li id=\"f8d6\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np rg nu nv bj\" data-selectable-paragraph=\"\">Computing power was still scarce and expensive.<\/li>\n<\/ol>\n<p id=\"dd45\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">When these barriers were overcome, neural nets became cool again, and numerous applications sprang up.<\/p>\n<p id=\"c14b\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Neural networks are also very popular now because of their effectiveness on a wide range of tasks. 
They can automatically extract features from unstructured data like text, images, and sound, and deep learning has greatly reduced the time spent manually creating features.<\/p>\n<p id=\"b679\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">To illustrate this, I\u2019ll tell you a short story about Google Translate. In the early days of Google Translate, thousands of engineers, language experts, and computer scientists had to work all day to manually extract and create features from texts.<\/p>\n<p id=\"5ed7\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">These manual features had to be fed into machine learning models. Even with this time-consuming and expensive work, the performance of these systems was nowhere close to human-like. But when Geoff Hinton&#8217;s team showed that a neural network could be trained using a technique called backpropagation, Google switched from manually engineering features to using deep neural nets, and this greatly improved performance.<\/p>\n<p id=\"ea95\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">This anecdote shows that with enough data and compute power, neural networks can do better than other machine learning algorithms, hence their rising popularity.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h1 id=\"7996\" class=\"pf pg fo be ph pi rb go pk pl rc gr pn po rd pq pr ps re pu pv pw rf py pz qa bj\" data-selectable-paragraph=\"\">Building a Neural Network From Scratch<\/h1>\n<p id=\"0297\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" 
data-selectable-paragraph=\"\">Now that you\u2019ve gotten a brief introduction to AI, deep learning, and neural networks, including some reasons why they work well, you\u2019re going to build your very own neural net from scratch. To do this, you\u2019ll use Python and its efficient scientific computing library, NumPy.<\/p>\n<h2 id=\"9c24\" class=\"rh pg fo be ph ri rj rk pk rl rm rn pn nd ro rp rq nh rr rs rt nl ru rv rw rx bj\" data-selectable-paragraph=\"\">Why Python for AI?<\/h2>\n<p id=\"9588\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Python is a high-level, interpreted, general-purpose language that can be used for a wide variety of tasks. It\u2019s one of the easiest languages to learn, and that makes it the go-to for new programmers. Python is popular among AI engineers; in fact, a large share of AI applications are built with Python and Python-related tools. There are many reasons for this, some of which include:<\/p>\n<ol class=\"\">\n<li id=\"ff58\" class=\"mv mw fo be b gm mx my mz gp na nb nc nq ne nf ng nr ni nj nk ns nm nn no np rg nu nv bj\" data-selectable-paragraph=\"\">There is a large ecosystem of pre-built libraries for scientific computation. Libraries like NumPy, SciPy, and pandas make scientific calculations easy and quick, as most of them are well optimized for common ML and DL tasks.<\/li>\n<li id=\"9dbb\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np rg nu nv bj\" data-selectable-paragraph=\"\">Python is platform-independent and can be run on almost all devices. 
This means Python is easily compatible across platforms and can be deployed almost anywhere.<\/li>\n<li id=\"e03a\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np rg nu nv bj\" data-selectable-paragraph=\"\">Python has a helpful and supportive community built around it, and this community provides tons of guides and tutorials for working with the language. You can rest assured that most problems you encounter have already been solved.<\/li>\n<\/ol>\n<p id=\"c630\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">I love the comic below that shows a flying programmer *<em class=\"ry\">winks<\/em>. It depicts why Python is easy to learn, with numerous libraries that you can import and use for almost any task\u2014including antigravity \ud83d\ude02!<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:518\/1*TmwjhHeBOr1TNde_UbVMFQ.png\" alt=\"\" width=\"518\" height=\"588\"><\/figure><div class=\"mq mr rz\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*TmwjhHeBOr1TNde_UbVMFQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*TmwjhHeBOr1TNde_UbVMFQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*TmwjhHeBOr1TNde_UbVMFQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*TmwjhHeBOr1TNde_UbVMFQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*TmwjhHeBOr1TNde_UbVMFQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*TmwjhHeBOr1TNde_UbVMFQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1036\/format:webp\/1*TmwjhHeBOr1TNde_UbVMFQ.png 1036w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, 
(-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 518px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*TmwjhHeBOr1TNde_UbVMFQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*TmwjhHeBOr1TNde_UbVMFQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*TmwjhHeBOr1TNde_UbVMFQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*TmwjhHeBOr1TNde_UbVMFQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*TmwjhHeBOr1TNde_UbVMFQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*TmwjhHeBOr1TNde_UbVMFQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1036\/1*TmwjhHeBOr1TNde_UbVMFQ.png 1036w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 518px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\">The flying programmer (<a class=\"af mu\" href=\"https:\/\/xkcd.com\/353\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Image source<\/a>)<\/figcaption>\n<\/figure>\n<p id=\"e384\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Before you start 
flying, make sure you\u2019ve properly set up your machine learning environment. If you haven\u2019t, visit <a class=\"af mu\" href=\"https:\/\/realpython.com\/python-windows-machine-learning-setup\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">this page<\/a> first before moving on to the next section.<\/p>\n<h2 id=\"6813\" class=\"rh pg fo be ph ri rj rk pk rl rm rn pn nd ro rp rq nh rr rs rt nl ru rv rw rx bj\" data-selectable-paragraph=\"\">What Are You Trying to Solve?<\/h2>\n<p id=\"77fa\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Before you start writing code, let\u2019s talk about the problem you\u2019re going to solve, as a more complete understanding of the problem will help you form the solution. In this tutorial, you\u2019re going to create a neural network that predicts whether or not a person has heart disease. You\u2019ll use a heart disease dataset from the UCI Machine Learning Repository. 
You can download it <a class=\"af mu\" href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/Statlog+%28Heart%29\" target=\"_blank\" rel=\"noopener ugc nofollow\">here<\/a>.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*x_8Ktepu-zevdmkdM4ueng.png\" alt=\"\" width=\"700\" height=\"365\"><\/figure>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\">UCI heart disease dataset page<\/figcaption>\n<\/figure>\n<p id=\"eaee\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">On the dataset page, click on <strong class=\"be sb\">Data Folder<\/strong> and download the <strong class=\"be sb\">heart.dat<\/strong> file. The file comes in the .dat format. Create a new directory where your Jupyter notebook and data will live. 
Then, copy the heart.dat file to that folder.<\/p>\n<p id=\"44d2\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Next, create a new notebook and add the following lines of code:<\/p>\n<pre># prepare the data downloaded from UCI\n\nimport pandas as pd\n\n# add header names (taken from the dataset description file)\nheaders = ['age', 'sex', 'chest_pain', 'resting_blood_pressure',\n           'serum_cholestoral', 'fasting_blood_sugar', 'resting_ecg_results',\n           'max_heart_rate_achieved', 'exercise_induced_angina', 'oldpeak', 'slope of the peak',\n           'num_of_major_vessels', 'thal', 'heart_disease']\n\n# the file is space-separated and has no header row\nheart_df = pd.read_csv('heart.dat', sep=' ', names=headers)<\/pre>\n<p id=\"cc77\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In the code block above, you first define the headers, which are the column names for the dataset. You can get these names from the dataset description file, which is also on the dataset page. Notice the <strong class=\"be sb\"><em class=\"ry\">sep<\/em><\/strong> parameter passed to the pandas <code class=\"cw sn so sp sg b\">read_csv<\/code> function: 
this tells pandas that the data is separated by spaces and not the default commas.<\/p>\n<p id=\"b548\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In the next code block, you\u2019ll print out the head of the data:<\/p>\n<pre class=\"mg mh mi mj mk sf sg sh si ax sj bj\"><span id=\"f16f\" class=\"rh pg fo sg b ia sk sl l iq sm\" data-selectable-paragraph=\"\">heart_df.head()<\/span><\/pre>\n<\/div>\n<\/div>\n<div class=\"mf bg\">\n<figure class=\"mg mh mi mj mk mf bg paragraph-image\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*HzzcN5QvmgRE8uaKCCQvUg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*HzzcN5QvmgRE8uaKCCQvUg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*HzzcN5QvmgRE8uaKCCQvUg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*HzzcN5QvmgRE8uaKCCQvUg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*HzzcN5QvmgRE8uaKCCQvUg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*HzzcN5QvmgRE8uaKCCQvUg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:2882\/format:webp\/1*HzzcN5QvmgRE8uaKCCQvUg.png 2882w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 1441px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*HzzcN5QvmgRE8uaKCCQvUg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*HzzcN5QvmgRE8uaKCCQvUg.png 720w, 
https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*HzzcN5QvmgRE8uaKCCQvUg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*HzzcN5QvmgRE8uaKCCQvUg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*HzzcN5QvmgRE8uaKCCQvUg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*HzzcN5QvmgRE8uaKCCQvUg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:2882\/1*HzzcN5QvmgRE8uaKCCQvUg.png 2882w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 1441px\" data-testid=\"og\"><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1441\/1*HzzcN5QvmgRE8uaKCCQvUg.png\" alt=\"\" width=\"1441\" height=\"317\"><\/picture>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\">First five rows of the dataset<\/figcaption>\n<\/figure>\n<\/div>\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<p id=\"60b5\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">From the head of the data, you can see the features present, and you can begin to imagine the kind of analysis you\u2019ll need to perform on the dataset. 
Next, print out the shape of the data:<\/p>\n<pre class=\"mg mh mi mj mk sf sg sh si ax sj bj\"><span id=\"1cf5\" class=\"rh pg fo sg b ia sk sl l iq sm\" data-selectable-paragraph=\"\">heart_df.shape<\/span><\/pre>\n<p id=\"85d4\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">(270, 14)<\/p>\n<p id=\"2aae\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">There are 270 observations and 14 columns. This means that the input data for your neural network has shape 270 x 13 once the target variable (<code class=\"cw sn so sp sg b\">heart_disease<\/code>) is excluded. The columns in the dataset are:<\/p>\n<ul class=\"\">\n<li id=\"20ab\" class=\"mv mw fo be b gm mx my mz gp na nb nc nq ne nf ng nr ni nj nk ns nm nn no np nt nu nv bj\" data-selectable-paragraph=\"\">age<\/li>\n<li id=\"8f03\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">sex<\/li>\n<li id=\"08dd\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">chest pain type (4 values)<\/li>\n<li id=\"79ad\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">resting blood pressure<\/li>\n<li id=\"15d6\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">serum cholesterol in mg\/dl<\/li>\n<li id=\"a0e3\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">fasting blood sugar &gt; 120 mg\/dl<\/li>\n<li id=\"e92f\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" 
data-selectable-paragraph=\"\">resting electrocardiographic results (values 0,1,2)<\/li>\n<li id=\"0b2d\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">maximum heart rate achieved<\/li>\n<li id=\"961a\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">exercise-induced angina<\/li>\n<li id=\"c3b5\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">oldpeak (ST depression induced by exercise relative to rest)<\/li>\n<li id=\"5360\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">the slope of the peak exercise ST segment<\/li>\n<li id=\"2eb7\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">number of major vessels (0\u20133) colored by fluoroscopy<\/li>\n<li id=\"effe\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">thal (3 = normal; 6 = fixed defect; 7 = reversible defect)<\/li>\n<li id=\"e05a\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">heart_disease: absence (1) or presence (2) of heart disease<\/li>\n<\/ul>\n<p id=\"7d72\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Next, you can check for missing values and also the data types. 
A neural network expects all features to be numeric and free of missing values.<\/p>\n<pre class=\"mg mh mi mj mk sf sg sh si ax sj bj\"><span id=\"d9d0\" class=\"rh pg fo sg b ia sk sl l iq sm\" data-selectable-paragraph=\"\">heart_df.isna().sum()<\/span><\/pre>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:326\/1*f3wihhgnvMMeMsxA_apntQ.png\" alt=\"\" width=\"326\" height=\"279\"><\/figure>\n<\/figure>\n<pre class=\"mg mh mi mj mk sf sg sh si ax sj bj\"><span id=\"4c20\" class=\"rh pg fo sg b ia sk sl l iq sm\" data-selectable-paragraph=\"\">heart_df.dtypes<\/span><\/pre>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:373\/1*9_5UaQWQTmnsEX2AFHgtHA.png\" alt=\"\" width=\"373\" height=\"271\"><\/figure>\n<\/figure>\n<p id=\"4176\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" 
data-selectable-paragraph=\"\">There are no missing values in the dataset, and all features are numeric. Next, you\u2019ll separate the target from the features, split the data into train and test sets, and then standardize the features.<\/p>\n<pre>import numpy as np\nimport warnings\nwarnings.filterwarnings(\"ignore\")  # suppress warnings\nimport matplotlib.pyplot as plt\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import StandardScaler\n\n# separate the input features from the target\nX = heart_df.drop(columns=['heart_disease'])\n\n# replace the target classes (1 = absence, 2 = presence) with 0 and 1\n# 1 means \"have heart disease\" and 0 means \"do not have heart disease\"\nheart_df['heart_disease'] = heart_df['heart_disease'].replace(1, 0)\nheart_df['heart_disease'] = heart_df['heart_disease'].replace(2, 1)\n\n# reshape the labels into a column vector of shape (n_samples, 1)\ny_label = heart_df['heart_disease'].values.reshape(X.shape[0], 1)\n\n# split data into train and test sets\nXtrain, Xtest, ytrain, ytest = train_test_split(X, y_label, test_size=0.2, random_state=2)\n\n# standardize the features using statistics from the training set only\nsc = StandardScaler()\nsc.fit(Xtrain)\nXtrain = sc.transform(Xtrain)\nXtest = sc.transform(Xtest)\n\nprint(f\"Shape of train set is {Xtrain.shape}\")\nprint(f\"Shape of test set is {Xtest.shape}\")\nprint(f\"Shape of train label is {ytrain.shape}\")\nprint(f\"Shape of test labels is {ytest.shape}\")<\/pre>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:346\/1*qoDtKQH-GuJ9Et5qWovCeg.png\" alt=\"\" width=\"346\" height=\"84\"><\/figure>\n<\/figure>\n<p id=\"c3ff\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In the code block above, you first dropped the target from the feature set and then replaced the classes with 0 and 1. Notice that you reshaped <code class=\"cw sn so sp sg b\">y_label<\/code> into a column vector of shape (270, 1). This is important when you start performing dot products. Next, you used the handy <code class=\"cw sn so sp sg b\">train_test_split<\/code> function from sklearn to split the data into train and test sets, with the test set taking 20 percent of the data. Finally, you standardized the features using sklearn\u2019s StandardScaler class, fitting it on the training set only and applying the same transformation to the test set.<\/p>\n<h2 id=\"3831\" class=\"rh pg fo be ph ri rj rk pk rl rm rn pn nd ro rp rq nh rr rs rt nl ru rv rw rx bj\" data-selectable-paragraph=\"\">The Layers of a Neural Network<\/h2>\n<p id=\"5dfb\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Now that you have downloaded and prepared the dataset, let\u2019s start building the neural network to make predictions. To do that, you first need to understand the concept of <strong class=\"be sb\">layers<\/strong>.<\/p>\n<p id=\"adce\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Remember when I said a neural network stacks multiple neurons together to build really large and complex mathematical functions? Well, the official name for such a group of neurons is a layer. A layer is a collection of nodes that sit at the same stage of computation in a neural network. Each node acts as a neuron and performs calculations on the data passed to it. 
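<\/p>\n<p class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">To make the idea of a layer a little more concrete, here is a minimal sketch in plain Python. It is not the implementation this series builds. It assumes each node computes a weighted sum of its inputs plus a bias (both are explained in a later section), and every name and number in it is made up for illustration.<\/p>\n<pre>
```python
def node_output(inputs, weights, bias):
    # One node: weighted sum of its inputs plus a bias (no activation, for brevity)
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def layer_output(inputs, layer_weights, layer_biases):
    # A layer is a list of nodes; every node receives the same inputs
    return [node_output(inputs, w, b) for w, b in zip(layer_weights, layer_biases)]


# Made-up numbers: 3 input features feeding a layer of 2 nodes
features = [1.0, 2.0, 3.0]
weights = [[0.5, 0.0, 1.0],   # weights of node 1
           [1.0, 1.0, 1.0]]   # weights of node 2
biases = [0.0, 1.0]

outputs = layer_output(features, weights, biases)
print(outputs)
```
<\/pre>\n<p class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The layer returns one number per node, so a layer of two nodes turns three input values into two output values.<\/p>\n<p class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">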
Look at the illustration of a 3-layer neural network below:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Z3zHoX1nhK6Rsmd4yNPdsg.jpeg\" alt=\"\" width=\"700\" height=\"344\"><\/figure>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\">A 3-layer neural network<\/figcaption>\n<\/figure>\n<p id=\"6219\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Yeah, I know, you see four layers, but in deep learning, you don\u2019t count the first layer. The first layer is called the input layer, and its number of nodes depends on the number of features present in your dataset. In our case, it will have 13 nodes because we have 13 features.<\/p>\n<p id=\"28f4\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The final layer of the neural network is called the output layer, and its number of nodes depends on what you\u2019re trying to predict. 
For regression and binary classification tasks, you can use a single node, while for multi-class problems, you\u2019ll use multiple nodes, depending on the number of classes.<\/p>\n<p id=\"2ca9\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In this article, you\u2019ll use a single node for your final layer, because you\u2019re working on a <a class=\"af mu\" href=\"https:\/\/heartbeat.comet.ml\/binary-classification-using-keras-in-r-ef3d42202aaa\" target=\"_blank\" rel=\"noopener ugc nofollow\">binary classification<\/a> task.<\/p>\n<p id=\"0d87\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The layers between the input and the final layer are where the magic happens; these are called the hidden layers. The hidden layers can be as deep or wide as you want, and while a deeper network can often learn more complex functions, the computation time also increases as you go deeper.<\/p>\n<p id=\"24cd\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In order to keep things relatively simple, you\u2019re going to design and code a 2-layer neural network. 
Below is a preview of the architecture:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:586\/1*tDZCZM0g5oJxyZ0JfqRJDQ.png\" alt=\"\" width=\"586\" height=\"586\"><\/figure>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\">A 2-layer neural network<\/figcaption>\n<\/figure>\n<p id=\"ed4a\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The neural net above has one hidden layer and a final output layer. The input layer will have 13 nodes because we have 13 features, excluding the target. The hidden layer can have any number of nodes, but you\u2019ll start with 8, and the final layer, which makes the predictions, will have 1 node. Next, let\u2019s talk about the weights and biases that each layer must have.<\/p>\n<h1 id=\"b813\" class=\"pf pg fo be ph pi pj go pk pl pm gr pn po qg pq pr ps qh pu pv pw qi py pz qa bj\" data-selectable-paragraph=\"\">The Weights and Biases<\/h1>\n<p id=\"2fcf\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Weights and biases are the learnable parameters that help a neural network correctly learn a function. 
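<\/p>\n<p class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">As a rough sketch, the snippet below counts the learnable parameters for the 13-8-1 architecture described in the previous section. It assumes fully connected layers with one weight per connection and one bias per node. The (inputs, outputs) weight layout and the names <code class=\"cw sn so sp sg b\">layer_sizes<\/code> and <code class=\"cw sn so sp sg b\">param_shapes<\/code> are illustrative assumptions, not necessarily what the rest of this series uses.<\/p>\n<pre>
```python
# Layer sizes taken from the architecture described in the text
layer_sizes = [13, 8, 1]  # input features, hidden nodes, output node

param_shapes = []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # Assumed layout: weights are (n_in, n_out), biases are (n_out,)
    param_shapes.append({'weights': (n_in, n_out), 'bias': (n_out,)})

# Every weight and every bias is one learnable parameter
total_params = sum(
    s['weights'][0] * s['weights'][1] + s['bias'][0] for s in param_shapes
)

print(param_shapes)
print(total_params)
```
<\/pre>\n<p class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Under these assumptions, the hidden layer holds 13 x 8 weights and 8 biases, and the output layer holds 8 weights and 1 bias, for 121 learnable parameters in total.<\/p>\n<p class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">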
Think of weights as a measure of how sure you are that a feature contributes to a prediction and the bias as a base value that your predictions must start from.<\/p>\n<p id=\"d019\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">I\u2019ll give you an illustration.<\/p>\n<p id=\"394d\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Assume you\u2019re a machine learning model, and you want to predict if a person is rich or not, and you have been given the following clues to help you make that decision:<\/p>\n<ul class=\"\">\n<li id=\"2341\" class=\"mv mw fo be b gm mx my mz gp na nb nc nq ne nf ng nr ni nj nk ns nm nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Age of the person<\/li>\n<li id=\"463e\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Height of the person<\/li>\n<li id=\"4ab7\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Salary of the person<\/li>\n<\/ul>\n<p id=\"079a\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The clues above are what we call features in machine learning, and what you want to predict is called the target\/label\/ground truth. 
The label can be one of two classes (rich, not rich), which makes this a binary classification problem.<\/p>\n<p id=\"0ff2\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Basically, what you want to do is combine the features in such a way that they help you more accurately predict the outcome.<\/p>\n<p id=\"d7ee\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\"><em class=\"ry\">y(rich, not rich) = Age + Height + Salary + [base]<\/em><\/p>\n<p id=\"3854\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Assuming we set a base value of $3000, and Person 1 has the following features: age = 18, height = 5.6 ft, and salary = $2000, you\u2019ll calculate the richness score as follows:<\/p>\n<p id=\"bece\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">y(rich, not rich) = <em class=\"ry\">18 + 5.6 + 2000 + 3000 = ~5024<\/em><\/p>\n<p id=\"36e8\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">For this example, we might define a threshold for richness as any value greater than $40,000. Judging by this criterion, you can conclude that Person 1 is not rich. Let\u2019s look at another example.<\/p>\n<p id=\"99d9\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Person 2 has the following features: age = 26, height = 5.2 ft, and salary = $50,000. 
Your prediction will be calculated as:<\/p>\n<p id=\"489e\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">y(rich, not rich) = <em class=\"ry\">26 + 5.2 + 50000 + 3000 = ~53031<\/em><\/p>\n<p id=\"ea93\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Then, by the threshold stated earlier, Person 2 is rich.<\/p>\n<p id=\"b3d7\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">It\u2019s obvious that some clues are more important than others. Can you guess which one is the most important? Yes! Salary. This is, perhaps unsurprisingly, an important factor that indicates whether a person is rich or not.<\/p>\n<p id=\"ff72\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Using this idea, you can assign importance to the features. 
For instance, you can assign weights as follows:<\/p>\n<p id=\"86f6\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">y(rich, not rich) = <em class=\"ry\">(2 * Age) + (1 * Height) + (8 * Salary) + base<\/em><\/p>\n<p id=\"8911\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Intuitively, we assign a higher value to the salary feature.<\/p>\n<p id=\"1ef5\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The weight can be any number, but it should reflect how important the feature is relative to the others.<\/p>\n<p id=\"8b7b\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">You might be wondering what the base value of 3000 is and why we add it to the predictions. This value is called the bias. 
It is a base value that every prediction must have, even when nothing else is given.<\/p>\n<p id=\"14db\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Now, if you make a prediction for Persons 1 and 2 again, you\u2019ll have the following:<\/p>\n<p id=\"5eb5\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Person 1: <em class=\"ry\">(2 * 18) + (1 * 5.6) + (8 * 2000) + 3000 = ~19042<\/em> (still not rich)<\/p>\n<p id=\"fa0d\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Person 2: <em class=\"ry\">(2 * 26) + (1 * 5.2) + (8 * 50000) + 3000 = ~403057<\/em> (still rich)<\/p>\n<p id=\"6d27\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">What if a person has a value of zero for age, height, and salary? Then your prediction will be:<\/p>\n<p id=\"dc29\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">y(rich, not rich) = <em class=\"ry\">(2 * 0) + (1 * 0) + (8 * 0) + 3000 = 3000<\/em><\/p>\n<p id=\"8e11\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Now you see where the bias value comes in.<\/p>\n<blockquote class=\"sv sw sx\"><p id=\"5f5c\" class=\"mv mw ry be b gm mx my mz gp na nb nc nq ne nf ng nr ni nj nk ns nm nn no np fh bj\" data-selectable-paragraph=\"\">What you should take away from the examples above is the fact that importance values assigned to features are called weights, and the base value is called the bias.<\/p><\/blockquote>\n<p id=\"e0f8\" class=\"pw-post-body-paragraph mv mw 
fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">A machine learning model uses lots of examples to learn the correct weights and bias to assign to each feature in a dataset to help it correctly predict outputs.<\/p>\n<p id=\"3bb9\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Back to our proposed solution. You now know that every feature in our dataset must be assigned a weight and that after doing a weighted sum, you add a bias term.<\/p>\n<p id=\"4b9d\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In the code block below, you\u2019ll create your neural network class and initialize those weights and biases:<\/p>\n<pre>import numpy as np\n\nclass NeuralNet():\n    '''\n    A two layer neural network\n    '''\n\n    def __init__(self, layers=[13,8,1], learning_rate=0.001, iterations=100):\n        self.params = {}\n        self.learning_rate = learning_rate\n        self.iterations = iterations\n        self.loss = []\n        self.sample_size = None\n        self.layers = layers\n        self.X = None\n        self.y = None\n\n    def init_weights(self):\n        '''\n        Initialize the weights from a random normal distribution\n        '''\n        np.random.seed(1) # Seed the random number generator\n        self.params['W1'] = np.random.randn(self.layers[0], self.layers[1])\n        self.params['b1'] = np.random.randn(self.layers[1],)\n        self.params['W2'] = np.random.randn(self.layers[1], self.layers[2])\n        self.params['b2'] = np.random.randn(self.layers[2],)<\/pre>\n<p id=\"4710\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">First, you create a neural network class, and then during initialization, you 
create some variables to hold intermediate calculations. The argument <code class=\"cw sn so sp sg b\">layers<\/code> is a list that stores your network\u2019s architecture. You can see that it accepts 13 input features, uses 8 nodes in the hidden layer (as we noted earlier), and finally uses 1 node in the output layer. We\u2019ll talk about the other parameters such as the learning rate, sample size and iterations in later sections.<\/p>\n<p id=\"7593\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Moving on to the next code section, you create a function (<strong class=\"be sb\"><em class=\"ry\">init_weights<\/em><\/strong>) to initialize the weights and biases as random numbers. These values are drawn from a standard normal distribution (that is what <code class=\"cw sn so sp sg b\">np.random.randn<\/code> samples from) and saved to a dictionary called <strong class=\"be sb\">params<\/strong>.<\/p>\n<p id=\"5abf\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">You\u2019ll notice that there are two weight and bias arrays. 
The first weight array (<strong class=\"be sb\"><em class=\"ry\">W1<\/em><\/strong>) will have dimensions of 13 by 8 because you have 13 input features and 8 hidden nodes, while the first bias (<strong class=\"be sb\"><em class=\"ry\">b1<\/em><\/strong>) will be a vector of size 8 because you have 8 hidden nodes.<\/p>\n<p id=\"0c5a\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The second weight array (<strong class=\"be sb\"><em class=\"ry\">W2<\/em><\/strong>) will be an 8 by 1 array because you have 8 hidden nodes and 1 output node, and finally, the second bias (<strong class=\"be sb\"><em class=\"ry\">b2<\/em><\/strong>) will be a vector of size 1 because you have just 1 output.<\/p>\n<p id=\"5092\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">I\u2019m guessing you\u2019re seeing a pattern here. 
That is, if you have a neural network with the following architecture [20,30,2], then you know you\u2019ll have the following dimensions for your weights and biases:<\/p>\n<p id=\"e96a\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\"><em class=\"ry\">W1 = (20,30)<\/em> , <em class=\"ry\">b1 = (30,)<\/em><\/p>\n<p id=\"ba21\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\"><em class=\"ry\">W2 = (30, 2)<\/em>, <em class=\"ry\">b2 = (2,)<\/em><\/p>\n<p id=\"e867\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">And if you have a 3 layer architecture like [5,7,8,2], then you know you\u2019ll have 3 weights and 3 biases with the following shapes:<\/p>\n<p id=\"6366\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\"><em class=\"ry\">W1 = (5,7)<\/em>, <em class=\"ry\">b1 = (7,)<\/em><\/p>\n<p id=\"3e66\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\"><em class=\"ry\">W2 = (7,8)<\/em>, <em class=\"ry\">b2 = (8,)<\/em><\/p>\n<p id=\"d533\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\"><em class=\"ry\">W3 = (8,2)<\/em>, <em class=\"ry\">b3 = (2,)<\/em><\/p>\n<p id=\"9407\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">So what will the dimensions be for a neural network with this architecture [20, 23, 2]?<\/p>\n<h2 id=\"10a7\" class=\"rh pg fo be ph ri rj rk pk rl rm rn pn nd ro rp rq nh rr 
rs rt nl ru rv rw rx bj\" data-selectable-paragraph=\"\">The Activation Function<\/h2>\n<p id=\"c9f5\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Now that you\u2019ve initialized the weights and biases, let\u2019s talk about activation functions. Activations are the nonlinear computations done in each node of a neural network. Remember when I told you that each node performs some mathematical computation? Well, that computation happens in two phases.<\/p>\n<p id=\"2295\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">First, you compute a weighted sum of the inputs and the weights and add the bias. Second, you pass the result through an activation function. I\u2019ll explain why we do that below.<\/p>\n<p id=\"ef13\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">An activation function is what makes a neural network capable of learning complex non-linear functions, which linear models such as linear and logistic regression struggle to capture. Without it, a stack of layers would still compute only a linear function of its input, no matter how many layers you add.<\/p>\n<p id=\"4178\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">There are many types of activation functions used in deep learning; popular ones include sigmoid, ReLU, tanh, and Leaky ReLU. 
Each activation function has its pros and cons, but the ReLU function has been shown to perform very well, so in this article, you\u2019ll use the ReLU function.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*ZafDv3VUm60Eh10OeJu1vw.png\" alt=\"\" width=\"700\" height=\"352\"><\/figure><div class=\"mq mr sy\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*ZafDv3VUm60Eh10OeJu1vw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*ZafDv3VUm60Eh10OeJu1vw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*ZafDv3VUm60Eh10OeJu1vw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*ZafDv3VUm60Eh10OeJu1vw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*ZafDv3VUm60Eh10OeJu1vw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*ZafDv3VUm60Eh10OeJu1vw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*ZafDv3VUm60Eh10OeJu1vw.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*ZafDv3VUm60Eh10OeJu1vw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*ZafDv3VUm60Eh10OeJu1vw.png 720w, 
https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*ZafDv3VUm60Eh10OeJu1vw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*ZafDv3VUm60Eh10OeJu1vw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*ZafDv3VUm60Eh10OeJu1vw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*ZafDv3VUm60Eh10OeJu1vw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*ZafDv3VUm60Eh10OeJu1vw.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\">Different activation functions used in deep learning (<a class=\"af mu\" href=\"https:\/\/medium.com\/@shrutijadon10104776\/survey-on-activation-functions-for-deep-learning-9689331ba092\" rel=\"noopener\">Image source<\/a>)<\/figcaption>\n<\/figure>\n<p id=\"a9b3\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The activation function is computed by each node in the hidden layers of a neural network. 
This means you\u2019ll have to pass the weighted sums through the ReLU function.<\/p>\n<p id=\"b0c4\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">But what is ReLU?<\/p>\n<p id=\"6c45\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">ReLU (Rectified Linear Unit) is a simple function that compares a value with zero. That is, it will return the value passed to it if it is greater than zero; otherwise, it returns zero.<\/p>\n<p id=\"2099\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The code for the ReLU function is shown below:<\/p>\n<pre>def relu(self, Z):\n        '''\n        The ReLU activation function performs a threshold\n        operation on each input element, where values less\n        than zero are set to zero.\n        '''\n        return np.maximum(0, Z)<\/pre>\n<p id=\"1224\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">You\u2019ll add this inside the <code class=\"cw sn so sp sg b\">NeuralNet<\/code> class. This function performs an element-wise ReLU because you\u2019ll be dealing mainly with arrays, not single values.<\/p>\n<p id=\"a29b\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In summary, the hidden layer receives values from the input layer, calculates a weighted sum, adds the bias term, and then passes each result through an activation function, in our case ReLU. The result from the ReLU is then passed to the output layer, where another weighted sum is performed using the second weights and biases. 
But then instead of passing the result through another activation function, it is passed through what I like to call the output function.<\/p>\n<p id=\"7e07\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The output function will depend on what you\u2019re trying to predict. You can use a sigmoid function when you have a two-class problem (binary classification), and you can use a function called softmax for multi-class problems.<\/p>\n<p id=\"c7d7\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In this tutorial, you\u2019ll be using a sigmoid function for the output layer. This is because you\u2019re predicting one of two classes.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*OUOB_YF41M-O4GgZH_F2rw.png\" alt=\"\" width=\"640\" height=\"480\"><\/figure><div class=\"mq mr sz\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*OUOB_YF41M-O4GgZH_F2rw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*OUOB_YF41M-O4GgZH_F2rw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*OUOB_YF41M-O4GgZH_F2rw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*OUOB_YF41M-O4GgZH_F2rw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*OUOB_YF41M-O4GgZH_F2rw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*OUOB_YF41M-O4GgZH_F2rw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1280\/format:webp\/1*OUOB_YF41M-O4GgZH_F2rw.png 1280w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, 
(min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 640px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*OUOB_YF41M-O4GgZH_F2rw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*OUOB_YF41M-O4GgZH_F2rw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*OUOB_YF41M-O4GgZH_F2rw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*OUOB_YF41M-O4GgZH_F2rw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*OUOB_YF41M-O4GgZH_F2rw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*OUOB_YF41M-O4GgZH_F2rw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1280\/1*OUOB_YF41M-O4GgZH_F2rw.png 1280w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 640px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\">The sigmoid function (<a class=\"af mu\" href=\"https:\/\/towardsdatascience.com\/introduction-to-logistic-regression-66248243c148\" target=\"_blank\" rel=\"noopener\">Image source<\/a>)<\/figcaption>\n<\/figure>\n<p id=\"b336\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The sigmoid function takes a real 
number and squashes it to a value between 0 and 1. In other words, it maps any real number to a score you can interpret as a probability. This is useful for the task at hand because you don\u2019t just want your model to predict a yes (1) or no (0); you want it to predict probabilities that can help you measure how sure it is of its predictions.<\/p>\n<p id=\"29a6\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Let\u2019s add the sigmoid function to our <code class=\"cw sn so sp sg b\">NeuralNet<\/code> class:<\/p>\n<pre>def sigmoid(self, Z):\n        '''\n        The sigmoid function takes in real numbers in any range and\n        squashes them to a real-valued output between 0 and 1.\n        '''\n        return 1\/(1 + np.exp(-Z))<\/pre>\n<p id=\"644e\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">You\u2019ll use the NumPy exponential function to code the sigmoid function. This makes it possible to perform the operation on whole arrays instead of single values. Also, NumPy\u2019s implementation is faster than a pure Python loop because it\u2019s written in C.<\/p>\n<h2 id=\"148d\" class=\"rh pg fo be ph ri rj rk pk rl rm rn pn nd ro rp rq nh rr rs rt nl ru rv rw rx bj\" data-selectable-paragraph=\"\">The Loss Function<\/h2>\n<p id=\"b391\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Next, let\u2019s talk about a neural network\u2019s loss function. 
The loss function is a way of measuring how good a model\u2019s prediction is so that it can adjust the weights and biases.<\/p>\n<p id=\"da05\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">A loss function must be properly designed so that it can correctly penalize a model that is wrong and reward a model that is right. This means that you want the loss to tell you whether a prediction is far from or close to the true label. The choice of loss function depends on the task; for classification problems, you can use cross-entropy loss.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*e8qhWEz_8PZBLmF37JHfSA.png\" alt=\"\" width=\"700\" height=\"228\"><\/figure><div class=\"mq mr ta\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*e8qhWEz_8PZBLmF37JHfSA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*e8qhWEz_8PZBLmF37JHfSA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*e8qhWEz_8PZBLmF37JHfSA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*e8qhWEz_8PZBLmF37JHfSA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*e8qhWEz_8PZBLmF37JHfSA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*e8qhWEz_8PZBLmF37JHfSA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*e8qhWEz_8PZBLmF37JHfSA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, 
(min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*e8qhWEz_8PZBLmF37JHfSA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*e8qhWEz_8PZBLmF37JHfSA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*e8qhWEz_8PZBLmF37JHfSA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*e8qhWEz_8PZBLmF37JHfSA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*e8qhWEz_8PZBLmF37JHfSA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*e8qhWEz_8PZBLmF37JHfSA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*e8qhWEz_8PZBLmF37JHfSA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\">The Cross-Entropy loss<\/figcaption>\n<\/figure>\n<p id=\"d361\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Where C is the number of classes, y is the true value and y_hat is the predicted value.<\/p>\n<p id=\"6b65\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">For a binary classification task (i.e. 
C=2), the cross-entropy loss function becomes:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*TkSDhSmzdiVWzaCfCGYgrA.png\" alt=\"\" width=\"700\" height=\"84\"><\/figure><div class=\"mq mr tb\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*TkSDhSmzdiVWzaCfCGYgrA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*TkSDhSmzdiVWzaCfCGYgrA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*TkSDhSmzdiVWzaCfCGYgrA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*TkSDhSmzdiVWzaCfCGYgrA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*TkSDhSmzdiVWzaCfCGYgrA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*TkSDhSmzdiVWzaCfCGYgrA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*TkSDhSmzdiVWzaCfCGYgrA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*TkSDhSmzdiVWzaCfCGYgrA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*TkSDhSmzdiVWzaCfCGYgrA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*TkSDhSmzdiVWzaCfCGYgrA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*TkSDhSmzdiVWzaCfCGYgrA.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*TkSDhSmzdiVWzaCfCGYgrA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*TkSDhSmzdiVWzaCfCGYgrA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*TkSDhSmzdiVWzaCfCGYgrA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"67b4\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Now, let\u2019s put this in code:<\/p>\n<p id=\"c1b4\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\"><strong class=\"be sb\">Update (05\u201302\u20132021):<\/strong><\/p>\n<blockquote class=\"sv sw sx\"><p id=\"ddea\" class=\"mv mw ry be b gm mx my mz gp na nb nc nq ne nf ng nr ni nj nk ns nm nn no np fh bj\" data-selectable-paragraph=\"\">The loss function below has been updated to take into consideration 0 values. If our NN supplies 0 values to log, it will result in infinity, which will affect network training. 
So here, we compare each value against an extremely small constant and, if the value is smaller (for example, zero), we replace it with that constant (0.0000000001).<\/p><\/blockquote>\n<pre>def eta(self, x):\n    # lower bound that keeps the argument of np.log strictly positive\n    ETA = 0.0000000001\n    return np.maximum(x, ETA)\n\ndef entropy_loss(self, y, yhat):\n    nsample = len(y)\n    yhat_inv = 1.0 - yhat\n    y_inv = 1.0 - y\n    yhat = self.eta(yhat)  # clip values to avoid NaNs in log\n    yhat_inv = self.eta(yhat_inv)\n    loss = -1\/nsample * (np.sum(np.multiply(np.log(yhat), y) + np.multiply((y_inv), np.log(yhat_inv))))\n    return loss<\/pre>\n<p id=\"0e20\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Notice the sum and the division by sample size in the code block above? This means you\u2019re considering the average loss over all the inputs. That is, you\u2019re concerned about the combined loss from all the samples and not the individual losses.<\/p>\n<h1 id=\"dec5\" class=\"pf pg fo be ph pi pj go pk pl pm gr pn po qg pq pr ps qh pu pv pw qi py pz qa bj\" data-selectable-paragraph=\"\">Going Forward: Forward Propagation<\/h1>\n<p id=\"32d6\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Now that you have some basic building blocks for your neural network, you\u2019ll move to a very important part of the process called forward propagation.<\/p>\n<p id=\"117c\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Forward propagation is the name given to the series of computations performed by the neural network before a prediction is made. 
In your two-layer network, you\u2019ll perform the following computations for forward propagation:<\/p>\n<ul class=\"\">\n<li id=\"0672\" class=\"mv mw fo be b gm mx my mz gp na nb nc nq ne nf ng nr ni nj nk ns nm nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Compute the weighted sum of the input and the first layer&#8217;s weights and then add the bias: <strong class=\"be sb\">Z1 = (W1 * X) + b1<\/strong><\/li>\n<li id=\"7805\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Pass the result through the ReLU activation function: <strong class=\"be sb\">A1 = ReLU(Z1)<\/strong><\/li>\n<li id=\"d508\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Compute the weighted sum of the output (A1) of the previous step and the second layer&#8217;s weights, then add the bias: <strong class=\"be sb\">Z2 = (W2 * A1) + b2<\/strong><\/li>\n<li id=\"256b\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Compute the <mark class=\"yw yx ao\">output<\/mark> by passing the result through a sigmoid function: <strong class=\"be sb\">A2 = sigmoid(Z2)<\/strong><\/li>\n<li id=\"0c06\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">And finally, compute the loss between the predicted output and the true labels: <strong class=\"be sb\">loss(A2, Y)<\/strong><\/li>\n<\/ul>\n<p id=\"246d\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">And there you have the forward propagation for your two-layer neural network. 
For a three-layer neural network, you\u2019d have to compute <strong class=\"be sb\">Z3<\/strong> and <strong class=\"be sb\">A2<\/strong> using <strong class=\"be sb\">W3<\/strong> and <strong class=\"be sb\">b3<\/strong> before the output layer.<\/p>\n<p id=\"7381\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Now, let\u2019s put this in code. Remember to add the code to your <code class=\"cw sn so sp sg b\">NeuralNetwork<\/code> class:<\/p>\n<pre>    def forward_propagation(self):\n        '''\n        Performs the forward propagation\n        '''\n\n        # hidden layer: weighted sum plus bias, then ReLU\n        Z1 = self.X.dot(self.params['W1']) + self.params['b1']\n        A1 = self.relu(Z1)\n        # output layer: weighted sum plus bias, then sigmoid\n        Z2 = A1.dot(self.params['W2']) + self.params['b2']\n        yhat = self.sigmoid(Z2)\n        loss = self.entropy_loss(self.y, yhat)\n\n        # save the intermediate values for backpropagation\n        self.params['Z1'] = Z1\n        self.params['Z2'] = Z2\n        self.params['A1'] = A1\n\n        return yhat, loss<\/pre>\n<p id=\"7cb7\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In the code cell above, you first perform all the dot products and additions using the weights and biases you initialized earlier, then calculate the loss by calling the <code class=\"cw sn so sp sg b\">entropy_loss<\/code> function, save the intermediate values, and finally return the predicted values and the loss. Note that the code multiplies in the order X.dot(W1) and A1.dot(W2), which suits an input matrix with one sample per row. The saved values will be used during backpropagation.<\/p>\n<p id=\"0b1f\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Wheeew! That\u2019s a lot to take in, but I\u2019m happy to inform you that you\u2019re halfway to completing your neural net. 
Take a moment to smile!<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h1 id=\"8703\" class=\"pf pg fo be ph pi rb go pk pl rc gr pn po rd pq pr ps re pu pv pw rf py pz qa bj\" data-selectable-paragraph=\"\">A Step Backward: Backpropagation<\/h1>\n<p id=\"0ec2\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Backpropagation is the name given to the process of training a neural network by updating its weights and biases.<\/p>\n<p id=\"19e3\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">A neural network learns to predict the correct values by continuously trying different values for the weights and then comparing the losses. If the loss decreases, the current weights are better than the previous ones; if it increases, they are worse. This means that the neural net has to go through many training (forward propagation) and update (backpropagation) cycles in order to get the best weights and biases. This cycle is what we generally refer to as the training phase, and the process of searching for the right weights is called optimization.<\/p>\n<p id=\"a36c\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Now the question is, how do you code a neural network to correctly adjust its weights with respect to the loss it calculates? Well, thanks to mathematics, we can use calculus to do this effectively. 
So the calculus you learnt in school is important after all \ud83d\ude09.<\/p>\n<h2 id=\"ce5c\" class=\"rh pg fo be ph ri rj rk pk rl rm rn pn nd ro rp rq nh rr rs rt nl ru rv rw rx bj\" data-selectable-paragraph=\"\">A Primer on Calculus<\/h2>\n<p id=\"604a\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Calculus helps us understand how a change in one variable affects another variable. That is, you can use calculus to compute how much changing the weights\/biases affects the loss function. So basically, we use calculus to understand how much and in what direction to update the weights and biases in order to decrease the loss.<\/p>\n<p id=\"c190\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Assume you have a function y = x\u00b2. This function is telling you that the value of y is the square of the value of x, i.e. if x = 2, then y = 4, and if x = 4, then y = 16.<\/p>\n<p id=\"0556\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Now using calculus, you can calculate how quickly y changes as x changes. We call this rate of change the derivative of y with respect to x.<\/p>\n<p id=\"2373\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">You can easily find the rate of change of many functions using standard derivative formulas. These formulas have already been worked out and can be applied directly, even to complex functions. For instance, for the function y = x\u00b2, the derivative is <em class=\"ry\">2x<\/em>. This means the rate of change is 2 times the value of x. Now how was this calculated? 
According to calculus, you can calculate the derivative of a function of the form y = x\u207f using the formula:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:399\/1*e-2UxFikwE54jgj9SMftjw.png\" alt=\"\" width=\"399\" height=\"90\"><\/figure><div class=\"mq mr tc\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*e-2UxFikwE54jgj9SMftjw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*e-2UxFikwE54jgj9SMftjw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*e-2UxFikwE54jgj9SMftjw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*e-2UxFikwE54jgj9SMftjw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*e-2UxFikwE54jgj9SMftjw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*e-2UxFikwE54jgj9SMftjw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:798\/format:webp\/1*e-2UxFikwE54jgj9SMftjw.png 798w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 399px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*e-2UxFikwE54jgj9SMftjw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*e-2UxFikwE54jgj9SMftjw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*e-2UxFikwE54jgj9SMftjw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*e-2UxFikwE54jgj9SMftjw.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*e-2UxFikwE54jgj9SMftjw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*e-2UxFikwE54jgj9SMftjw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:798\/1*e-2UxFikwE54jgj9SMftjw.png 798w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 399px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"f4a4\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">For example, if y = X\u2074, then n=4, so the derivative will be calculated as:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*PHI-OxBbaJu_ud9LVwsghQ.png\" alt=\"\" width=\"700\" height=\"81\"><\/figure><div class=\"mq mr td\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*PHI-OxBbaJu_ud9LVwsghQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*PHI-OxBbaJu_ud9LVwsghQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*PHI-OxBbaJu_ud9LVwsghQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*PHI-OxBbaJu_ud9LVwsghQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*PHI-OxBbaJu_ud9LVwsghQ.png 828w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*PHI-OxBbaJu_ud9LVwsghQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*PHI-OxBbaJu_ud9LVwsghQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*PHI-OxBbaJu_ud9LVwsghQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*PHI-OxBbaJu_ud9LVwsghQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*PHI-OxBbaJu_ud9LVwsghQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*PHI-OxBbaJu_ud9LVwsghQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*PHI-OxBbaJu_ud9LVwsghQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*PHI-OxBbaJu_ud9LVwsghQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*PHI-OxBbaJu_ud9LVwsghQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"b29b\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" 
data-selectable-paragraph=\"\">What if you have special functions like sigmoid, ReLU, tanh, or sin, or a combination of multiple functions like 3x + 2x\u00b2? How do you calculate the derivative? The good news is that most of these functions are built from a combination of smaller functions, so you can combine the derivatives of the pieces using a few rules: the sum rule, the product rule, and the <strong class=\"be sb\">chain rule<\/strong>!<\/p>\n<p id=\"3eb5\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">So if you have two functions that are added together, say 2x\u00b2 + 3x, then the derivative is the sum of the individual derivatives:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*nZKw3M-Dsz9E9tiQJtosJg.png\" alt=\"\" width=\"700\" height=\"58\"><\/figure><div class=\"mq mr te\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*nZKw3M-Dsz9E9tiQJtosJg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*nZKw3M-Dsz9E9tiQJtosJg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*nZKw3M-Dsz9E9tiQJtosJg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*nZKw3M-Dsz9E9tiQJtosJg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*nZKw3M-Dsz9E9tiQJtosJg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*nZKw3M-Dsz9E9tiQJtosJg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*nZKw3M-Dsz9E9tiQJtosJg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) 
and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*nZKw3M-Dsz9E9tiQJtosJg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*nZKw3M-Dsz9E9tiQJtosJg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*nZKw3M-Dsz9E9tiQJtosJg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*nZKw3M-Dsz9E9tiQJtosJg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*nZKw3M-Dsz9E9tiQJtosJg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*nZKw3M-Dsz9E9tiQJtosJg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*nZKw3M-Dsz9E9tiQJtosJg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"b430\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">But if two functions are multiplied, the computation differs. 
Here\u2019s an example: to find the derivative of the function y = 2x\u2074 \u00d7 4x\u00b2, you can apply the product rule as follows:<\/p>\n<p id=\"c956\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">First, assign a letter to each factor; let\u2019s say <strong class=\"be sb\">a<\/strong> is the first part, i.e. a = 2x\u2074, and <strong class=\"be sb\">b<\/strong> is the second part, i.e. b = 4x\u00b2. Then the derivative becomes:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:694\/1*KQJVQ9fpW47r_4uesb8z0w.png\" alt=\"\" width=\"694\" height=\"77\"><\/figure><div class=\"mq mr tf\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*KQJVQ9fpW47r_4uesb8z0w.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*KQJVQ9fpW47r_4uesb8z0w.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*KQJVQ9fpW47r_4uesb8z0w.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*KQJVQ9fpW47r_4uesb8z0w.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*KQJVQ9fpW47r_4uesb8z0w.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*KQJVQ9fpW47r_4uesb8z0w.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1388\/format:webp\/1*KQJVQ9fpW47r_4uesb8z0w.png 1388w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, 
(min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 694px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*KQJVQ9fpW47r_4uesb8z0w.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*KQJVQ9fpW47r_4uesb8z0w.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*KQJVQ9fpW47r_4uesb8z0w.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*KQJVQ9fpW47r_4uesb8z0w.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*KQJVQ9fpW47r_4uesb8z0w.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*KQJVQ9fpW47r_4uesb8z0w.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1388\/1*KQJVQ9fpW47r_4uesb8z0w.png 1388w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 694px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"fb45\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">This means that for two functions multiplied together, you first take the derivative of the first part <strong class=\"be sb\">a<\/strong> (\u0394a), and multiply it with the second part <strong class=\"be sb\">b<\/strong>, then take the derivative of the second part <strong class=\"be sb\">b<\/strong> (\u0394b), and multiply it with the first part <strong class=\"be sb\">a<\/strong>, and finally, you sum the result. 
So, therefore, the derivative becomes:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*y7uOSAr--A1lo0kp43hATg.png\" alt=\"\" width=\"700\" height=\"66\"><\/figure><div class=\"mq mr tg\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*y7uOSAr--A1lo0kp43hATg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*y7uOSAr--A1lo0kp43hATg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*y7uOSAr--A1lo0kp43hATg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*y7uOSAr--A1lo0kp43hATg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*y7uOSAr--A1lo0kp43hATg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*y7uOSAr--A1lo0kp43hATg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*y7uOSAr--A1lo0kp43hATg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*y7uOSAr--A1lo0kp43hATg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*y7uOSAr--A1lo0kp43hATg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*y7uOSAr--A1lo0kp43hATg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*y7uOSAr--A1lo0kp43hATg.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*y7uOSAr--A1lo0kp43hATg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*y7uOSAr--A1lo0kp43hATg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*y7uOSAr--A1lo0kp43hATg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"39d9\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">and from laws of indices, this reduces to:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:266\/1*QXEJqMFMn5SKTJ3hkegCoQ.png\" alt=\"\" width=\"266\" height=\"71\"><\/figure><div class=\"mq mr th\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 1100w, 
https:\/\/miro.medium.com\/v2\/resize:fit:532\/format:webp\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 532w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 266px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:532\/1*QXEJqMFMn5SKTJ3hkegCoQ.png 532w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 266px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"11d3\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Now, I don\u2019t plan to show you all the derivatives available in calculus, but 
basically, you should know that most derivatives you\u2019ll be using while coding the backpropagation algorithm have already been computed, so you can just use the formulas.<\/p>\n<p id=\"8586\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">A couple of great resources to learn more about derivatives here:<\/p>\n<figure class=\"mg mh mi mj mk mf\">\n<div class=\"sc is l eb\">\n<div class=\"ti se l\"><iframe loading=\"lazy\" class=\"ek n fc dx bg\" title=\"The Essence of Calculus, Chapter 1\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fvideoseries%3Flist%3DPLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DWUvTyaaNkzM&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FWUvTyaaNkzM%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube\" width=\"854\" height=\"480\" frameborder=\"0\" scrolling=\"no\" allowfullscreen=\"allowfullscreen\" data-mce-fragment=\"1\"><\/iframe><\/div>\n<\/div>\n<\/figure>\n<div class=\"ob oc od oe of og\">\n<div class=\"oh ab ik\">\n<div class=\"oi ab cn ca oj ok\"><\/div>\n<\/div>\n<\/div>\n<h2 id=\"66ac\" class=\"rh pg fo be ph ri rj rk pk rl rm rn pn nd ro rp rq nh rr rs rt nl ru rv rw rx bj\" data-selectable-paragraph=\"\">Using Calculus in Backpropagation<\/h2>\n<p id=\"3bd4\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">After computing the output and loss in the forward propagation phase, you\u2019ll move to the backpropagation phase, where you calculate the derivatives backward, from the loss all the way back to the first layer\u2019s weights and bias. 
To perform backpropagation in your neural network, you\u2019ll follow the steps listed below:<\/p>\n<p id=\"8cb2\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Starting from the last layer, calculate the derivative of the loss with respect to the output <strong class=\"be sb\"><em class=\"ry\">yhat<\/em><\/strong> as:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*KxTt7ULwEpjyzLRMC_9VYA.png\" alt=\"\" width=\"700\" height=\"132\"><\/figure><div class=\"mq mr tk\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*KxTt7ULwEpjyzLRMC_9VYA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*KxTt7ULwEpjyzLRMC_9VYA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*KxTt7ULwEpjyzLRMC_9VYA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*KxTt7ULwEpjyzLRMC_9VYA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*KxTt7ULwEpjyzLRMC_9VYA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*KxTt7ULwEpjyzLRMC_9VYA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*KxTt7ULwEpjyzLRMC_9VYA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 
100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*KxTt7ULwEpjyzLRMC_9VYA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*KxTt7ULwEpjyzLRMC_9VYA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*KxTt7ULwEpjyzLRMC_9VYA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*KxTt7ULwEpjyzLRMC_9VYA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*KxTt7ULwEpjyzLRMC_9VYA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*KxTt7ULwEpjyzLRMC_9VYA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*KxTt7ULwEpjyzLRMC_9VYA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"da29\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">But how did you get to the loss? Well, you calculated the <strong class=\"be sb\">sigmoid(Z2)<\/strong>. 
Now, what is the derivative of the loss with respect to <strong class=\"be sb\">Z2<\/strong>, the input to the sigmoid?<\/p>\n<p id=\"e45f\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The loss depends on <strong class=\"be sb\">Z2<\/strong> only through <strong class=\"be sb\">sigmoid(Z2)<\/strong>, so by the chain rule you have to calculate two derivatives:<\/p>\n<p id=\"e964\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">First, calculate the derivative of the sigmoid activation with respect to (wrt) its input <em class=\"ry\">Z2<\/em>:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*xcQGbV1yz5lLFDOOG8f4ng.png\" alt=\"\" width=\"700\" height=\"36\"><\/figure><div class=\"mq mr tl\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*xcQGbV1yz5lLFDOOG8f4ng.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*xcQGbV1yz5lLFDOOG8f4ng.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*xcQGbV1yz5lLFDOOG8f4ng.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*xcQGbV1yz5lLFDOOG8f4ng.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*xcQGbV1yz5lLFDOOG8f4ng.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*xcQGbV1yz5lLFDOOG8f4ng.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*xcQGbV1yz5lLFDOOG8f4ng.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, 
(min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*xcQGbV1yz5lLFDOOG8f4ng.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*xcQGbV1yz5lLFDOOG8f4ng.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*xcQGbV1yz5lLFDOOG8f4ng.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*xcQGbV1yz5lLFDOOG8f4ng.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*xcQGbV1yz5lLFDOOG8f4ng.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*xcQGbV1yz5lLFDOOG8f4ng.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*xcQGbV1yz5lLFDOOG8f4ng.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"3c96\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Then, you calculate the derivative of the loss wrt <em class=\"ry\">Z2<\/em>:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*LBLGZrW1bxsfmM3hMVQCPw.png\" alt=\"\" width=\"700\" height=\"39\"><\/figure><div 
class=\"mq mr tm\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*LBLGZrW1bxsfmM3hMVQCPw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*LBLGZrW1bxsfmM3hMVQCPw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*LBLGZrW1bxsfmM3hMVQCPw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*LBLGZrW1bxsfmM3hMVQCPw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*LBLGZrW1bxsfmM3hMVQCPw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*LBLGZrW1bxsfmM3hMVQCPw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*LBLGZrW1bxsfmM3hMVQCPw.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*LBLGZrW1bxsfmM3hMVQCPw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*LBLGZrW1bxsfmM3hMVQCPw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*LBLGZrW1bxsfmM3hMVQCPw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*LBLGZrW1bxsfmM3hMVQCPw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*LBLGZrW1bxsfmM3hMVQCPw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*LBLGZrW1bxsfmM3hMVQCPw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*LBLGZrW1bxsfmM3hMVQCPw.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 
67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"9aaf\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Now, how did you get <strong class=\"be sb\">Z2<\/strong>? You calculated the dot product between <strong class=\"be sb\">A1<\/strong> and <strong class=\"be sb\">W2<\/strong>, and added the bias <strong class=\"be sb\">b2<\/strong>. This means that you have to calculate the derivative of the loss with respect to each of these variables:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*VWOYY5a23-K086e30WmnRg.png\" alt=\"\" width=\"700\" height=\"208\"><\/figure><div class=\"mq mr tn\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*VWOYY5a23-K086e30WmnRg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*VWOYY5a23-K086e30WmnRg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*VWOYY5a23-K086e30WmnRg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*VWOYY5a23-K086e30WmnRg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*VWOYY5a23-K086e30WmnRg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*VWOYY5a23-K086e30WmnRg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*VWOYY5a23-K086e30WmnRg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, 
(-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*VWOYY5a23-K086e30WmnRg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*VWOYY5a23-K086e30WmnRg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*VWOYY5a23-K086e30WmnRg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*VWOYY5a23-K086e30WmnRg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*VWOYY5a23-K086e30WmnRg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*VWOYY5a23-K086e30WmnRg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*VWOYY5a23-K086e30WmnRg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"9242\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">And how did you get <strong class=\"be sb\">A1<\/strong>? You applied <strong class=\"be sb\">ReLU<\/strong> to <strong class=\"be sb\">Z1<\/strong>. So you also need the derivative of <strong class=\"be sb\">ReLU<\/strong>, which lets you compute the derivative of the loss wrt <strong class=\"be sb\">Z1<\/strong>. 
The derivative of ReLU is 1 if the input is greater than 0, and 0 otherwise.<\/p>\n<p id=\"dd8e\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">You\u2019ll create a function to compute this and call it <code class=\"cw sn so sp sg b\">dRelu<\/code>:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*LapuItl800Az6TnFqOG-Ig.png\" alt=\"\" width=\"700\" height=\"112\"><\/figure><div class=\"mq mr to\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*LapuItl800Az6TnFqOG-Ig.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*LapuItl800Az6TnFqOG-Ig.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*LapuItl800Az6TnFqOG-Ig.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*LapuItl800Az6TnFqOG-Ig.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*LapuItl800Az6TnFqOG-Ig.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*LapuItl800Az6TnFqOG-Ig.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*LapuItl800Az6TnFqOG-Ig.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source 
srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*LapuItl800Az6TnFqOG-Ig.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*LapuItl800Az6TnFqOG-Ig.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*LapuItl800Az6TnFqOG-Ig.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*LapuItl800Az6TnFqOG-Ig.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*LapuItl800Az6TnFqOG-Ig.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*LapuItl800Az6TnFqOG-Ig.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*LapuItl800Az6TnFqOG-Ig.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"3e94\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Next, how did you get <strong class=\"be sb\">Z1<\/strong>? You computed the dot product between <strong class=\"be sb\">X<\/strong> and <strong class=\"be sb\">W1<\/strong> and added the bias <strong class=\"be sb\">b1<\/strong>. 
So you compute the derivative of all the variables involved, except the input <strong class=\"be sb\">X<\/strong>.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"qs qt eb qu bg qv\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*3MmpRy8PfcGOMUlW1nAceQ.png\" alt=\"\" width=\"700\" height=\"124\"><\/figure><div class=\"mq mr tp\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*3MmpRy8PfcGOMUlW1nAceQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*3MmpRy8PfcGOMUlW1nAceQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*3MmpRy8PfcGOMUlW1nAceQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*3MmpRy8PfcGOMUlW1nAceQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*3MmpRy8PfcGOMUlW1nAceQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*3MmpRy8PfcGOMUlW1nAceQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*3MmpRy8PfcGOMUlW1nAceQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*3MmpRy8PfcGOMUlW1nAceQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*3MmpRy8PfcGOMUlW1nAceQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*3MmpRy8PfcGOMUlW1nAceQ.png 750w, 
https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*3MmpRy8PfcGOMUlW1nAceQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*3MmpRy8PfcGOMUlW1nAceQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*3MmpRy8PfcGOMUlW1nAceQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*3MmpRy8PfcGOMUlW1nAceQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<blockquote class=\"ov\"><p id=\"2171\" class=\"ow ox fo be oy oz pa pb pc pd pe np dv\" data-selectable-paragraph=\"\"><strong class=\"al\">Note:<\/strong> dl_wrt is read \u201cthe derivative of the loss with respect to\u201d<\/p><\/blockquote>\n<p id=\"3e23\" class=\"pw-post-body-paragraph mv mw fo be b gm tq my mz gp tr nb nc nd ts nf ng nh tt nj nk nl tu nn no np fh bj\" data-selectable-paragraph=\"\">Phew! You now have all your derivatives for the backpropagation algorithm. 
If you want a detailed overview of how these derivatives are calculated from scratch, <a class=\"af mu\" href=\"https:\/\/medium.com\/@pdquant\/all-the-backpropagation-derivatives-d5275f727f60\" rel=\"noopener\">this Medium post<\/a> is a great guide.<\/p>\n<p id=\"cd81\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Next, let\u2019s write the backpropagation code:<\/p>\n<pre>def back_propagation(self, yhat):\n        '''\n        Computes the derivatives and updates the weights and biases accordingly.\n        '''\n        y_inv = 1 - self.y\n        yhat_inv = 1 - yhat\n\n        # output layer: derivative of the loss wrt yhat, of the sigmoid wrt Z2,\n        # and (by the chain rule) of the loss wrt Z2\n        dl_wrt_yhat = np.divide(y_inv, self.eta(yhat_inv)) - np.divide(self.y, self.eta(yhat))\n        dl_wrt_sig = yhat * (yhat_inv)\n        dl_wrt_z2 = dl_wrt_yhat * dl_wrt_sig\n\n        # derivatives of the loss wrt A1 and the second-layer parameters\n        dl_wrt_A1 = dl_wrt_z2.dot(self.params['W2'].T)\n        dl_wrt_w2 = self.params['A1'].T.dot(dl_wrt_z2)\n        dl_wrt_b2 = np.sum(dl_wrt_z2, axis=0, keepdims=True)\n\n        # hidden layer: go back through ReLU, then the first-layer parameters\n        dl_wrt_z1 = dl_wrt_A1 * self.dRelu(self.params['Z1'])\n        dl_wrt_w1 = self.X.T.dot(dl_wrt_z1)\n        dl_wrt_b1 = np.sum(dl_wrt_z1, axis=0, keepdims=True)<\/pre>\n<p id=\"7422\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">In the backpropagation function, you work backward from the loss: you use the <code class=\"cw sn so sp sg b\">dRelu<\/code> helper for the derivative of the ReLU, and you calculate and save the derivative of the loss with respect to every weight and bias.<\/p>\n<p id=\"7990\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Notice we use a common naming scheme (<code class=\"cw sn so sp sg b\">dl_wrt<\/code>). This helps keep your code clean and easy to read. Once you calculate these derivatives, you use them to update your current weights and biases. 
That is the essence of computing derivatives: you want to know how to update your weights in order to minimize the loss.<\/p>\n<h2 id=\"9ab3\" class=\"rh pg fo be ph ri rj rk pk rl rm rn pn nd ro rp rq nh rr rs rt nl ru rv rw rx bj\" data-selectable-paragraph=\"\">Optimization and Training of the Neural Network<\/h2>\n<p id=\"93ce\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">In the previous section, you used calculus to compute the derivatives of the loss with respect to the weights and biases. The model now knows in which direction to change them. To automatically use this information to update the weights and biases, a neural network must perform hundreds, thousands, or even millions of forward and backward propagations. That is, in the training phase, the neural network must perform the following:<\/p>\n<ul class=\"\">\n<li id=\"99ba\" class=\"mv mw fo be b gm mx my mz gp na nb nc nq ne nf ng nr ni nj nk ns nm nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Forward propagation<\/li>\n<li id=\"c789\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Backpropagation<\/li>\n<li id=\"594e\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Weight updates with calculated gradients<\/li>\n<li id=\"c096\" class=\"mv mw fo be b gm nw my mz gp nx nb nc nq ny nf ng nr nz nj nk ns oa nn no np nt nu nv bj\" data-selectable-paragraph=\"\">Repeat<\/li>\n<\/ul>\n<p id=\"2883\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Let\u2019s write the code that updates the weights and biases. 
At the end of your backpropagation function, add the following lines of code:<\/p>\n<pre># update the weights and biases\nself.params['W1'] = self.params['W1'] - self.learning_rate * dl_wrt_w1\nself.params['W2'] = self.params['W2'] - self.learning_rate * dl_wrt_w2\nself.params['b1'] = self.params['b1'] - self.learning_rate * dl_wrt_b1\nself.params['b2'] = self.params['b2'] - self.learning_rate * dl_wrt_b2<\/pre>\n<p id=\"7ada\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">What you\u2019re doing here is subtracting from each parameter its derivative multiplied by a small value called the learning rate. The learning rate is a value that tells your neural network how big the update should be.<\/p>\n<p id=\"cb05\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Now that you\u2019ve added the lines of code to perform the updates, you\u2019ll create a new function called <code class=\"cw sn so sp sg b\">fit<\/code> that takes the input (X) and labels (y) and calls the forward and backpropagation functions repeatedly for a specified number of iterations:<\/p>\n<pre>def fit(self, X, y):\n        '''\n        Trains the neural network using the specified data and labels\n        '''\n        self.X = X\n        self.y = y\n        self.init_weights() # initialize weights and biases\n\n        for i in range(self.iterations):\n            yhat, loss = self.forward_propagation()\n            self.back_propagation(yhat)\n            self.loss.append(loss)\n<\/pre>\n<p id=\"e933\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The <code class=\"cw sn so sp sg b\">fit<\/code> function takes two parameters: X (input dataset) and y (labels). 
First, it saves the training data and targets as instance attributes and then initializes the weights and biases by calling the <code class=\"cw sn so sp sg b\">init_weights<\/code> function. Then, it loops through the specified number of iterations, performs forward and backpropagation, and saves the loss.<\/p>\n<h1 id=\"28ce\" class=\"pf pg fo be ph pi pj go pk pl pm gr pn po qg pq pr ps qh pu pv pw qi py pz qa bj\" data-selectable-paragraph=\"\">Making Predictions<\/h1>\n<p id=\"92c5\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">To make predictions, you simply make a forward pass on the test data. That is, you use the saved weights and biases from the training phase. To make the process easier, you\u2019ll add a function to your <code class=\"cw sn so sp sg b\">NeuralNet<\/code> class called <code class=\"cw sn so sp sg b\">predict<\/code>:<\/p>\n<pre>def predict(self, X):\n        '''\n        Predicts labels for the test data\n        '''\n        Z1 = X.dot(self.params['W1']) + self.params['b1']\n        A1 = self.relu(Z1)\n        Z2 = A1.dot(self.params['W2']) + self.params['b2']\n        pred = self.sigmoid(Z2)\n        return np.round(pred)<\/pre>\n<p id=\"b676\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The function passes the data through the forward propagation steps and computes the prediction using the saved weights and biases. The sigmoid outputs are probability values ranging from 0 to 1. To turn these probabilities into class labels, you can either round the values or apply a custom threshold. 
To keep things simple, we just round the probabilities to the nearest integer (0 or 1).<\/p>\n<h1 id=\"3498\" class=\"pf pg fo be ph pi pj go pk pl pm gr pn po qg pq pr ps qh pu pv pw qi py pz qa bj\" data-selectable-paragraph=\"\">Putting It Together<\/h1>\n<p id=\"52c0\" class=\"pw-post-body-paragraph mv mw fo be b gm qb my mz gp qc nb nc nd qd nf ng nh qe nj nk nl qf nn no np fh bj\" data-selectable-paragraph=\"\">Let\u2019s put all your code together:<\/p>\n<pre>import numpy as np\nimport matplotlib.pyplot as plt\n\nclass NeuralNet():\n    '''\n    A two layer neural network\n    '''\n\n    def __init__(self, layers=[13,8,1], learning_rate=0.001, iterations=100):\n        self.params = {}\n        self.learning_rate = learning_rate\n        self.iterations = iterations\n        self.loss = []\n        self.sample_size = None\n        self.layers = layers\n        self.X = None\n        self.y = None\n\n    def init_weights(self):\n        '''\n        Initialize the weights from a random normal distribution\n        '''\n        np.random.seed(1) # Seed the random number generator\n        self.params[\"W1\"] = np.random.randn(self.layers[0], self.layers[1])\n        self.params['b1'] = np.random.randn(self.layers[1],)\n        self.params['W2'] = np.random.randn(self.layers[1],self.layers[2])\n        self.params['b2'] = np.random.randn(self.layers[2],)\n\n    def relu(self,Z):\n        '''\n        The ReLU activation function performs a threshold\n        operation on each input element, where values less\n        than zero are set to zero.\n        '''\n        return np.maximum(0,Z)\n\n    def dRelu(self, x):\n        '''\n        Derivative of ReLU: 1 where x is greater than 0, 0 elsewhere.\n        Returns a new array instead of modifying x in place.\n        '''\n        return (x &gt; 0).astype(float)\n\n    def eta(self, x):\n        # clip values from below to avoid division by zero and log(0)\n        ETA = 0.0000000001\n        return np.maximum(x, ETA)\n\n    def sigmoid(self,Z):\n        '''\n        The sigmoid function takes in real numbers in any range and\n        squashes them to a real-valued output between 0 and 1.\n        '''\n        return 1\/(1+np.exp(-Z))\n\n    def entropy_loss(self,y, yhat):\n        nsample = 
len(y)\n        yhat_inv = 1.0 - yhat\n        y_inv = 1.0 - y\n        yhat = self.eta(yhat) # clips values to avoid NaNs in log\n        yhat_inv = self.eta(yhat_inv)\n        loss = -1\/nsample * (np.sum(np.multiply(np.log(yhat), y) + np.multiply((y_inv), np.log(yhat_inv))))\n        return loss\n\n    def forward_propagation(self):\n        '''\n        Performs the forward propagation\n        '''\n\n        Z1 = self.X.dot(self.params['W1']) + self.params['b1']\n        A1 = self.relu(Z1)\n        Z2 = A1.dot(self.params['W2']) + self.params['b2']\n        yhat = self.sigmoid(Z2)\n        loss = self.entropy_loss(self.y,yhat)\n\n        # save calculated parameters\n        self.params['Z1'] = Z1\n        self.params['Z2'] = Z2\n        self.params['A1'] = A1\n\n        return yhat,loss\n\n    def back_propagation(self,yhat):\n        '''\n        Computes the derivatives and updates the weights and biases accordingly.\n        '''\n        y_inv = 1 - self.y\n        yhat_inv = 1 - yhat\n\n        # output layer: derivative of the loss wrt yhat, of the sigmoid wrt Z2,\n        # and (by the chain rule) of the loss wrt Z2\n        dl_wrt_yhat = np.divide(y_inv, self.eta(yhat_inv)) - np.divide(self.y, self.eta(yhat))\n        dl_wrt_sig = yhat * (yhat_inv)\n        dl_wrt_z2 = dl_wrt_yhat * dl_wrt_sig\n\n        # derivatives of the loss wrt A1 and the second-layer parameters\n        dl_wrt_A1 = dl_wrt_z2.dot(self.params['W2'].T)\n        dl_wrt_w2 = self.params['A1'].T.dot(dl_wrt_z2)\n        dl_wrt_b2 = np.sum(dl_wrt_z2, axis=0, keepdims=True)\n\n        # hidden layer: go back through ReLU, then the first-layer parameters\n        dl_wrt_z1 = dl_wrt_A1 * self.dRelu(self.params['Z1'])\n        dl_wrt_w1 = self.X.T.dot(dl_wrt_z1)\n        dl_wrt_b1 = np.sum(dl_wrt_z1, axis=0, keepdims=True)\n\n        # update the weights and biases\n        self.params['W1'] = self.params['W1'] - self.learning_rate * dl_wrt_w1\n        self.params['W2'] = self.params['W2'] - self.learning_rate * dl_wrt_w2\n        self.params['b1'] = self.params['b1'] - self.learning_rate * dl_wrt_b1\n        self.params['b2'] = self.params['b2'] - self.learning_rate * dl_wrt_b2\n\n    def fit(self, X, y):\n        '''\n        Trains the neural network using the specified data and 
labels\n        '''\n        self.X = X\n        self.y = y\n        self.init_weights() # initialize weights and biases\n\n        for i in range(self.iterations):\n            yhat, loss = self.forward_propagation()\n            self.back_propagation(yhat)\n            self.loss.append(loss)\n\n    def predict(self, X):\n        '''\n        Predicts labels for the test data\n        '''\n        Z1 = X.dot(self.params['W1']) + self.params['b1']\n        A1 = self.relu(Z1)\n        Z2 = A1.dot(self.params['W2']) + self.params['b2']\n        pred = self.sigmoid(Z2)\n        return np.round(pred)\n\n    def acc(self, y, yhat):\n        '''\n        Calculates the accuracy (in percent) between the predicted values and the truth labels\n        '''\n        # assumes y and yhat have the same shape\n        acc = int(np.mean(y == yhat) * 100)\n        return acc\n\n    def plot_loss(self):\n        '''\n        Plots the loss curve\n        '''\n        plt.plot(self.loss)\n        plt.xlabel(\"Iteration\")\n        plt.ylabel(\"logloss\")\n        plt.title(\"Loss curve for training\")\n        plt.show()<\/pre>\n<p id=\"d2aa\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Congratulations, you now have a fully functional two-layer neural network for binary classification. You should give yourself a pat on the back. 
But before proclaiming complete victory, let\u2019s see if your network actually works.<\/p>\n<p id=\"8a42\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/heartbeat.comet.ml\/building-a-neural-network-from-scratch-using-python-part-2-testing-the-network-c1f0c1c9cbb0\" target=\"_blank\" rel=\"noopener ugc nofollow\">In the next post<\/a>, you\u2019ll make predictions, and also compare your network\u2019s predictions with popular deep learning libraries.<\/p>\n<p id=\"f729\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">If you have any questions, suggestions, or feedback, don\u2019t hesitate to use the comment section below.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg\" alt=\"\" width=\"640\" height=\"345\"><\/figure><div class=\"mq mr sz\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1280\/format:webp\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 1280w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, 
(-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 640px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1280\/1*FCiTcgvIHxdfEUurUWMsfQ.jpeg 1280w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 640px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"d4bb\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\"><em class=\"ry\">Connect with me on <\/em><a class=\"af mu\" href=\"https:\/\/twitter.com\/risingodegua\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be sb\"><em class=\"ry\">Twitter<\/em><\/strong><\/a><strong class=\"be sb\"><em 
class=\"ry\">.<\/em><\/strong><\/p>\n<p id=\"e600\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\"><em class=\"ry\">Connect with me on <\/em><a class=\"af mu\" href=\"https:\/\/www.linkedin.com\/in\/risingdeveloper\/\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be sb\"><em class=\"ry\">LinkedIn<\/em><\/strong><\/a><strong class=\"be sb\"><em class=\"ry\">.<\/em><\/strong><\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Artificial intelligence (AI) is a buzzword you see pretty much everywhere around you, even when you\u2019re not looking. It has completely dominated tech media, newsrooms, and is even credited with the success of many modern applications. But does it really work, or is it just hype? Truth is, it does. While there might be some [&hellip;]<\/p>\n","protected":false},"author":31,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6,7],"tags":[],"coauthors":[148],"class_list":["post-5945","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Building a Neural Network From Scratch Using Python (Part 1) - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta 
property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building a Neural Network From Scratch Using Python (Part 1)\" \/>\n<meta property=\"og:description\" content=\"Artificial intelligence (AI) is a buzzword you see pretty much everywhere around you, even when you\u2019re not looking. It has completely dominated tech media, newsrooms, and is even credited with the success of many modern applications. But does it really work, or is it just hype? Truth is, it does. While there might be some [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-06-14T16:02:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:15:30+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-1024x614.webp\" \/>\n<meta name=\"author\" content=\"Rising Odegua\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rising Odegua\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"35 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Building a Neural Network From Scratch Using Python (Part 1) - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/","og_locale":"en_US","og_type":"article","og_title":"Building a Neural Network From Scratch Using Python (Part 1)","og_description":"Artificial intelligence (AI) is a buzzword you see pretty much everywhere around you, even when you\u2019re not looking. It has completely dominated tech media, newsrooms, and is even credited with the success of many modern applications. But does it really work, or is it just hype? Truth is, it does. While there might be some [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-06-14T16:02:48+00:00","article_modified_time":"2025-04-24T17:15:30+00:00","og_image":[{"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-1024x614.webp","type":"","width":"","height":""}],"author":"Rising Odegua","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Rising Odegua","Est. 
reading time":"35 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/"},"author":{"name":"Rising Odegua","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/1767de6937de72ff04ad93b3052052a8"},"headline":"Building a Neural Network From Scratch Using Python (Part 1)","datePublished":"2023-06-14T16:02:48+00:00","dateModified":"2025-04-24T17:15:30+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/"},"wordCount":5668,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-1024x614.webp","articleSection":["Machine Learning","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/","url":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/","name":"Building a Neural Network From Scratch Using Python (Part 1) - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-1024x614.webp","datePublished":"2023-06-14T16:02:48+00:00","dateModified":"2025-04-24T17:15:30+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-1024x614.webp","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/06\/1NUil-wyDtAKmoD-HjWXBLw-1024x614.webp"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/building-a-neural-network-from-scratch-using-python-part-1\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Building a Neural Network From Scratch Using Python (Part 1)"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/1767de6937de72ff04ad93b3052052a8","name":"Rising Odegua","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/9fbdbbdb3b6eb0a0a854b1898b3e1544","url":"https:\/\/secure.gravatar.com\/avatar\/849e6783a937de45b52ff0d4237168a95de6942b2ebd3dd7457fb7ea09b93ad4?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/849e6783a937de45b52ff0d4237168a95de6942b2ebd3dd7457fb7ea09b93ad4?s=96&d=mm&r=g","caption":"Rising 
Odegua"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/risingodegua\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/5945","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/31"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=5945"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/5945\/revisions"}],"predecessor-version":[{"id":15619,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/5945\/revisions\/15619"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=5945"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=5945"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=5945"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=5945"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}