{"id":7854,"date":"2023-10-06T14:16:44","date_gmt":"2023-10-06T22:16:44","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=7854"},"modified":"2025-04-24T17:05:52","modified_gmt":"2025-04-24T17:05:52","slug":"introduction-to-deep-learning-with-keras","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/","title":{"rendered":"Introduction to Deep Learning with Keras"},"content":{"rendered":"\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\">\n\n\n\n<div class=\"fi fj fk fl fm\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ma mb c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1500\/1*Y_329i21ja4m0RPskuwUKw.png\" alt=\"\" width=\"1500\" height=\"1000\"><\/figure><div class=\"lu bg\">\n<figure class=\"lv lw lx ly lz lu bg paragraph-image\"><picture><\/picture><\/figure>\n<\/div>\n<div class=\"ab ca\">\n<div class=\"ch bg eu ev ew ex\">\n<p id=\"80f3\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">In this article, we\u2019ll build a simple neural network using <a class=\"af mz\" href=\"http:\/\/keras.io\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Keras<\/a>. 
We\u2019ll assume you have prior knowledge of machine learning packages such as <a class=\"af mz\" href=\"http:\/\/scikit-learn.org\/stable\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">scikit-learn<\/a> and other scientific packages such as <a class=\"af mz\" href=\"https:\/\/pandas.pydata.org\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Pandas<\/a> and <a class=\"af mz\" href=\"http:\/\/www.numpy.org\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">NumPy<\/a>.<\/p>\n<h1 id=\"595b\" class=\"na nb fp be nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Training an Artificial Neural Network<\/strong><\/h1>\n<p id=\"4712\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\">Training an artificial neural network involves the following steps:<\/p>\n<ol class=\"\">\n<li id=\"2d55\" class=\"mc md fp be b me mf mg mh mi mj mk ml mm od mo mp mq oe ms mt mu of mw mx my og oh oi bj\" data-selectable-paragraph=\"\">Randomly initialize the weights to small numbers close to (but not exactly) zero.<\/li>\n<li id=\"8173\" class=\"mc md fp be b me oj mg mh mi ok mk ml mm ol mo mp mq om ms mt mu on mw mx my og oh oi bj\" data-selectable-paragraph=\"\">Feed the observations of your dataset to the input layer.<\/li>\n<li id=\"1ca3\" class=\"mc md fp be b me oj mg mh mi ok mk ml mm ol mo mp mq om ms mt mu on mw mx my og oh oi bj\" data-selectable-paragraph=\"\">Forward propagation (from left to right): neurons are activated and the predicted values are obtained.<\/li>\n<li id=\"0d41\" class=\"mc md fp be b me oj mg mh mi ok mk ml mm ol mo mp mq om ms mt mu on mw mx my og oh oi bj\" data-selectable-paragraph=\"\">Compare the predicted results to the actual values and measure the error.<\/li>\n<li id=\"2c79\" class=\"mc md fp be b me oj mg mh mi ok mk ml mm ol mo mp mq om ms mt mu on mw mx my og oh oi bj\" data-selectable-paragraph=\"\">Backward propagation (from right to left): the weights are adjusted to reduce the error.<\/li>\n<li id=\"0262\" class=\"mc md fp be b me oj mg mh mi ok mk ml mm ol mo mp mq om ms mt mu on mw mx my og oh oi bj\" data-selectable-paragraph=\"\">Repeat steps 2\u20135, updating the weights after each observation or after each batch of observations.<\/li>\n<li id=\"5a86\" class=\"mc md fp be b me oj mg mh mi ok mk ml mm ol mo mp mq om ms mt mu on mw mx my og oh oi bj\" data-selectable-paragraph=\"\">One epoch is completed when the whole training set has passed through the neural network.<\/li>\n<\/ol>\n<h1 id=\"8aad\" class=\"na nb fp be nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Business Problem<\/strong><\/h1>\n<p id=\"78ad\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\">Now let\u2019s proceed to solve a real business problem. An insurance company has approached you with a dataset of its clients\u2019 previous claims, and wants you to develop a model that predicts which claims look fraudulent. By doing so, you hope to save the company millions of dollars annually. This is a <a class=\"af mz\" href=\"https:\/\/medium.com\/r?url=https%3A%2F%2Fheartbeat.fritz.ai%2Fclassification-model-evaluation-90d743883106\" rel=\"noopener\">classification<\/a> problem. These are the columns in our dataset:<\/p>\n<figure class=\"or os ot ou ov lu oo op paragraph-image\"><img loading=\"lazy\" decoding=\"async\" class=\"bg ma mb c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*zHqO8ZJ0iqkpljd6.\" alt=\"\" width=\"700\" height=\"160\"><\/figure>\n<figure class=\"or os ot ou ov lu oo op paragraph-image\"><img loading=\"lazy\" decoding=\"async\" class=\"bg ma mb c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*oZu07tbbW5E7jcH9.\" alt=\"\" width=\"700\" height=\"48\"><\/figure>\n<figure class=\"or os ot ou ov lu oo op paragraph-image\"><img loading=\"lazy\" decoding=\"async\" class=\"bg ma mb c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*L9OI6Yq5MN2pPzKW.\" alt=\"\" width=\"700\" height=\"56\"><\/figure>\n<h2 id=\"200d\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Data Preprocessing<\/strong><\/h2>\n<p id=\"4c16\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\">As in many business problems, the data won\u2019t arrive preprocessed, so we have to prepare it in a form our algorithm will accept. The dataset contains several categorical columns, which we need to convert to zeros and ones so that our deep learning model can understand them. We also have to feed our dataset to the model as numpy arrays. 
Below we import the necessary packages and then load in our <mark class=\"ads adt ao\"><a class=\"af mz\" href=\"https:\/\/github.com\/mwitiderrick\/insurancedata\/blob\/master\/insurance_claims.csv\" target=\"_blank\" rel=\"noopener ugc nofollow\">dataset<\/a><\/mark>.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"4fab\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">import pandas as pd<\/em><\/span><span id=\"91b4\" class=\"pc nb fp pu b ho qc pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">import numpy as np<\/em><\/span><span id=\"43e7\" class=\"pc nb fp pu b ho qc pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">df = pd.read_csv('Datasets\/claims\/insurance_claims.csv')<\/em><\/span><\/pre>\n<p id=\"2450\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">We then convert the categorical columns to dummy variables.<\/p>\n<pre>feats = ['policy_state','insured_sex','insured_education_level','insured_occupation','insured_hobbies','insured_relationship','collision_type','incident_severity','authorities_contacted','incident_state','incident_city','incident_location','property_damage','police_report_available','auto_make','auto_model','fraud_reported','incident_type']\ndf_final = pd.get_dummies(df,columns=feats,drop_first=True)<\/pre>\n<p id=\"b60b\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">In this case we use <code class=\"cw qg qh qi pu b\">drop_first=True<\/code> to avoid the dummy variable trap. For example, if you have a, b, c, and d as categories, you can drop the dummy variable for d: any observation that does not fall into a, b, or c is definitely in d. Keeping all four dummies would make one column perfectly predictable from the others, a problem referred to as multicollinearity.<\/p>\n<p id=\"ed21\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">We use <a class=\"af mz\" href=\"http:\/\/scikit-learn.org\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">sklearn\u2019s<\/a> <code class=\"cw qg qh qi pu b\">train_test_split<\/code> to split the data into a training set and a test set.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"ade3\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">from sklearn.model_selection import train_test_split<\/em><\/span><\/pre>\n<p id=\"b204\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">Next we drop the column we\u2019re predicting (plus a few date and policy columns we won\u2019t use) from the feature matrix, so that the target doesn\u2019t leak into the model\u2019s inputs. We append <code class=\"cw qg qh qi pu b\">.values<\/code> to get numpy arrays, since that is the form in which our deep learning model accepts the data.<\/p>\n<pre>X = df_final.drop(['fraud_reported_Y','policy_csl','policy_bind_date','incident_date'],axis=1).values\ny = df_final['fraud_reported_Y'].values<\/pre>\n<p id=\"1cbf\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">We then split the data into a training set and a test set, using 0.7 of the data for training and 0.3 for testing. We must avoid testing the model on the same data it was trained on.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"46c2\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)<\/em><\/span><\/pre>\n<p id=\"918b\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">Next we scale our dataset using sklearn\u2019s <code class=\"cw qg qh qi pu b\">StandardScaler<\/code>. Due to the massive amounts of computation taking place in deep learning, feature scaling is compulsory; it standardizes the range of our independent variables. Note that we fit the scaler on the training set only and then apply the same transformation to the test set.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"e943\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">from sklearn.preprocessing import StandardScaler<\/em><\/span><span id=\"abfb\" class=\"pc nb fp pu b ho qc pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">sc = StandardScaler()<\/em><\/span><span id=\"2155\" class=\"pc nb fp pu b ho qc pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">X_train = sc.fit_transform(X_train)<\/em><\/span><span id=\"7a28\" class=\"pc nb fp pu b ho qc pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">X_test = sc.transform(X_test)<\/em><\/span><\/pre>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fi fj fk fl fm\">\n<div class=\"ab ca\">\n<div class=\"ch bg eu ev ew ex\">\n<h2 id=\"4424\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Building the Artificial Neural Network (ANN)<\/strong><\/h2>\n<p id=\"e32a\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\">The first thing we need 
to do is import Keras. By default, Keras will use <a class=\"af mz\" href=\"https:\/\/www.tensorflow.org\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">TensorFlow<\/a> as its backend.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"9213\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">import keras<\/em><\/span><\/pre>\n<p id=\"c704\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">Next we import a few modules from Keras: the Sequential module, required to initialize the ANN, and the Dense module, required to build its layers.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"63ec\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">from keras.models import Sequential<\/em><\/span><span id=\"5f2b\" class=\"pc nb fp pu b ho qc pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">from keras.layers import Dense<\/em><\/span><\/pre>\n<p id=\"7280\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">Next we initialize our ANN by creating an instance of Sequential. Sequential initializes a linear stack of layers, to which we can then add Dense layers.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"033c\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">classifier = Sequential()<\/em><\/span><\/pre>\n<h2 id=\"a7a6\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\">Adding the Input Layer (First Hidden Layer)<\/h2>\n<p id=\"11ca\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\">We use the add method to add layers to our ANN. The first parameter is the number of nodes you want in this layer. There is no rule of thumb for how many nodes to choose, but a common strategy is to use the average of the number of nodes in the input layer and the number of nodes in the output layer.<\/p>\n<p id=\"b179\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">Say, for example, you had five independent variables and one output; then you would take (5 + 1) \/ 2 = 3 nodes. You can also experiment with hyperparameter tuning. The second parameter, <code class=\"cw qg qh qi pu b\">kernel_initializer<\/code>, is the function used to initialize the weights.<\/p>\n<p id=\"94c1\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">In this case, it will use a uniform distribution to make sure that the weights are small numbers close to zero. The next parameter is the activation function. We use the <a class=\"af mz\" href=\"https:\/\/www.kaggle.com\/dansbecker\/rectified-linear-units-relu-in-deep-learning\" target=\"_blank\" rel=\"noopener ugc nofollow\">rectifier function<\/a>, shortened as ReLU, the most common choice for hidden layers in ANNs. The final parameter is <code class=\"cw qg qh qi pu b\">input_dim<\/code>, the number of nodes in the input layer, which equals the number of independent variables (the code below uses the five-variable example for illustration; for our insurance dataset it should equal the number of columns in <code class=\"cw qg qh qi pu b\">X_train<\/code>).<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"6789\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">classifier.add(\n        Dense(3, kernel_initializer = 'uniform',\n              activation = 'relu', input_dim=5))<\/em><\/span><\/pre>\n<h2 id=\"37b3\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Adding the Second Hidden Layer<\/strong><\/h2>\n<p id=\"5fb7\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\">Adding the second hidden layer is similar to adding the first.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"8639\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">classifier.add(\n      Dense(3, kernel_initializer = 'uniform',\n            activation = 'relu'))<\/em><\/span><\/pre>\n<p id=\"2b6e\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">We don\u2019t need to specify the <code class=\"cw qg qh qi pu b\">input_dim<\/code> parameter again. We specified it in the first hidden layer to tell that layer how many input nodes to expect; by the second hidden layer, the ANN already knows its input shape, so we don\u2019t need to repeat ourselves.<\/p>\n<h2 id=\"2008\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Adding the output layer<\/strong><\/h2>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"a0b4\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">classifier.add(\n     Dense(1, kernel_initializer = 'uniform',\n           activation = 'sigmoid'))<\/em><\/span><\/pre>\n<p id=\"b2f1\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">We change the first parameter to 1 because we expect a single node in the output layer: we are only interested in whether a claim is fraudulent or not. We also change the activation function to the <a class=\"af mz\" href=\"https:\/\/towardsdatascience.com\/activation-functions-neural-networks-1cbd9f8d91d6\" target=\"_blank\" rel=\"noopener\">sigmoid activation function<\/a>, because we want the probability that a claim is fraudulent.<\/p>\n<p id=\"8c7e\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">If you\u2019re dealing with a classification problem that has more than two classes (e.g. classifying cats, dogs, and monkeys), you\u2019d need to change two things: set the first parameter to 3 and change the activation function to softmax. 
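The corresponding Keras line would be <code class=\"cw qg qh qi pu b\">Dense(3, activation = 'softmax')<\/code>, and what softmax computes is easy to sketch in NumPy (the three raw activation values below are made up):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then normalize the exponentials.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw activations of the three output nodes (cat, dog, monkey).
z = np.array([2.0, 1.0, 0.1])
probs = softmax(z)
print(probs)        # one probability per class
print(probs.sum())  # the probabilities sum to 1
```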
Softmax generalizes the sigmoid function to more than two classes, producing a probability for each class.<\/p>\n<h2 id=\"171a\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Compiling the ANN<\/strong><\/h2>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"5a92\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">classifier.compile(optimizer= 'adam',\n                  loss = 'binary_crossentropy',\n                  metrics = ['accuracy'])<\/em><\/span><\/pre>\n<p id=\"65d3\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">Compiling is basically applying <a class=\"af mz\" href=\"https:\/\/towardsdatascience.com\/https-medium-com-reina-wang-tw-stochastic-gradient-descent-with-restarts-5f511975163\" target=\"_blank\" rel=\"noopener\">stochastic gradient descent<\/a> to the whole neural network. The first parameter is the algorithm you want to use to find the optimal set of weights.<\/p>\n<p id=\"8c16\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">There are many variants of stochastic gradient descent; a very efficient one is <a class=\"af mz\" href=\"https:\/\/machinelearningmastery.com\/adam-optimization-algorithm-for-deep-learning\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Adam<\/a>. The second parameter is the loss function. Since our categories are binary, we use the <code class=\"cw qg qh qi pu b\">binary_crossentropy<\/code> loss function; with more than two classes we would use <code class=\"cw qg qh qi pu b\">categorical_crossentropy<\/code>. The final argument is the metric we\u2019ll use to evaluate our model, in this case accuracy.<\/p>\n<h2 id=\"7ff1\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Fitting our ANN to the training set<\/strong><\/h2>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"e8f7\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)<\/em><\/span><\/pre>\n<p id=\"729c\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\"><code class=\"cw qg qh qi pu b\">X_train<\/code> represents the independent variables we\u2019re using to train our ANN, and <code class=\"cw qg qh qi pu b\">y_train<\/code> represents the column we\u2019re predicting. <code class=\"cw qg qh qi pu b\">epochs<\/code> is the number of times we pass the full dataset through the ANN, and <code class=\"cw qg qh qi pu b\">batch_size<\/code> is the number of observations after which the weights are updated.<\/p>\n<h2 id=\"3cbb\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Predicting on the test set<\/strong><\/h2>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"306b\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">y_pred = classifier.predict(X_test)<\/em><\/span><\/pre>\n<p id=\"2b8d\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">This gives us the probability of each claim being fraudulent. We then set a threshold of 50% for classifying a claim as fraudulent. 
This means that any claim with a probability of 0.5 or more will be classified as fraudulent.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"b4bb\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">y_pred = (y_pred &gt; 0.5)<\/em><\/span><\/pre>\n<p id=\"4a3f\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">This way the insurance firm can fast-track claims that are not suspicious and spend more time evaluating claims flagged as fraudulent.<\/p>\n<h2 id=\"250c\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Checking the confusion matrix<\/strong><\/h2>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"d18c\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">from sklearn.metrics import confusion_matrix<\/em><\/span><span id=\"ad94\" class=\"pc nb fp pu b ho qc pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">cm = confusion_matrix(y_test, y_pred)<\/em><\/span><\/pre>\n<figure class=\"or os ot ou ov lu oo op paragraph-image\"><img loading=\"lazy\" decoding=\"async\" class=\"bg ma mb c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:324\/0*_E1BuRJ4mx2HVq5n.\" alt=\"\" width=\"324\" height=\"163\"><\/figure>\n<p id=\"aa85\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">The confusion matrix can be interpreted as follows. 
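Taking the counts shown in the screenshot above (their placement in the matrix is read off the image), accuracy is simply correct predictions over total predictions:

```python
import numpy as np

# Confusion matrix counts from the screenshot: rows = actual, columns = predicted.
cm = np.array([[1550, 45],
               [230, 175]])

correct = np.trace(cm)   # correct predictions lie on the diagonal: 1550 + 175
total = cm.sum()         # all 2000 observations
accuracy = correct / total
print(accuracy)  # 0.8625
```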
Out of 2000 observations, 1550 + 175 observations were correctly predicted, while 230 + 45 were incorrectly predicted. You can calculate the accuracy by dividing the number of correct predictions by the total number of predictions. In this case (1550+175) \/ 2000, which gives you 86.25%.<\/p>\n<h2 id=\"2378\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Making a Single Prediction<\/strong><\/h2>\n<p id=\"7582\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\">Let\u2019s say the insurance company gives you a single claim. They\u2019d like to know if the claim is fraudulent. What would you do to find out?<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"7bba\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">new_pred = classifier.predict(sc.transform(np.array([[a,b,c,d]])))<\/em><\/span><\/pre>\n<p id=\"abae\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">where a, b, c, d represent the features of the claim.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"235a\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">new_pred = (new_pred &gt; 0.5)<\/em><\/span><\/pre>\n<p id=\"0fa1\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">Since our classifier expects numpy arrays, we have to transform the single observation into a numpy array and use the standard scaler to scale it.<\/p>\n<h2 id=\"26c5\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Evaluating our
ANN<\/strong><\/h2>\n<p id=\"9f2a\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\">After training the model a few times, you\u2019ll notice that you keep getting different accuracies, so you can\u2019t be sure which one reflects the model\u2019s true performance. This introduces the bias-variance trade-off: in essence, we\u2019re trying to train a model that is accurate and whose accuracy does not vary too much when trained several times.<\/p>\n<p id=\"736c\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">To solve this problem, we use K-fold cross-validation with K equal to 10. This splits the training set into 10 folds. We then train our model on 9 folds and test it on the remaining fold. Since we have 10 folds, we do this iteratively through 10 combinations. Each iteration gives us its accuracy. We then find the mean of all the accuracies and use that as our model accuracy.
We also calculate the variance to ensure that it\u2019s minimal.<\/p>\n<p id=\"1b5d\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">Keras has a scikit-learn wrapper (<code class=\"cw qg qh qi pu b\">KerasClassifier<\/code>) that enables us to include K-fold cross-validation in our Keras code.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"13b3\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">from keras.wrappers.scikit_learn import KerasClassifier<\/em><\/span><\/pre>\n<p id=\"be1b\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">Next, we import the k-fold cross-validation function from scikit-learn.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"c583\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">from sklearn.model_selection import cross_val_score<\/em><\/span><\/pre>\n<p id=\"2dd9\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">The <code class=\"cw qg qh qi pu b\">KerasClassifier<\/code> expects one of its arguments to be a function, so we need to build that function. The purpose of this function is to build the architecture of our ANN.<\/p>\n<pre>def make_classifier():\n    classifier = Sequential()\n    classifier.add(Dense(3, kernel_initializer='uniform', activation='relu', input_dim=5))\n    classifier.add(Dense(3, kernel_initializer='uniform', activation='relu'))\n    classifier.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))\n    classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])\n    return classifier<\/pre>\n<p id=\"a245\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">This function will build the classifier and return it for use in the next step. The only thing we have done here is wrap our previous ANN architecture in a function and return the classifier.<\/p>\n<p id=\"5699\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">We then create a new classifier using K-fold cross-validation and pass the parameter <code class=\"cw qg qh qi pu b\">build_fn<\/code> as the function we just created above. Next we pass the batch size and the number of epochs, just like we did in the previous classifier.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"234b\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">classifier = KerasClassifier(build_fn = make_classifier,\n                            batch_size=10, epochs=100)<\/em><\/span><\/pre>\n<p id=\"5d25\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">To apply the k-fold cross-validation function we can use scikit-learn\u2019s <code class=\"cw qg qh qi pu b\">cross_val_score<\/code> function. The estimator is the classifier we just built with <code class=\"cw qg qh qi pu b\">KerasClassifier<\/code>, and <code class=\"cw qg qh qi pu b\">n_jobs=-1<\/code> will make use of all available CPUs. <code class=\"cw qg qh qi pu b\">cv<\/code> is the number of folds, and 10 is a typical choice.
The <code class=\"cw qg qh qi pu b\">cross_val_score<\/code> function will return the ten accuracies of the ten test folds used in the computation.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"68e7\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">accuracies = cross_val_score(estimator = classifier,\n                             X = X_train,\n                             y = y_train,\n                             cv = 10,\n                             n_jobs = -1)<\/em><\/span><\/pre>\n<p id=\"2e45\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">To obtain a single accuracy estimate, we take the mean of the ten accuracies.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"8f61\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">mean = accuracies.mean()<\/em><\/span><\/pre>\n<p id=\"4979\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">The variance can be obtained as follows:<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"4dd0\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">variance = accuracies.var()<\/em><\/span><\/pre>\n<p id=\"9866\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">The goal is to have a small variance between the accuracies.<\/p>\n<h2 id=\"069e\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Fighting Overfitting<\/strong><\/h2>\n<p id=\"40ec\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\"><a
class=\"af mz\" href=\"https:\/\/machinelearningmastery.com\/overfitting-and-underfitting-with-machine-learning-algorithms\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Overfitting<\/a> in machine learning is what happens when a model learns the details and noise in the training set such that it performs poorly on the test set. This can be observed when we have huge differences between the accuracies of the test set and training set, or when you observe a high variance when applying k-fold cross validation.<\/p>\n<p id=\"ed3a\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">In artificial neural networks, we counteract this using a technique called <a class=\"af mz\" href=\"https:\/\/machinelearningmastery.com\/dropout-regularization-deep-learning-models-keras\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">dropout regularization<\/a>. Dropout regularization works by randomly disabling some neurons at each iteration of the training to prevent them from being too dependent on each other.<\/p>\n<pre>from keras.layers import Dropout\n\nclassifier = Sequential()\nclassiifier.add(Dense(3, kernel_initializer = \u2018uniform\u2019, activation = \u2018relu\u2019, input_dim=5))\n\n# Notice the dropouts\nclassifier.add(Dropout(rate = 0.1))\nclassiifier.add(Dense(6, kernel_initializer = \u2018uniform\u2019, activation = \u2018relu\u2019))\nclassifier.add(Dropout(rate = 0.1))\n\nclassifier.add(Dense(1, kernel_initializer = \u2018uniform\u2019, activation = \u2018sigmoid\u2019))\nclassifier.compile(optimizer= \u2018adam\u2019,loss = \u2018binary_crossentropy\u2019,metrics = [\u2018accuracy\u2019])<\/pre>\n<p id=\"0059\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">In this case we apply the dropout after the first hidden layer and after the second hidden layer. 
Using a rate of 0.1 means that 10% of the neurons will be disabled at each iteration. It is advisable to start with a rate of 0.1. However, you should not go beyond 0.4, because the network will then start to underfit.<\/p>\n<h2 id=\"d675\" class=\"pc nb fp be nc pd pe pf ng pg ph pi nk mm pj pk pl mq pm pn po mu pp pq pr ps bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Parameter Tuning<\/strong><\/h2>\n<p id=\"435b\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\">Once you obtain your accuracy, you can tune the parameters to improve it. Grid search enables us to test different combinations of parameters in order to find the best ones.<\/p>\n<p id=\"2620\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">The first step here is to import the <code class=\"cw qg qh qi pu b\">GridSearchCV<\/code> class from scikit-learn.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"b75c\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">from sklearn.model_selection import GridSearchCV<\/em><\/span><\/pre>\n<p id=\"97b7\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">We also need to modify our <code class=\"cw qg qh qi pu b\">make_classifier<\/code> function as follows.
We create a new variable called <code class=\"cw qg qh qi pu b\"><em class=\"qb\">optimizer<\/em><\/code> so that we can try more than one optimizer in our params variable.<\/p>\n<pre>def make_classifier(optimizer):\n    classifier = Sequential()\n    classifier.add(Dense(3, kernel_initializer='uniform', activation='relu', input_dim=5))\n    classifier.add(Dense(3, kernel_initializer='uniform', activation='relu'))\n    classifier.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))\n    classifier.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])\n    return classifier<\/pre>\n<p id=\"a970\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">We\u2019ll still use the <code class=\"cw qg qh qi pu b\">KerasClassifier<\/code>, but we won\u2019t pass the batch size and number of epochs since these are the parameters we want to tune.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"f752\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">classifier = KerasClassifier(build_fn = make_classifier)<\/em><\/span><\/pre>\n<p id=\"4b29\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">The next step is to create a dictionary with the parameters we\u2019d like to tune \u2014 in this case the batch size, the number of epochs, and the optimizer function. We still use Adam as an optimizer and add a new one called <a class=\"af mz\" href=\"http:\/\/keras.io\/optimizers\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">rmsprop<\/a>.
The Keras documentation recommends rmsprop when dealing with <a class=\"af mz\" href=\"https:\/\/heartbeat.fritz.ai\/detecting-the-language-of-a-persons-name-using-pytorch-rnn-29a9090c20f2\" target=\"_blank\" rel=\"noopener ugc nofollow\">Recurrent Neural Networks<\/a>. However, we can try it for this ANN to see if it gives us a better result.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"e617\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">params = {\n    'batch_size': [20, 35],<\/em><\/span><span id=\"ffb2\" class=\"pc nb fp pu b ho qc pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">    'epochs': [150, 500],<\/em><\/span><span id=\"b6c8\" class=\"pc nb fp pu b ho qc pz l ie qa\" data-selectable-paragraph=\"\"><strong class=\"pu fq\"><em class=\"qb\">    'optimizer': ['adam', 'rmsprop'<\/em><\/strong><em class=\"qb\">]\n}<\/em><\/span><\/pre>\n<p id=\"a382\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">We then use grid search to test these parameters.
The grid search function expects our estimator, the parameters we just defined, the scoring metric, and the number of k-folds.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"47d0\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">grid_search = GridSearchCV(estimator=classifier,\n                           param_grid=params,\n                           scoring='accuracy',\n                           cv=10)<\/em><\/span><\/pre>\n<p id=\"ae29\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">As with the previous objects, we need to fit the grid search to our training set.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"c3ff\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">grid_search = grid_search.fit(X_train, y_train)<\/em><\/span><\/pre>\n<p id=\"d35b\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">We can get the best selection of parameters using <code class=\"cw qg qh qi pu b\">best_params_<\/code> from the grid search object.
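To see the whole tuning loop run end-to-end without the slow Keras training, the snippet below exercises GridSearchCV on a scikit-learn LogisticRegression stand-in; the C grid is hypothetical, and only the `best_params_` / `best_score_` mechanics carry over to the classifier in this article:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

params = {'C': [0.1, 1.0, 10.0]}   # hypothetical grid for the stand-in model
grid_search = GridSearchCV(estimator=LogisticRegression(max_iter=1000),
                           param_grid=params,
                           scoring='accuracy',
                           cv=5)
grid_search = grid_search.fit(X, y)

best_param = grid_search.best_params_    # best parameter combination found
best_accuracy = grid_search.best_score_  # mean cross-validated accuracy of that combination
```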
Likewise we use the <code class=\"cw qg qh qi pu b\">best_score_<\/code> to get the best score.<\/p>\n<pre class=\"or os ot ou ov pt pu pv pw ax px bj\"><span id=\"a1ae\" class=\"pc nb fp pu b ho py pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">best_param = grid_search.best_params_<\/em><\/span><span id=\"8d0a\" class=\"pc nb fp pu b ho qc pz l ie qa\" data-selectable-paragraph=\"\"><em class=\"qb\">best_accuracy = grid_search.best_score_<\/em><\/span><\/pre>\n<p id=\"d97c\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">It\u2019s important to note that this process will take a while as it searches for the best parameters.<\/p>\n<h1 id=\"3b77\" class=\"na nb fp be nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Conclusion<\/strong><\/h1>\n<p id=\"b737\" class=\"pw-post-body-paragraph mc md fp be b me ny mg mh mi nz mk ml mm oa mo mp mq ob ms mt mu oc mw mx my fi bj\" data-selectable-paragraph=\"\">Artificial Neural Networks are just one type of deep neural network. 
There are other networks such as Recurrent Neural Networks (RNN), <a class=\"af mz\" href=\"https:\/\/heartbeat.comet.ml\/a-beginners-guide-to-convolutional-neural-networks-cnn-cf26c5ee17ed\" target=\"_blank\" rel=\"noopener ugc nofollow\">Convolutional Neural Networks<\/a> (CNN), and <a class=\"af mz\" href=\"https:\/\/heartbeat.comet.ml\/guide-to-restricted-boltzmann-machines-using-pytorch-ee50d1ed21a8\" target=\"_blank\" rel=\"noopener ugc nofollow\">Boltzmann machines<\/a>.<\/p>\n<p id=\"167e\" class=\"pw-post-body-paragraph mc md fp be b me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my fi bj\" data-selectable-paragraph=\"\">RNNs can predict if the <a class=\"af mz\" href=\"https:\/\/heartbeat.comet.ml\/using-a-keras-long-shortterm-memory-lstm-model-to-predict-stock-prices-a08c9f69aa74\" target=\"_blank\" rel=\"noopener ugc nofollow\">price of a stock<\/a> will go up or down in the future. CNNs are used in <a class=\"af mz\" href=\"https:\/\/heartbeat.comet.ml\/the-5-computer-vision-techniques-that-will-change-how-you-see-the-world-1ee19334354b\" target=\"_blank\" rel=\"noopener ugc nofollow\">computer vision<\/a> \u2014 recognizing cats and dogs in a set of images or recognizing the presence of cancer cells in a brain image. Boltzmann machines are used in building <a class=\"af mz\" href=\"https:\/\/medium.com\/r?url=https%3A%2F%2Fheartbeat.fritz.ai%2Frecommendation-systems-models-and-evaluation-84944a84fb8e\" rel=\"noopener\">recommender systems<\/a>. Maybe we can cover one of these neural networks in the future.<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we\u2019ll build a simple neural network using Keras. We\u2019ll assume you have prior knowledge of machine learning packages such as scikit-learnand other scientific packages such as Pandas and Numpy.
Training an Artificial Neural Network Training an artificial neural network involves the following steps: Weights are randomly initialized to numbers that are near [&hellip;]<\/p>\n","protected":false},"author":63,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6,7],"tags":[],"coauthors":[163],"class_list":["post-7854","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Introduction to Deep Learning with Keras - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Introduction to Deep Learning with Keras\" \/>\n<meta property=\"og:description\" content=\"In this article, we\u2019ll build a simple neural network using Keras. We\u2019ll assume you have prior knowledge of machine learning packages such as scikit-learnand other scientific packages such as Pandas and Numpy. 
Training an Artificial Neural Network Training an artificial neural network involves the following steps: Weights are randomly initialized to numbers that are near [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-10-06T22:16:44+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:05:52+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:1500\/1*Y_329i21ja4m0RPskuwUKw.png\" \/>\n<meta name=\"author\" content=\"Derrick Mwiti\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Derrick Mwiti\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Introduction to Deep Learning with Keras - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/","og_locale":"en_US","og_type":"article","og_title":"Introduction to Deep Learning with Keras","og_description":"In this article, we\u2019ll build a simple neural network using Keras. We\u2019ll assume you have prior knowledge of machine learning packages such as scikit-learnand other scientific packages such as Pandas and Numpy. 
Training an Artificial Neural Network Training an artificial neural network involves the following steps: Weights are randomly initialized to numbers that are near [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-10-06T22:16:44+00:00","article_modified_time":"2025-04-24T17:05:52+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:1500\/1*Y_329i21ja4m0RPskuwUKw.png","type":"","width":"","height":""}],"author":"Derrick Mwiti","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Derrick Mwiti","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/"},"author":{"name":"Derrick Mwiti","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/9808205cca68ec95b6fbd918d195cea6"},"headline":"Introduction to Deep Learning with Keras","datePublished":"2023-10-06T22:16:44+00:00","dateModified":"2025-04-24T17:05:52+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/"},"wordCount":2204,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:1500\/1*Y_329i21ja4m0RPskuwUKw.png","articleSection":["Machine 
Learning","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/","url":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/","name":"Introduction to Deep Learning with Keras - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:1500\/1*Y_329i21ja4m0RPskuwUKw.png","datePublished":"2023-10-06T22:16:44+00:00","dateModified":"2025-04-24T17:05:52+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:1500\/1*Y_329i21ja4m0RPskuwUKw.png","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:1500\/1*Y_329i21ja4m0RPskuwUKw.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/introduction-to-deep-learning-with-keras\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Introduction to Deep Learning with Keras"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/9808205cca68ec95b6fbd918d195cea6","name":"Derrick Mwiti","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/b7db96aa11f77239bbde5eb79ede1493","url":"https:\/\/secure.gravatar.com\/avatar\/d52d009e8d0a72c0dcd785caadeefbb3fb7aa64567e9f5a1e65f5faad18f2426?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d52d009e8d0a72c0dcd785caadeefbb3fb7aa64567e9f5a1e65f5faad18f2426?s=96&d=mm&r=g","caption":"Derrick 
Mwiti"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/mwitiderrickgmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7854","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/63"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=7854"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7854\/revisions"}],"predecessor-version":[{"id":15512,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7854\/revisions\/15512"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=7854"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=7854"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=7854"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=7854"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}