{"id":9408,"date":"2024-03-05T06:00:43","date_gmt":"2024-03-05T14:00:43","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=9408"},"modified":"2025-04-24T17:03:02","modified_gmt":"2025-04-24T17:03:02","slug":"pima-indian-diabetes-prediction","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction\/","title":{"rendered":"Pima Indian Diabetes Prediction"},"content":{"rendered":"\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*7p7zUmwZ02iokYmP.jpg\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">This project aims to analyze the medical factors of a patient, such as Glucose Level, Blood Pressure, Skin Thickness, Insulin Level, and many others, to predict whether the patient has diabetes or not.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">About the&nbsp;Dataset<\/h4>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes based on specific diagnostic measurements included in the dataset. Several constraints were placed on selecting these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">The dataset consists of several medical predictor variables and one target variable, Outcome. 
Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, etc.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Data Dictionary<\/h4>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*dTNqkG2oNo_7VsZLfPWY9g.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 1: Data Dictionary<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Importing the libraries<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd\n<span class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np\n<span class=\"hljs-keyword\">import<\/span> matplotlib.pyplot <span class=\"hljs-keyword\">as<\/span> plt\n<span class=\"hljs-keyword\">import<\/span> seaborn <span class=\"hljs-keyword\">as<\/span> sns<\/span><\/pre>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Loading the dataset<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">df = pd.read_csv(<span class=\"hljs-string\">\"diabetes.csv\"<\/span>)\ndf.head()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*Dbr3jhXPViQ5GmhdcunTcQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 2:&nbsp;Dataset<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Data Preprocessing<\/h3>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Checking the shape of the dataset<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">df.shape\n=&gt; (<span class=\"hljs-number\">768<\/span>, <span class=\"hljs-number\">9<\/span>)<\/span><\/pre>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Unique values in 
the dataset<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">variables = [<span class=\"hljs-string\">'Pregnancies'<\/span>,<span class=\"hljs-string\">'Glucose'<\/span>,<span class=\"hljs-string\">'BloodPressure'<\/span>,<span class=\"hljs-string\">'SkinThickness'<\/span>,<span class=\"hljs-string\">'Insulin'<\/span>,<span class=\"hljs-string\">'BMI'<\/span>,<span class=\"hljs-string\">'DiabetesPedigreeFunction'<\/span>,<span class=\"hljs-string\">'Age'<\/span>,<span class=\"hljs-string\">'Outcome'<\/span>]\n<span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> variables:\n    <span class=\"hljs-built_in\">print<\/span>(df[i].unique())<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*UDW-nl3Rz0CmZmIkJBJhlw.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 3: Unique&nbsp;Values<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">In the dataset, the variables except Pregnancies and Outcome cannot have a value of 0 because it is impossible to have 0 Glucose Levels or 0 Blood Pressure. 
So, this will be counted as incorrect information.<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Checking the count of value 0 in the variables<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">variables = [<span class=\"hljs-string\">'Glucose'<\/span>,<span class=\"hljs-string\">'BloodPressure'<\/span>,<span class=\"hljs-string\">'SkinThickness'<\/span>,<span class=\"hljs-string\">'Insulin'<\/span>,<span class=\"hljs-string\">'BMI'<\/span>,<span class=\"hljs-string\">'DiabetesPedigreeFunction'<\/span>,<span class=\"hljs-string\">'Age'<\/span>,]\n<span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> variables:\n    c = <span class=\"hljs-number\">0<\/span>\n    <span class=\"hljs-keyword\">for<\/span> x <span class=\"hljs-keyword\">in<\/span> (df[i]):\n        <span class=\"hljs-keyword\">if<\/span> x == <span class=\"hljs-number\">0<\/span>:\n            c = c + <span class=\"hljs-number\">1<\/span>\n    <span class=\"hljs-built_in\">print<\/span>(i,c)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*0djzhgtH8SX7N9SDcgZ_TQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 4: Zero value&nbsp;count<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Now that I have a count of incorrect values in the variables, I will be replacing these values.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-comment\">#replacing the missing values with the mean<\/span>\nvariables = [<span class=\"hljs-string\">'Glucose'<\/span>,<span class=\"hljs-string\">'BloodPressure'<\/span>,<span class=\"hljs-string\">'SkinThickness'<\/span>,<span class=\"hljs-string\">'Insulin'<\/span>,<span class=\"hljs-string\">'BMI'<\/span>]\n<span class=\"hljs-keyword\">for<\/span> i <span 
class=\"hljs-keyword\">in<\/span> variables:\n    df[i] = df[i].replace(<span class=\"hljs-number\">0<\/span>,df[i].mean())<\/span><\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-comment\">#checking to make sure that incorrect values are replaced<\/span>\n<span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> variables:\n    c = <span class=\"hljs-number\">0<\/span>\n    <span class=\"hljs-keyword\">for<\/span> x <span class=\"hljs-keyword\">in<\/span> (df[i]):\n        <span class=\"hljs-keyword\">if<\/span> x == <span class=\"hljs-number\">0<\/span>:\n            c = c + <span class=\"hljs-number\">1<\/span>\n    <span class=\"hljs-built_in\">print<\/span>(i,c)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*kMgB40sbxNqXNHyxNQWjbg.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 5: Zero values&nbsp;replaced<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Checking for missing values<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">df.info()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*69fYToveTy0dEMe4UeGiLQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 6: Null&nbsp;values<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Descriptive Statistics<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">df.describe()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*NMASTce-X9A4M0_ndS1Ezw.png\" 
alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 7: Descriptive Statistics<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">df.head()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*s4bFkur6DFOoIfNyFZ5Bqg.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 8:&nbsp;Dataset<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Exploratory Data&nbsp;Analysis<\/h3>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">In the exploratory data analysis, I will look at the data distribution, the correlation between the features, and the relationship between the features and the target variable. I will start by looking at the data distribution, followed by the relationship between the target variable and independent variables.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Diabetes Count<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">plt.figure(figsize=(<span class=\"hljs-number\">5<\/span>,<span class=\"hljs-number\">5<\/span>))\nplt.pie(df[<span class=\"hljs-string\">'Outcome'<\/span>].value_counts(), labels=[<span class=\"hljs-string\">'No Diabetes'<\/span>, <span class=\"hljs-string\">'Diabetes'<\/span>], autopct=<span class=\"hljs-string\">'%1.1f%%'<\/span>, shadow=<span class=\"hljs-literal\">False<\/span>, startangle=<span class=\"hljs-number\">90<\/span>)\nplt.title(<span class=\"hljs-string\">'Diabetes Outcome'<\/span>)\nplt.show()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*PpahkXg5XMYWW7KZJduPAQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 9: Diabetes&nbsp;count<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Age Distribution and&nbsp;Diabetes<\/h4>\n\n\n\n<pre 
class=\"wp-block-preformatted\"><span class=\"pre--content\">sns.catplot(x=<span class=\"hljs-string\">\"Outcome\"<\/span>, y=<span class=\"hljs-string\">\"Age\"<\/span>, kind=<span class=\"hljs-string\">\"swarm\"<\/span>, data=df)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*mebJQd8LxiEE7sLZOscL1Q.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 10: Age and&nbsp;Diabetes<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">From the graph, it is clear that most patients are adults aged 20\u201330 years. Patients aged 40\u201355 years are more prone to diabetes than other age groups. However, because the 20\u201330 age group is the largest, it also contains more diabetic patients in absolute terms than the other age groups.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Pregnancies and&nbsp;Diabetes<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">fig,ax = plt.subplots(<span class=\"hljs-number\">1<\/span>,<span class=\"hljs-number\">2<\/span>,figsize=(<span class=\"hljs-number\">15<\/span>,<span class=\"hljs-number\">5<\/span>))\nsns.boxplot(x=<span class=\"hljs-string\">'Outcome'<\/span>,y=<span class=\"hljs-string\">'Pregnancies'<\/span>,data=df,ax=ax[<span class=\"hljs-number\">0<\/span>])\nsns.violinplot(x=<span class=\"hljs-string\">'Outcome'<\/span>,y=<span class=\"hljs-string\">'Pregnancies'<\/span>,data=df,ax=ax[<span class=\"hljs-number\">1<\/span>])<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*jQRUVXUM_ZnWVxOJvrvZrw.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 11: Pregnancies and&nbsp;Diabetes<\/figcaption><\/figure>\n\n\n\n<p 
class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">The boxplot and violin plot show a notable relationship between the number of pregnancies and diabetes. According to the graphs, a higher number of pregnancies is associated with an increased risk of diabetes.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Glucose and&nbsp;Diabetes<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">sns.boxplot(x=<span class=\"hljs-string\">'Outcome'<\/span>, y=<span class=\"hljs-string\">'Glucose'<\/span>, data=df).set_title(<span class=\"hljs-string\">'Glucose vs Diabetes'<\/span>)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*jSf1YBad9_4L2RoF1plsGw.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 12: Glucose and&nbsp;Diabetes<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Glucose level plays a significant role in determining whether the patient has diabetes. Patients with a glucose level below 120 are more likely to be nondiabetic, whereas patients with a glucose level above 140 are more likely to be diabetic. 
Therefore, high glucose levels are a good indicator of diabetes.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Blood Pressure and&nbsp;Diabetes<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">fig,ax = plt.subplots(<span class=\"hljs-number\">1<\/span>,<span class=\"hljs-number\">2<\/span>,figsize=(<span class=\"hljs-number\">15<\/span>,<span class=\"hljs-number\">5<\/span>))\nsns.boxplot(x=<span class=\"hljs-string\">'Outcome'<\/span>, y=<span class=\"hljs-string\">'BloodPressure'<\/span>, data=df, ax=ax[<span class=\"hljs-number\">0<\/span>]).set_title(<span class=\"hljs-string\">'BloodPressure vs Diabetes'<\/span>)\nsns.violinplot(x=<span class=\"hljs-string\">'Outcome'<\/span>, y=<span class=\"hljs-string\">'BloodPressure'<\/span>, data=df, ax=ax[<span class=\"hljs-number\">1<\/span>]).set_title(<span class=\"hljs-string\">'BloodPressure vs Diabetes'<\/span>)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*l1WVDDe3aAhmNu2PD-2aBQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 13: Blood Pressure and&nbsp;Diabetes<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Both the boxplot and violin plot provide a clear understanding of the relationship between blood pressure and diabetes. The boxplot shows that the median blood pressure for diabetic patients is slightly higher than for nondiabetic patients. The violin plot shows that the distribution of blood pressure for diabetic patients is shifted slightly higher than for nondiabetic patients. 
However, this is not enough evidence to conclude that blood pressure is a good predictor of diabetes.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Skin Thickness and&nbsp;Diabetes<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">fig,ax = plt.subplots(<span class=\"hljs-number\">1<\/span>,<span class=\"hljs-number\">2<\/span>,figsize=(<span class=\"hljs-number\">15<\/span>,<span class=\"hljs-number\">5<\/span>))\nsns.boxplot(x=<span class=\"hljs-string\">'Outcome'<\/span>, y=<span class=\"hljs-string\">'SkinThickness'<\/span>, data=df,ax=ax[<span class=\"hljs-number\">0<\/span>]).set_title(<span class=\"hljs-string\">'SkinThickness vs Diabetes'<\/span>)\nsns.violinplot(x=<span class=\"hljs-string\">'Outcome'<\/span>, y=<span class=\"hljs-string\">'SkinThickness'<\/span>, data=df,ax=ax[<span class=\"hljs-number\">1<\/span>]).set_title(<span class=\"hljs-string\">'SkinThickness vs Diabetes'<\/span>)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*-btWmLyAbHjiyYK80d0jaA.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 14: Skin Thickness and&nbsp;Diabetes<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Here, both the boxplot and violin plot reveal the relationship between skin thickness and diabetes. As observed in the boxplot, the median skin thickness is higher for diabetic patients than for nondiabetic patients. Nondiabetic patients have a median skin thickness of nearly 20, compared to almost 30 in diabetic patients. The violin plot shows the distribution of skin thickness: nondiabetic patients are concentrated near 20, whereas diabetic patients are less concentrated near 20 and more concentrated near 30. 
Therefore, skin thickness can be an indicator of diabetes.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Insulin and&nbsp;Diabetes<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">fig,ax = plt.subplots(<span class=\"hljs-number\">1<\/span>,<span class=\"hljs-number\">2<\/span>,figsize=(<span class=\"hljs-number\">15<\/span>,<span class=\"hljs-number\">5<\/span>))\nsns.boxplot(x=<span class=\"hljs-string\">'Outcome'<\/span>,y=<span class=\"hljs-string\">'Insulin'<\/span>,data=df,ax=ax[<span class=\"hljs-number\">0<\/span>]).set_title(<span class=\"hljs-string\">'Insulin vs Diabetes'<\/span>)\nsns.violinplot(x=<span class=\"hljs-string\">'Outcome'<\/span>,y=<span class=\"hljs-string\">'Insulin'<\/span>,data=df,ax=ax[<span class=\"hljs-number\">1<\/span>]).set_title(<span class=\"hljs-string\">'Insulin vs Diabetes'<\/span>)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*J2QCZX8OuBXofYeZ1PVfFg.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 15: Insulin and&nbsp;Diabetes<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Insulin is a major body hormone that regulates glucose metabolism. It&#8217;s required for the body to use sugars, fats, and proteins efficiently. Any change in insulin amount in the body would also result in a change in glucose levels. Here, the boxplot and violinplot show the distribution of insulin levels in patients. In nondiabetic patients, the insulin level is near 100, whereas in diabetic patients, the insulin level is near 200. In the violin plot, we can see that the distribution of insulin levels in nondiabetic patients is more spread out near 100, whereas, in diabetic patients, the distribution is contracted and shows a little spread in higher insulin levels. 
This indicates that the insulin level is a good indicator of diabetes.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">BMI and&nbsp;Diabetes<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">fig,ax = plt.subplots(<span class=\"hljs-number\">1<\/span>,<span class=\"hljs-number\">2<\/span>,figsize=(<span class=\"hljs-number\">15<\/span>,<span class=\"hljs-number\">5<\/span>))\nsns.boxplot(x=<span class=\"hljs-string\">'Outcome'<\/span>,y=<span class=\"hljs-string\">'BMI'<\/span>,data=df,ax=ax[<span class=\"hljs-number\">0<\/span>])\nsns.violinplot(x=<span class=\"hljs-string\">'Outcome'<\/span>,y=<span class=\"hljs-string\">'BMI'<\/span>,data=df,ax=ax[<span class=\"hljs-number\">1<\/span>])<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*c9Gx5L0sRwoooZcfAi0UOg.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 16: BMI and&nbsp;Diabetes<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Both graphs highlight the role of BMI in diabetes prediction. Nondiabetic patients typically have a BMI within the range of 25\u201335, whereas diabetic patients tend to have a BMI greater than 35. The violin plot reveals the BMI distribution, where the nondiabetic patients have an increased spread from 25 to 35, which narrows after 35. However, in diabetic patients, there is an increased spread at 35 and an increased spread at 45\u201350 compared to nondiabetic patients. 
Therefore, BMI is a good predictor of diabetes, and obese people are more likely to be diabetic.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Diabetes Pedigree Function and Diabetes&nbsp;Outcome<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">fig,ax = plt.subplots(<span class=\"hljs-number\">1<\/span>,<span class=\"hljs-number\">2<\/span>,figsize=(<span class=\"hljs-number\">15<\/span>,<span class=\"hljs-number\">5<\/span>))\nsns.boxplot(x=<span class=\"hljs-string\">'Outcome'<\/span>,y=<span class=\"hljs-string\">'DiabetesPedigreeFunction'<\/span>,data=df,ax=ax[<span class=\"hljs-number\">0<\/span>]).set_title(<span class=\"hljs-string\">'Diabetes Pedigree Function'<\/span>)\nsns.violinplot(x=<span class=\"hljs-string\">'Outcome'<\/span>,y=<span class=\"hljs-string\">'DiabetesPedigreeFunction'<\/span>,data=df,ax=ax[<span class=\"hljs-number\">1<\/span>]).set_title(<span class=\"hljs-string\">'Diabetes Pedigree Function'<\/span>)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*K0uV2FmhrlVC90_x8Dor4Q.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 17: Diabetes Pedigree Function and&nbsp;Diabetes<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">The Diabetes Pedigree Function (DPF) calculates diabetes likelihood depending on the subject&#8217;s age and diabetic family history. The boxplot shows that patients with a lower DPF are much less likely to have diabetes, while patients with a higher DPF are much more likely to have diabetes. In the violin plot, the majority of the nondiabetic patients have a DPF of 0.25\u20130.35, whereas the diabetic patients have a higher DPF, shown by the increased spread of their distribution between 0.5 and 1.5. 
Therefore, the DPF is a good indicator of diabetes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Correlation Matrix&nbsp;Heatmap<\/h3>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-comment\">#correlation heatmap<\/span>\nplt.figure(figsize=(<span class=\"hljs-number\">12<\/span>,<span class=\"hljs-number\">12<\/span>))\nsns.heatmap(df.corr(), annot=<span class=\"hljs-literal\">True<\/span>, cmap=<span class=\"hljs-string\">'coolwarm'<\/span>).set_title(<span class=\"hljs-string\">'Correlation Heatmap'<\/span>)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*fgJUur5AtHA95nP59baoTw.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 18: Correlation Matrix&nbsp;Heatmap<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Train Test&nbsp;Split<\/h3>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-keyword\">from<\/span> sklearn.model_selection <span class=\"hljs-keyword\">import<\/span> train_test_split\nX_train, X_test, y_train, y_test = train_test_split(df.drop(<span class=\"hljs-string\">'Outcome'<\/span>,axis=<span class=\"hljs-number\">1<\/span>),df[<span class=\"hljs-string\">'Outcome'<\/span>],test_size=<span class=\"hljs-number\">0.2<\/span>,random_state=<span class=\"hljs-number\">42<\/span>)<\/span><\/pre>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Diabetes Prediction<\/h3>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">For predicting diabetes, I will be using the following algorithms:<\/p>\n\n\n\n<ol class=\"wp-block-list postList\">\n<li>Logistic Regression<\/li>\n\n\n\n<li>Random Forest Classifier<\/li>\n\n\n\n<li>Support Vector Machine<\/li>\n<\/ol>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Logistic Regression<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span 
class=\"hljs-comment\">#building model<\/span>\n<span class=\"hljs-keyword\">from<\/span> sklearn.linear_model <span class=\"hljs-keyword\">import<\/span> LogisticRegression\nlr = LogisticRegression()\n\n<span class=\"hljs-comment\">#training the model<\/span>\nlr.fit(X_train,y_train)\n\n<span class=\"hljs-comment\">#training accuracy<\/span>\nlr.score(X_train,y_train)\n=&gt; <span class=\"hljs-number\">0.7719869706840391<\/span>\n\n<span class=\"hljs-comment\">#predicted outcomes<\/span>\nlr_pred = lr.predict(X_test)<\/span><\/pre>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Random Forest Classifier<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-comment\">#building model<\/span>\n<span class=\"hljs-keyword\">from<\/span> sklearn.ensemble <span class=\"hljs-keyword\">import<\/span> RandomForestClassifier\nrfc = RandomForestClassifier(n_estimators=<span class=\"hljs-number\">100<\/span>,random_state=<span class=\"hljs-number\">42<\/span>)\n\n<span class=\"hljs-comment\">#training the model<\/span>\nrfc.fit(X_train, y_train)\n\n<span class=\"hljs-comment\">#training accuracy<\/span>\nrfc.score(X_train, y_train)\n=&gt; <span class=\"hljs-number\">0.978543940<\/span>\n\n<span class=\"hljs-comment\">#predicted outcomes<\/span>\nrfc_pred = rfc.predict(X_test)<\/span><\/pre>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Support Vector Machine&nbsp;(SVM)<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-comment\">#building model<\/span>\n<span class=\"hljs-keyword\">from<\/span> sklearn.svm <span class=\"hljs-keyword\">import<\/span> SVC\nsvm = SVC(kernel=<span class=\"hljs-string\">'linear'<\/span>, random_state=<span class=\"hljs-number\">0<\/span>)\n\n<span class=\"hljs-comment\">#training the model<\/span>\nsvm.fit(X_train, y_train)\n\n<span class=\"hljs-comment\">#test accuracy<\/span>\nsvm.score(X_test, y_test)\n=&gt; <span 
class=\"hljs-number\">0.7597402597402597<\/span>\n\n<span class=\"hljs-comment\">#predicting outcomes<\/span>\nsvm_pred = svm.predict(X_test)<\/span><\/pre>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Model Evaluation<\/h3>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Evaluating the Logistic Regression Model<\/h4>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Confusion Matrix&nbsp;Heatmap<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-keyword\">from<\/span> sklearn.metrics <span class=\"hljs-keyword\">import<\/span> confusion_matrix\nsns.heatmap(confusion_matrix(y_test, lr_pred), annot=<span class=\"hljs-literal\">True<\/span>, cmap=<span class=\"hljs-string\">'Blues'<\/span>)\nplt.xlabel(<span class=\"hljs-string\">'Predicted Values'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Actual Values'<\/span>)\nplt.title(<span class=\"hljs-string\">'Confusion Matrix for Logistic Regression'<\/span>)\nplt.show()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*08GAveOw1ti4FjAJoLEKWA.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 19: Logistic Regression Confusion Matrix<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Distribution Plot<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">ax = sns.distplot(y_test, color=<span class=\"hljs-string\">'r'<\/span>,  label=<span class=\"hljs-string\">'Actual Value'<\/span>,hist=<span class=\"hljs-literal\">False<\/span>)\nsns.distplot(lr_pred, color=<span class=\"hljs-string\">'b'<\/span>, label=<span class=\"hljs-string\">'Predicted Value'<\/span>,hist=<span class=\"hljs-literal\">False<\/span>,ax=ax)\nplt.title(<span class=\"hljs-string\">'Actual vs Predicted Value Logistic Regression'<\/span>)\nplt.xlabel(<span class=\"hljs-string\">'Outcome'<\/span>)\nplt.ylabel(<span 
class=\"hljs-string\">'Count'<\/span>)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*mIUWzO7aQwXUUd4QVMRedQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 20: Distribution Plot Logistic Regression<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Classification Report<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-keyword\">from<\/span> sklearn.metrics <span class=\"hljs-keyword\">import<\/span> classification_report\n<span class=\"hljs-built_in\">print<\/span>(classification_report(y_test, lr_pred))<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*FJotzKoq80yuDd-B8WPxBg.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 21: Classification Report Logistic Regression<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-keyword\">from<\/span> sklearn.metrics <span class=\"hljs-keyword\">import<\/span> accuracy_score,mean_absolute_error,mean_squared_error,r2_score\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'Accuracy Score: '<\/span>,accuracy_score(y_test,lr_pred))\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'Mean Absolute Error: '<\/span>,mean_absolute_error(y_test,lr_pred))\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'Mean Squared Error: '<\/span>,mean_squared_error(y_test,lr_pred))\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'R2 Score: '<\/span>,r2_score(y_test,lr_pred))<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*5_kLek6RVA9oB42rWH6sXA.png\" alt=\"\"\/><figcaption 
class=\"wp-element-caption\">Figure 22: Logistic Regression Model&nbsp;Metrics<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Evaluating Random Forest Classifier<\/h4>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Confusion Matrix&nbsp;Heatmap<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">sns.heatmap(confusion_matrix(y_test, rfc_pred), annot=<span class=\"hljs-literal\">True<\/span>, cmap=<span class=\"hljs-string\">'Blues'<\/span>)\nplt.xlabel(<span class=\"hljs-string\">'Predicted Values'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Actual Values'<\/span>)\nplt.title(<span class=\"hljs-string\">'Confusion Matrix for Random Forest'<\/span>)\nplt.show()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*DW_w4-m7Jwsf0w3yy7rx0g.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 23: Confusion Matrix Random&nbsp;Forest<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Distribution Plot<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">ax = sns.distplot(y_test, color=<span class=\"hljs-string\">'r'<\/span>,  label=<span class=\"hljs-string\">'Actual Value'<\/span>,hist=<span class=\"hljs-literal\">False<\/span>)\nsns.distplot(rfc_pred, color=<span class=\"hljs-string\">'b'<\/span>, label=<span class=\"hljs-string\">'Predicted Value'<\/span>,hist=<span class=\"hljs-literal\">False<\/span>,ax=ax)\nplt.title(<span class=\"hljs-string\">'Actual vs Predicted Value Random Forest'<\/span>)\nplt.xlabel(<span class=\"hljs-string\">'Outcome'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Count'<\/span>)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*2fQdD8NTT7JJQBFfmZtkgQ.png\" alt=\"\"\/><figcaption 
class=\"wp-element-caption\">Figure 24: Distribution Plot Random&nbsp;Forest<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Classification Report<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-built_in\">print<\/span>(classification_report(y_test, rfc_pred))<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*_0ZRxvwmi-0GC1RNqn0COQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 25: Classification Report Random&nbsp;Forest<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'Accuracy Score: '<\/span>,accuracy_score(y_test,rfc_pred))\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'Mean Absolute Error: '<\/span>,mean_absolute_error(y_test,rfc_pred))\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'Mean Squared Error: '<\/span>,mean_squared_error(y_test,rfc_pred))\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'R2 Score: '<\/span>,r2_score(y_test,rfc_pred))<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*voQmD-h4WVl6v9MigAuDBw.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 26: Model Metrics Random&nbsp;Forest<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Evaluating SVM&nbsp;Model<\/h4>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Confusion Matrix&nbsp;Heatmap<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">sns.heatmap(confusion_matrix(y_test, svm_pred), annot=<span class=\"hljs-literal\">True<\/span>, cmap=<span class=\"hljs-string\">'Blues'<\/span>)\nplt.xlabel(<span class=\"hljs-string\">'Predicted 
Values'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Actual Values'<\/span>)\nplt.title(<span class=\"hljs-string\">'Confusion Matrix for SVM'<\/span>)\nplt.show()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*KAEXd0Ebx2M5k2FVvw7u9g.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 27: Confusion Matrix&nbsp;SVM<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Distribution Plot<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">ax = sns.distplot(y_test, color=<span class=\"hljs-string\">'r'<\/span>,  label=<span class=\"hljs-string\">'Actual Value'<\/span>,hist=<span class=\"hljs-literal\">False<\/span>)\nsns.distplot(svm_pred, color=<span class=\"hljs-string\">'b'<\/span>, label=<span class=\"hljs-string\">'Predicted Value'<\/span>,hist=<span class=\"hljs-literal\">False<\/span>,ax=ax)\nplt.title(<span class=\"hljs-string\">'Actual vs Predicted Value SVM'<\/span>)\nplt.xlabel(<span class=\"hljs-string\">'Outcome'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Count'<\/span>)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*wMgPWkPGoju06dOxm_vjbQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 28: Distribution Plot&nbsp;SVM<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Classification Report<\/h4>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-built_in\">print<\/span>(classification_report(y_test, svm_pred))<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*PZVuYel49YyFD-Cc4wEgrQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 29: Classification 
Report&nbsp;SVM<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'Accuracy Score: '<\/span>,accuracy_score(y_test,svm_pred))\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'Mean Absolute Error: '<\/span>,mean_absolute_error(y_test,svm_pred))\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'Mean Squared Error: '<\/span>,mean_squared_error(y_test,svm_pred))\n<span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'R2 Score: '<\/span>,r2_score(y_test,svm_pred))<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*2JkfOFaGe0EsVmYgtf2JNg.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 30: Model Metrics&nbsp;SVM<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Comparing the&nbsp;Models<\/h3>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-comment\">#comparing the accuracy of different models<\/span>\nsns.barplot(x=[<span class=\"hljs-string\">'Logistic Regression'<\/span>, <span class=\"hljs-string\">'RandomForestClassifier'<\/span>, <span class=\"hljs-string\">'SVM'<\/span>], y=[0.7792207792207793,0.7662337662337663,0.7597402597402597])\nplt.xlabel(<span class=\"hljs-string\">'Classifier Models'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Accuracy'<\/span>)\nplt.title(<span class=\"hljs-string\">'Comparison of different models'<\/span>)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image graf graf--figure\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*ttrvImUt75jxN473-yDI_g.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 31: Model Comparison<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Conclusion<\/h3>\n\n\n\n<p 
class=\"graf graf--p wp-block-paragraph\">From the exploratory data analysis, I conclude that the risk of diabetes is associated with the following factors:<\/p>\n\n\n\n<ol class=\"wp-block-list postList\">\n<li>Glucose level<\/li>\n\n\n\n<li>Number of pregnancies<\/li>\n\n\n\n<li>Skin thickness<\/li>\n\n\n\n<li>Insulin level<\/li>\n\n\n\n<li>BMI<\/li>\n<\/ol>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">The risk of diabetes increases with glucose level, insulin level, BMI, skin thickness, and number of pregnancies. However, the effect of the number of pregnancies is irregular in a way the data couldn&#8217;t explain.<\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Among the classification models, Logistic Regression achieved the highest accuracy at about 78%, ahead of Random Forest (about 77%) and SVM (about 76%). The model&#8217;s accuracy could likely be improved with a larger dataset: the one used for this project is small, with only 768 rows.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This project aims to analyze the medical factors of a patient, such as Glucose Level, Blood Pressure, Skin Thickness, Insulin Level, and many others, to predict whether the patient has diabetes or not. About the&nbsp;Dataset This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. 
The objective of the dataset [&hellip;]<\/p>\n","protected":false},"author":120,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[7],"tags":[],"coauthors":[217],"class_list":["post-9408","post","type-post","status-publish","format-standard","hentry","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Pima Indian Diabetes Prediction - Comet<\/title>\n<meta name=\"description\" content=\"This project covers diabetes prediction by looking at different medical factors, such as glucose level, blood pressure, etc.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Pima Indian Diabetes Prediction\" \/>\n<meta property=\"og:description\" content=\"This project covers diabetes prediction by looking at different medical factors, such as glucose level, blood pressure, etc.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2024-03-05T14:00:43+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:03:02+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*7p7zUmwZ02iokYmP.jpg\" \/>\n<meta name=\"author\" content=\"Sukhman Singh\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sukhman Singh\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"15 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Pima Indian Diabetes Prediction - Comet","description":"This project covers diabetes prediction by looking at different medical factors, such as glucose level, blood pressure, etc.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction","og_locale":"en_US","og_type":"article","og_title":"Pima Indian Diabetes Prediction","og_description":"This project covers diabetes prediction by looking at different medical factors, such as glucose level, blood pressure, etc.","og_url":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2024-03-05T14:00:43+00:00","article_modified_time":"2025-04-24T17:03:02+00:00","og_image":[{"url":"https:\/\/cdn-images-1.medium.com\/max\/800\/0*7p7zUmwZ02iokYmP.jpg","type":"","width":"","height":""}],"author":"Sukhman Singh","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Sukhman Singh","Est. 
reading time":"15 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction\/"},"author":{"name":"Sukhman Singh","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/03fee0ceb719563b09dcf21796f85455"},"headline":"Pima Indian Diabetes Prediction","datePublished":"2024-03-05T14:00:43+00:00","dateModified":"2025-04-24T17:03:02+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction\/"},"wordCount":1284,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction#primaryimage"},"thumbnailUrl":"https:\/\/cdn-images-1.medium.com\/max\/800\/0*7p7zUmwZ02iokYmP.jpg","articleSection":["Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction\/","url":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction","name":"Pima Indian Diabetes Prediction - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction#primaryimage"},"thumbnailUrl":"https:\/\/cdn-images-1.medium.com\/max\/800\/0*7p7zUmwZ02iokYmP.jpg","datePublished":"2024-03-05T14:00:43+00:00","dateModified":"2025-04-24T17:03:02+00:00","description":"This project covers diabetes prediction by looking at different medical factors, such as glucose level, blood pressure, 
etc.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction#primaryimage","url":"https:\/\/cdn-images-1.medium.com\/max\/800\/0*7p7zUmwZ02iokYmP.jpg","contentUrl":"https:\/\/cdn-images-1.medium.com\/max\/800\/0*7p7zUmwZ02iokYmP.jpg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/pima-indian-diabetes-prediction#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Pima Indian Diabetes Prediction"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, 
Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/03fee0ceb719563b09dcf21796f85455","name":"Sukhman Singh","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/cbb8e2cab1e32e43711b4e44c7711e76","url":"https:\/\/secure.gravatar.com\/avatar\/2e74502dcaa20f8fe126f4e1eef424b9227d93803c9cc857de010fbca0b41265?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/2e74502dcaa20f8fe126f4e1eef424b9227d93803c9cc857de010fbca0b41265?s=96&d=mm&r=g","caption":"Sukhman Singh"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/sukhmansinghbhogalgmail-com\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/9408","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/120"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=9408"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/9408\/revisions"}],"predecessor-version":[{"id":15378,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/9408\/revisions\/15378"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=9408"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=9408"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post
=9408"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=9408"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}