{"id":8033,"date":"2023-10-25T14:54:09","date_gmt":"2023-10-25T22:54:09","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=8033"},"modified":"2025-04-24T17:05:11","modified_gmt":"2025-04-24T17:05:11","slug":"integrating-comet-and-azure-databricks","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks\/","title":{"rendered":"Integrating Comet and Azure Databricks"},"content":{"rendered":"\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*0gBid337Sj4L84_eEhHncg.jpeg\" alt=\"\" width=\"600\" height=\"600\"><\/figure><div class=\"mh mi mj\"><picture><\/picture><\/div><figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Diamond, <a class=\"af mx\" href=\"http:\/\/depositphotos.com\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Deposit Photos<\/a><\/figcaption><\/figure>\n<h2 id=\"2f48\" class=\"my mz fr be na nb nc gr nd ne nf gu ng nh ni nj nk nl nm nn no np nq nr ns nt bj\">Introduction<\/h2>\n<p id=\"d997\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">Recently I made the switch to actively tracking my machine learning experiments. First with ML Flow and then with <a class=\"af mx\" href=\"https:\/\/www.comet.com\/site\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet<\/a>.<\/p>\n<p id=\"ca7f\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Tracking experiments is becoming more common in MLOps. As time goes on, data scientists and machine learning engineers have realized the importance of model versioning. 
Tracking has also grown beyond saving metrics: modern experiment tracking now includes logging charts and saving artifacts as well.<\/p>\n<p id=\"6bd9\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">In this article, we\u2019ll focus on Comet\u2019s experiment tracking within an Azure Databricks environment. I\u2019ll cover four things:<\/p>\n<ul class=\"\">\n<li id=\"816f\" class=\"nu nv fr be b gp op nx ny gs oq oa ob oc ou oe of og ov oi oj ok ow om on oo ox oy oz bj\" data-selectable-paragraph=\"\">Setting Up Comet ML in Azure Databricks<\/li>\n<li id=\"21d7\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">Creating an Experiment<\/li>\n<li id=\"c189\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">Using Comet for Exploratory Data Analysis<\/li>\n<li id=\"bbba\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">Creating Algorithms and Comparing the Results in Comet<\/li>\n<\/ul>\n<p id=\"8346\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Databricks has built-in ML experiment tracking using MLflow, which I find has a steep learning curve and is not very beginner friendly. 
Comet is easy to use for data science professionals of different skill levels and is easy to set up.<\/p>\n<p id=\"f0cb\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">The goal of this project is to demonstrate for beginners the advantages of Comet in an Azure Databricks environment.<\/p>\n<h2 id=\"a228\" class=\"my mz fr be na nb nc gr nd ne nf gu ng nh ni nj nk nl nm nn no np nq nr ns nt bj\">Installing Libraries in Azure Databricks<\/h2>\n<blockquote class=\"pf pg ph\"><p id=\"923a\" class=\"nu nv pi be b gp op nx ny gs oq oa ob pj or oe of pk os oi oj pl ot om on oo fk bj\" data-selectable-paragraph=\"\">This article assumes you know how to navigate Databricks and upload\/mount the datasets that will be used.<\/p><\/blockquote>\n<p id=\"ae6c\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">Dependencies:<\/strong><\/p>\n<ul class=\"\">\n<li id=\"e55b\" class=\"nu nv fr be b gp op nx ny gs oq oa ob oc ou oe of og ov oi oj ok ow om on oo ox oy oz bj\" data-selectable-paragraph=\"\">comet_ml<\/li>\n<li id=\"e688\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">comet-for-mlflow<\/li>\n<\/ul>\n<p id=\"2a34\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Other libraries such as pandas, numpy, scikit-learn, Seaborn, and matplotlib are already pre-installed in Databricks.<\/p>\n<h2 id=\"97e3\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Install Process<\/strong><\/h2>\n<p id=\"f631\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi 
oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">The installation process is straightforward for installing Comet into Azure Databricks. If you already use AutoML, installing the comet-for-mlflow package is also really useful. Comet will log your existing MLFlow work and save it to a current Comet experiment.<\/p>\n<p id=\"2416\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">The first step is installing it in the cluster. Once you have started your cluster, select the <em class=\"pi\">Install New <\/em>button, highlighted below.<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<div class=\"qf qg ee qh bg qi\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*7XSXE1druNLR7afGyeQvJQ.png\" alt=\"\" width=\"700\" height=\"390\"><\/figure><div class=\"mh mi qe\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*7XSXE1druNLR7afGyeQvJQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*7XSXE1druNLR7afGyeQvJQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*7XSXE1druNLR7afGyeQvJQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*7XSXE1druNLR7afGyeQvJQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*7XSXE1druNLR7afGyeQvJQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*7XSXE1druNLR7afGyeQvJQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*7XSXE1druNLR7afGyeQvJQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, 
(min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*7XSXE1druNLR7afGyeQvJQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*7XSXE1druNLR7afGyeQvJQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*7XSXE1druNLR7afGyeQvJQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*7XSXE1druNLR7afGyeQvJQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*7XSXE1druNLR7afGyeQvJQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*7XSXE1druNLR7afGyeQvJQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*7XSXE1druNLR7afGyeQvJQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Author\u2019s picture<\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"457a\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">In the Install Library window, select PyPI. Then type the name of the package. 
At this point, I would recommend installing both comet_ml and comet-for-mlflow.<\/p>\n<p id=\"bcbb\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">There is an option to install on all your current clusters; this is a personal choice. Generally, I only install the Comet libraries on the clusters I use for development or production.<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:482\/1*e_TkO3CyTjb4ks-yBBJjSQ.png\" alt=\"\" width=\"482\" height=\"307\"><\/figure><div class=\"mh mi qj\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:964\/format:webp\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 964w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 482px\"><source 
srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:964\/1*e_TkO3CyTjb4ks-yBBJjSQ.png 964w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 482px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"c36b\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Since Databricks is not a Jupyter Notebook nor a Python file, there will be instances where certain Comet features do not work. 
You will often have to cover gaps with AutoML, depending on what you want to save to your experiments.<\/p>\n<p id=\"a580\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Now you\u2019re ready to set up an experiment in a Databricks notebook.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<p id=\"06d3\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Before we create the experiment, let\u2019s first look at the dataset we will be using. The data set is the <a href=\"https:\/\/www.kaggle.com\/shivam2503\/diamonds?source=post_page-----4ec97703a2fe--------------------------------\">Diamonds dataset<\/a> from Kaggle.<\/p>\n<h2 id=\"2f4a\" class=\"my mz fr be na nb nc gr nd ne nf gu ng nh ni nj nk nl nm nn no np nq nr ns nt bj\"><strong class=\"al\">Data<\/strong><\/h2>\n<p id=\"c448\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">The diamond data set has 53940 rows and 10 features:<\/p>\n<ul class=\"\">\n<li id=\"3f87\" class=\"nu nv fr be b gp op nx ny gs oq oa ob oc ou oe of og ov oi oj ok ow om on oo ox oy oz bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">price \u2014<\/strong> price in US dollars (numeric)<\/li>\n<li id=\"4ee2\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">carat \u2014<\/strong> weight of the diamond (numeric)<\/li>\n<li id=\"abba\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">cut<\/strong> \u2014 quality of the cut (categorical)<\/li>\n<li 
id=\"c76f\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">color \u2014<\/strong> diamond color (categorical)<\/li>\n<li id=\"5a71\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">clarity<\/strong> \u2014 a measurement of how clear the diamond is (categorical)<\/li>\n<li id=\"2eb2\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">x \u2014<\/strong> length in mm (numeric)<\/li>\n<li id=\"2364\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">y \u2014<\/strong> width in mm (numeric)<\/li>\n<li id=\"8c02\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">z \u2014<\/strong> depth in mm (numeric)<\/li>\n<li id=\"3604\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">depth \u2014<\/strong> total depth (numeric)<\/li>\n<li id=\"9628\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">table \u2014<\/strong> width of top of diamond relative to widest point (numeric)<\/li>\n<\/ul>\n<p id=\"742e\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Further information on carat, cut, color, quality, and other terms can be found <a 
href=\"https:\/\/www.lumeradiamonds.com\/diamond-education\/index?source=post_page-----4ec97703a2fe--------------------------------\">here<\/a>.<\/p>\n<h2 id=\"10f4\" class=\"my mz fr be na nb nc gr nd ne nf gu ng nh ni nj nk nl nm nn no np nq nr ns nt bj\">Creating an Experiment<\/h2>\n<p id=\"ffa6\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">Before creating your experiment, select the compute cluster where you installed comet_ml and comet-for-mlflow. Once you have selected the cluster, create a new Databricks notebook. We will load the following libraries:<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"agw rp l\">\n<pre>import pandas as pd\nimport numpy as np\nimport seaborn as sns\nimport matplotlib.pyplot as plt\n\n#Comet Experiments\nfrom comet_ml import Experiment\n\n# Classification\nfrom sklearn.svm import SVC, LinearSVC\nfrom sklearn.ensemble import RandomForestClassifier , GradientBoostingClassifier\nfrom sklearn.discriminant_analysis import LinearDiscriminantAnalysis , QuadraticDiscriminantAnalysis\n\n# Regression\nfrom sklearn.linear_model import LinearRegression,Ridge,Lasso\nfrom sklearn.ensemble import RandomForestRegressor,BaggingRegressor,GradientBoostingRegressor,AdaBoostRegressor\nfrom sklearn.svm import SVR\nfrom sklearn.neighbors import KNeighborsRegressor\nfrom sklearn.neural_network import MLPRegressor\n\n# Modelling Helpers :\nfrom sklearn.preprocessing import  Normalizer , scale\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.feature_selection import RFECV\nfrom sklearn.model_selection import GridSearchCV , KFold , cross_val_score\n\n#preprocessing :\nfrom sklearn.preprocessing import MinMaxScaler , StandardScaler, LabelEncoder\n\n# Regression\nfrom sklearn.metrics import mean_squared_log_error,mean_squared_error, r2_score,mean_absolute_error\n\n# Classification\nfrom sklearn.metrics import 
accuracy_score,precision_score,recall_score,f1_score<\/pre>\n<\/div>\n<\/figure>\n<p id=\"2d6c\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Once you have loaded the libraries, load your dataset into Databricks. You can do this by uploading your CSV directly to Databricks. If you have blob storage, you can mount the storage container instead.<\/p>\n<h2 id=\"4f63\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Setting Up an Experiment<\/strong><\/h2>\n<p id=\"1b8b\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">Let\u2019s create an experiment. When you first create your experiment, make sure you add the API key from your Comet account. The Comet account name is your workspace name, so add it as well.<\/p>\n<p id=\"44f1\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Once you have added it, add your experiment name as the project_name argument. This will be the main repository for all the runs in your experiment. If that name does not exist in your Comet workspace yet, Comet will create it.<\/p>\n<p id=\"d613\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">If you want your code to show in a run, set the log_code parameter to True. If you are concerned about privacy or revealing your IP address, set <em class=\"pi\">log_env_host <\/em>to False. 
Azure Databricks will log different environment settings than a Jupyter Notebook would.<\/p>\n<p id=\"e78f\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">For the sake of clarity, I always add an experiment tag to note the experiment environment. I also add tags for experiment models that are in development, staging, and production.<\/p>\n<p id=\"e4f6\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">If you are working in a team, you may not be working solely in Databricks, so it helps to add multiple tags.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"ab ca qk ql qm qn\" role=\"separator\">\n<pre>experiment = Experiment(\n    api_key=\"YOUR_API_KEY\",  # replace with the API key from your Comet account\n    project_name=\"experiment_name\",\n    workspace=\"your_workspace\",\n    log_code=True,  # add this if you want to save your code to an experiment\n    log_env_host=False,  # add this if you want to hide your IP or system settings\n)\n\n# Add a tag to distinguish this from other experiments\nexperiment.add_tag(\"Azure Databricks\")\n\n# Log a dataframe profile (diamond_pd is the diamonds data as a pandas dataframe)\nexperiment.log_dataframe_profile(diamond_pd, \"Diamond Pandas Dataframe\")<\/pre>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<blockquote class=\"rq\"><p id=\"8152\" class=\"rr rs fr be rt ru rv rw rx ry rz oo dw\" data-selectable-paragraph=\"\">What tips do big name companies have for students and start ups? We asked them! 
<a class=\"af mx\" href=\"https:\/\/www.comet.com\/site\/industry-qa-where-most-machine-learning-projects-fail\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Read or watch our industry Q&amp;A<\/a> for advice from teams at Stanford, Google, and HuggingFace.<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<h2 id=\"6693\" class=\"my mz fr be na nb sa gr nd ne sb gu ng nh sc nj nk nl sd nn no np se nr ns nt bj\">EDA in Databricks<\/h2>\n<p id=\"b993\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">Exploratory data analysis using Comet in Azure Databricks works very similarly to using it in a Jupyter Notebook.<\/p>\n<p id=\"ed7c\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Logging figures matters. Data may change between runs of an experiment for various reasons, so it\u2019s important to save the charts from your EDA to the experiment.<\/p>\n<p id=\"5ee7\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">To be able to create the figures, your data will need to be in a pandas dataframe rather than a Spark dataframe.<\/p>\n<h2 id=\"725d\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Logging Figures<\/strong><\/h2>\n<p id=\"8712\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">There are two ways to log figures to a Comet experiment: with matplotlib or with Seaborn. 
For this project, we used both Seaborn and matplotlib.<\/p>\n<p id=\"b1cd\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Here is an example of logging a matplotlib figure:<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"rn iu l ee\">\n<div class=\"agy rp l\">\n<pre>plt.plot(diamond['carat'], diamond['price'], '.')\nplt.xlabel('carat')\nplt.ylabel('price')\nexperiment.log_figure(figure=plt)<\/pre>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"1316\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Matplotlib is the simplest way to log a graph. Simply call the <em class=\"pi\">experiment.log_figure<\/em> method after your completed figure. Do not call plt.show() before logging: you will get an error and the figure will not be saved.<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:362\/1*OqbRk6BYuFjmEKFeJEGlYA.png\" alt=\"\" width=\"362\" height=\"228\"><\/figure><div class=\"mh mi sf\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*OqbRk6BYuFjmEKFeJEGlYA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*OqbRk6BYuFjmEKFeJEGlYA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*OqbRk6BYuFjmEKFeJEGlYA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*OqbRk6BYuFjmEKFeJEGlYA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*OqbRk6BYuFjmEKFeJEGlYA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*OqbRk6BYuFjmEKFeJEGlYA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:724\/format:webp\/1*OqbRk6BYuFjmEKFeJEGlYA.png 724w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and 
(max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 362px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*OqbRk6BYuFjmEKFeJEGlYA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*OqbRk6BYuFjmEKFeJEGlYA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*OqbRk6BYuFjmEKFeJEGlYA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*OqbRk6BYuFjmEKFeJEGlYA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*OqbRk6BYuFjmEKFeJEGlYA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*OqbRk6BYuFjmEKFeJEGlYA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:724\/1*OqbRk6BYuFjmEKFeJEGlYA.png 724w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 362px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Matplotlib Figure, Author\u2019s Picture<\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"ff65\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Unlike matplotlib, Seaborn has some trouble saving 
charts in Comet. There\u2019s a quick workaround. First save the Seaborn plot as a variable, then pass its .fig or .figure attribute to Comet; some charts can only be saved using one of these.<\/p>\n<p id=\"61bd\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">I used the following function to save the charts:<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"rn iu l ee\">\n<div class=\"agz rp l\">\n<pre>def log_SeaFigure(fig, fig_name):\n    '''\n    Logs the seaborn figure, first by using the deprecated ax.fig, then by falling back to ax.figure if an exception is raised.\n\n    Parameters:\n    fig (object) - seaborn figure\n    fig_name (string) - the user defined name for seaborn figure in comet experiment\n\n    Returns:\n    Logs figure to comet experiment log, and prints the method used or an error message.\n\n    '''\n    ax = fig\n    try:\n        experiment.log_figure(fig_name, ax.fig)\n        print('Log Figure Successful using ax.fig')\n    except Exception:\n        # ax.fig is unavailable or failed, so retry with ax.figure\n        try:\n            experiment.log_figure(fig_name, ax.figure)\n            print('Log Figure Successful using ax.figure')\n        except Exception:\n            print(\"Figure Error: Please check chart parameters\")<\/pre>\n<\/div>\n<\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\">Log Seaborn Figure<\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"e2e2\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Alternatively, you can use exception handling directly, without the helper function:<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"rn iu l ee\">\n<div class=\"aha rp l\">\n<pre>ax = sns.violinplot(x=\"cut\",y=\"price\",data=df2)\nfig_name = \"violin_cut_price\"  # example name; use any label you like\n\ntry:\n    experiment.log_figure(fig_name, ax.fig)\n    print('Log Figure Successful using ax.fig')\nexcept Exception:\n    experiment.log_figure(fig_name, ax.figure)\n    print('Log 
Figure Successful using ax.figure')<\/pre>\n<\/div>\n<\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\">Exception Handling Using Seaborn<\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"024a\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">For this article we also logged histograms, a correlation matrix, a scatter plot, and a violin plot. Let\u2019s take a look at the figures that we logged in our test EDA.<\/p>\n<h2 id=\"08f3\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Histograms<\/strong><\/h2>\n<p id=\"bf11\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">In the dataset, most diamonds have small carat sizes and low prices. The length, width, and depth (x, y, z) fall within very small ranges.<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<div class=\"qf qg ee qh bg qi\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*D5ijnnWFzgAd1AtjLoN0Xg.png\" alt=\"\" width=\"700\" height=\"491\"><\/figure><div class=\"mh mi sg\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 828w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*D5ijnnWFzgAd1AtjLoN0Xg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<h2 id=\"cbd0\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" 
data-selectable-paragraph=\"\"><strong class=\"al\">Correlation Matrix<\/strong><\/h2>\n<p id=\"e5e3\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">Carat, width, depth, and height are highly correlated with each other. Price is also highly correlated with carat size. Given their high correlation with each other, they may be multi-collinear. We will be dropping these variables later down the line.<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:502\/1*wrVFCagdGOXLKdkyaPeVXQ.png\" alt=\"\" width=\"502\" height=\"376\"><\/figure><div class=\"mh mi sh\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*wrVFCagdGOXLKdkyaPeVXQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*wrVFCagdGOXLKdkyaPeVXQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*wrVFCagdGOXLKdkyaPeVXQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*wrVFCagdGOXLKdkyaPeVXQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*wrVFCagdGOXLKdkyaPeVXQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*wrVFCagdGOXLKdkyaPeVXQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1004\/format:webp\/1*wrVFCagdGOXLKdkyaPeVXQ.png 1004w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and 
(max-width: 700px) 100vw, 502px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*wrVFCagdGOXLKdkyaPeVXQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*wrVFCagdGOXLKdkyaPeVXQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*wrVFCagdGOXLKdkyaPeVXQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*wrVFCagdGOXLKdkyaPeVXQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*wrVFCagdGOXLKdkyaPeVXQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*wrVFCagdGOXLKdkyaPeVXQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1004\/1*wrVFCagdGOXLKdkyaPeVXQ.png 1004w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 502px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Author\u2019s Image<\/figcaption>\n<\/figure>\n<h2 id=\"6ad2\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Carat vs. Price<\/strong><\/h2>\n<p id=\"797b\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">As we can see from the model, most cuts range from below 1 carat to 2.5 carats, regardless of cut quality. 
This suggests that the carat size is being standardized.<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:626\/1*L47-r2lIX2oMTDaEQM04ZQ.png\" alt=\"\" width=\"626\" height=\"478\"><\/figure><div class=\"mh mi si\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*L47-r2lIX2oMTDaEQM04ZQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*L47-r2lIX2oMTDaEQM04ZQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*L47-r2lIX2oMTDaEQM04ZQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*L47-r2lIX2oMTDaEQM04ZQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*L47-r2lIX2oMTDaEQM04ZQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*L47-r2lIX2oMTDaEQM04ZQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1252\/format:webp\/1*L47-r2lIX2oMTDaEQM04ZQ.png 1252w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 626px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*L47-r2lIX2oMTDaEQM04ZQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*L47-r2lIX2oMTDaEQM04ZQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*L47-r2lIX2oMTDaEQM04ZQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*L47-r2lIX2oMTDaEQM04ZQ.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*L47-r2lIX2oMTDaEQM04ZQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*L47-r2lIX2oMTDaEQM04ZQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1252\/1*L47-r2lIX2oMTDaEQM04ZQ.png 1252w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 626px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Author\u2019s image<\/figcaption>\n<\/figure>\n<h2 id=\"c198\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Cut vs. Price<\/strong><\/h2>\n<p id=\"1b02\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">The median price for all cuts is below $5000 USD. Ideal cuts show the lowest median price. 
Premium, good, and fair cuts have a higher median price than very good and ideal cuts.<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:547\/1*K_0tDA3vp4srkD3c6gg33w.png\" alt=\"\" width=\"547\" height=\"369\"><\/figure><div class=\"mh mi sj\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*K_0tDA3vp4srkD3c6gg33w.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*K_0tDA3vp4srkD3c6gg33w.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*K_0tDA3vp4srkD3c6gg33w.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*K_0tDA3vp4srkD3c6gg33w.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*K_0tDA3vp4srkD3c6gg33w.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*K_0tDA3vp4srkD3c6gg33w.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1094\/format:webp\/1*K_0tDA3vp4srkD3c6gg33w.png 1094w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 547px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*K_0tDA3vp4srkD3c6gg33w.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*K_0tDA3vp4srkD3c6gg33w.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*K_0tDA3vp4srkD3c6gg33w.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*K_0tDA3vp4srkD3c6gg33w.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*K_0tDA3vp4srkD3c6gg33w.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*K_0tDA3vp4srkD3c6gg33w.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1094\/1*K_0tDA3vp4srkD3c6gg33w.png 1094w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 547px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Author\u2019s image<\/figcaption>\n<\/figure>\n<h2 id=\"6abe\" class=\"my mz fr be na nb nc gr nd ne nf gu ng nh ni nj nk nl nm nn no np nq nr ns nt bj\">Model Preparation<\/h2>\n<p id=\"9728\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">To prep for modeling we first have to do three steps: transforming the data, converting the dataframe to Spark, and vectorizing.<\/p>\n<h2 id=\"36a0\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Data Transformation<\/strong><\/h2>\n<p id=\"8ba9\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">The dataset still contains three categorical features: cut, color, and clarity. 
In addition, it contains three dimension variables: length (x), width (y), and depth (z).<\/p>\n<p id=\"8364\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">We will first encode the categorical variables, since string columns cannot be assembled into a feature vector. We will also drop the x, y, and z columns. An example of this is below:<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"rn iu l ee\">\n<div class=\"ahb rp l\">\n<pre>#Encode the categorical features using a dictionary\ndiamond['cut'] = diamond['cut'].replace({'Fair':0, 'Good':1, 'Very Good':2, 'Premium':3, 'Ideal':4})\ndiamond['color'] = diamond['color'].replace({'J':0, 'I':1, 'H':2, 'G':3, 'F':4, 'E':5, 'D':6})\ndiamond['clarity'] = diamond['clarity'].replace({'I1':0, 'SI1':1, 'SI2':2, 'VS1':3, 'VS2':4, 'VVS1':5, 'VVS2':6, 'IF':7})\n\n#Drop length, width, and depth columns\ndiamond.drop(['x','y','z'], axis=1, inplace=True)<\/pre>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"6275\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Now that that\u2019s done, let\u2019s convert the pandas dataframe to a Spark dataframe.<\/p>\n<h2 id=\"fe9b\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Converting Dataframe to Spark<\/strong><\/h2>\n<p id=\"7ea4\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\"><strong class=\"be pm\">Vectorizing<\/strong><\/p>\n<p id=\"7b56\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">The final step is vectorizing. We do this using VectorAssembler. 
It is a transformer that combines a given list of columns into a single vector column, which is added to the data frame.<\/p>\n<p id=\"d9f8\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Vectorization is useful for combining raw features and features generated by different feature transformers into a single feature vector in order to train ML models like logistic regression and decision trees.<\/p>\n<p id=\"18ed\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">The code example is below:<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"rn iu l ee\">\n<div class=\"ahc rp l\">\n<pre>from pyspark.ml.feature import VectorAssembler\n\n#Transform all features into an output column named \"features\"\nvectorAssembler = VectorAssembler(inputCols = ['carat', 'cut', 'color', 'clarity', 'depth', 'table'], outputCol = 'features')\nvdiamond_df = vectorAssembler.transform(spark_diamond_drop) #returns spark_diamond_drop with the features column added\n\n#Creates a df from the features column and the target variable\nvdiamond_df = vdiamond_df.select(['features', 'price'])<\/pre>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"aa32\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Once we have vectorized the features \u2018carat,\u2019 \u2018cut,\u2019 \u2018color,\u2019 \u2018clarity,\u2019 \u2018depth,\u2019 and \u2018table,\u2019 we select only the price (our target variable) and the features column created by VectorAssembler into a new dataframe.<\/p>\n<p id=\"f750\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">The dataset should look like this after selecting only the features and price columns:<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi 
paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:413\/1*TmQi1gv9hTgpxZ_2V3Uegw.png\" alt=\"\" width=\"413\" height=\"157\"><\/figure><div class=\"mh mi sk\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:826\/format:webp\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 826w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 413px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 1100w, 
https:\/\/miro.medium.com\/v2\/resize:fit:826\/1*TmQi1gv9hTgpxZ_2V3Uegw.png 826w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 413px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Vectorized Dataset \u2014 Author\u2019s Image<\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"69e2\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Now that we have created the data frame, we will split it randomly into testing and training datasets. 
In PySpark, this is done using the randomSplit method:<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"rn iu l ee\">\n<div class=\"ahd rp l\">\n<pre>#70% of the rows go to training, 30% to testing\nsplits = vdiamond_df.randomSplit([0.7, 0.3])\ntrain_df = splits[0]\ntest_df = splits[1]<\/pre>\n<\/div>\n<\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\">Splitting Spark dataframe<\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"0e86\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Now that we have split the data into train_df and test_df, it\u2019s time to build the models.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<h2 id=\"01d6\" class=\"my mz fr be na nb sa gr nd ne sb gu ng nh sc nj nk nl sd nn no np se nr ns nt bj\">Building the Algorithms<\/h2>\n<p id=\"e20e\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">To keep it simple, I will be focusing on regression. The three models that we will be creating are ordinary least squares, decision tree regression, and gradient boosted regression. Since Databricks is a Spark environment, we will be using PySpark to create these algorithms.<\/p>\n<p id=\"3d14\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">PySpark has its own built-in module for regression models, pyspark.ml.regression. 
From this library we imported the LinearRegression, DecisionTreeRegressor, and GBTRegressor classes.<\/p>\n<p id=\"0803\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">For each regression we created the model, specified the features and label columns, then saved the metrics to variables, which we logged to our Comet experiment.<\/p>\n<p id=\"9736\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Metrics for decision tree regression and gradient boosted regression are obtained differently than for linear regression, so we created a function called <em class=\"pi\">get_SparkMetric<\/em> to obtain the metrics for each:<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"rn iu l ee\">\n<div class=\"ahe rp l\">\n<pre>def get_SparkMetric(labelCol, predCol, metricName, dfPrediction):\n    '''\n    Returns a user-specified evaluation metric for a fitted Spark regression model\n\n    Parameters:\n    labelCol (str) - target column of a Spark Regression\n    predCol (str) - column holding the predicted values of the regression\n    metricName (str) - metric to compute: 'rmse', 'mae', 'r2', or 'mse'\n    dfPrediction (obj) - transformed dataframe from test data\n\n    Returns:\n    Metric value for regression\n\n    '''\n    evaluator = RegressionEvaluator(labelCol=labelCol, predictionCol=predCol, metricName=metricName)\n    metric = evaluator.evaluate(dfPrediction)\n    return metric<\/pre>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"0583\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">The metrics we will be logging are:<\/p>\n<ul class=\"\">\n<li id=\"6ad9\" class=\"nu nv fr be b gp op nx ny gs oq oa ob oc ou oe of og ov oi oj ok ow om on oo ox oy oz bj\" data-selectable-paragraph=\"\">R2 (Fit)<\/li>\n<li 
id=\"00e2\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">Mean Squared Error (MSE)<\/li>\n<li id=\"368e\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">Root Mean Squared Error (RMSE)<\/li>\n<li id=\"58eb\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">Mean Absolute Error (MAE)<\/li>\n<\/ul>\n<p id=\"327b\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">For each of the metrics, we will use the <em class=\"pi\">experiment.log_metric<\/em> method. This method logs the value of a metric to a given experiment run. It also lets you set the name under which the metric is stored in the experiment.<\/p>\n<p id=\"72b8\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Now, let\u2019s look at the regression types we will be using.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<blockquote class=\"rq\"><p id=\"d9dc\" class=\"rr rs fr be rt ru rv rw rx ry rz oo dw\" data-selectable-paragraph=\"\">Innovation and academia go hand-in-hand. 
<a class=\"af mx\" href=\"https:\/\/www.youtube.com\/watch?v=7XCsi64HLQ8.\" target=\"_blank\" rel=\"noopener ugc nofollow\">Listen to our own CEO Gideon Mendels chat with the Stanford MLSys Seminar Series team<\/a> about the future of MLOps and give the <a class=\"af mx\" href=\"https:\/\/www.comet.com\/site\/academics\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet platform a try for free<\/a>!<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<h2 id=\"0f68\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Linear Regression<\/strong><\/h2>\n<p id=\"6dba\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">To get metrics for linear regression, we need to do the following:<\/p>\n<ul class=\"\">\n<li id=\"9fa4\" class=\"nu nv fr be b gp op nx ny gs oq oa ob oc ou oe of og ov oi oj ok ow om on oo ox oy oz bj\" data-selectable-paragraph=\"\">Create the linear regression model and specify the feature column and our label columns.<\/li>\n<li id=\"0570\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">Fit the model to the training data frame.<\/li>\n<li id=\"12a7\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">Run and summarize the model metrics using <em class=\"pi\">.summary.<\/em><\/li>\n<li id=\"4fba\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">After loading into a variable, we call the metrics.<\/li>\n<\/ul>\n<p id=\"a785\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo 
fk bj\" data-selectable-paragraph=\"\">An example is below:<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"rn iu l ee\">\n<div class=\"ahf rp l\">\n<pre>#Linear Regression Model\nfrom pyspark.ml.regression import LinearRegression\nlr = LinearRegression(featuresCol = 'features', labelCol='price', maxIter=10, regParam=0.3, elasticNetParam=0.8)\nlr_model = lr.fit(train_df)\n\n#Save metrics from the training summary to variables\nlr_trainingSummary = lr_model.summary\nlr_r2 = lr_trainingSummary.r2\nlr_mse = lr_trainingSummary.meanSquaredError\nlr_rmse = lr_trainingSummary.rootMeanSquaredError\nlr_mae = lr_trainingSummary.meanAbsoluteError\n\n#Log Metric to Comet Experiment\nexperiment.log_metric(\"LR_r2\", lr_r2, step=0)\nexperiment.log_metric(\"LR_MSE\", lr_mse, step=0)\nexperiment.log_metric(\"LR_RMSE\", lr_rmse, step=0)\nexperiment.log_metric(\"LR_MAE\", lr_mae, step=0)\n\n#Display Metrics\nprint(\"r2: %f\" % lr_r2)\nprint(\"MSE = %s\" % lr_mse)\nprint(\"RMSE: %f\" % lr_rmse)\nprint(\"MAE = %s\" % lr_mae)<\/pre>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"923e\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">The model logs the following metrics:<\/p>\n<ul class=\"\">\n<li id=\"16ad\" class=\"nu nv fr be b gp op nx ny gs oq oa ob oc ou oe of og ov oi oj ok ow om on oo ox oy oz bj\" data-selectable-paragraph=\"\">r2: 0.892751<\/li>\n<li id=\"8f40\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">MSE = 1707708.8<\/li>\n<li id=\"d66e\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">RMSE: 1306.793<\/li>\n<li id=\"abff\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">MAE: not shown, because the original print statement reported the RMSE (1306.7933613787445) under this label<\/li>\n<\/ul>\n<h2 id=\"8984\" class=\"pn mz fr be na po 
pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Decision Tree<\/strong><\/h2>\n<p id=\"2549\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">With decision tree regression, we need to import both DecisionTreeRegressor and RegressionEvaluator. We will use the latter to get the metrics, since tree-based models do not expose the .summary attribute we used for linear regression.<\/p>\n<p id=\"6162\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">To simplify the code we used the get_SparkMetric function defined earlier in the article. It computes each metric from the model\u2019s predictions on the test data, and the result is then logged to the experiment run.<\/p>\n<p id=\"48fe\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">An example of this is shown below:<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"rn iu l ee\">\n<div class=\"ahg rp l\">\n<pre>from pyspark.ml.regression import DecisionTreeRegressor\nfrom pyspark.ml.evaluation import RegressionEvaluator #This is needed to run the get_SparkMetric function.\n\n#create and fit model\ndt = DecisionTreeRegressor(featuresCol ='features', labelCol = 'price')\ndt_model = dt.fit(train_df)\ndt_predictions = dt_model.transform(test_df)\n\n#create metrics\ndt_r2 = get_SparkMetric(\"price\", \"prediction\", \"r2\", dt_predictions)\ndt_rmse = get_SparkMetric(\"price\", \"prediction\", \"rmse\", dt_predictions)\ndt_mae = get_SparkMetric(\"price\", \"prediction\", \"mae\", dt_predictions)\ndt_mse = get_SparkMetric(\"price\", \"prediction\", \"mse\", dt_predictions)\n\n#log metrics\nexperiment.log_metric(\"dt_r2\", dt_r2, step=0)\nexperiment.log_metric(\"dt_rmse\", dt_rmse, 
step=0)\nexperiment.log_metric(\"dt_mae\", dt_mae, step=0)\nexperiment.log_metric(\"dt_mse\", dt_mse, step=0)<\/pre>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"d293\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">The model returns the following metrics:<\/p>\n<ul class=\"\">\n<li id=\"acb6\" class=\"nu nv fr be b gp op nx ny gs oq oa ob oc ou oe of og ov oi oj ok ow om on oo ox oy oz bj\" data-selectable-paragraph=\"\">r2: 0.892751<\/li>\n<li id=\"8b6e\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">MSE = 1707708.8<\/li>\n<li id=\"38c5\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">RMSE: 1306.793<\/li>\n<li id=\"6df1\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">MAE: not shown; the value originally listed under this label (1306.7933613787445) is the RMSE<\/li>\n<\/ul>\n<h2 id=\"1c52\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Gradient Boosted Regressor<\/strong><\/h2>\n<p id=\"6e4c\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">The gradient boosted regressor\u2019s metrics are recorded in the same way as the decision tree model\u2019s, using get_SparkMetric.<\/p>\n<figure class=\"mk ml mm mn mo mp\">\n<div class=\"rn iu l ee\">\n<div class=\"ahh rp l\">\n<pre>from pyspark.ml.regression import GBTRegressor\nfrom pyspark.ml.evaluation import RegressionEvaluator #This is needed to run the get_SparkMetric function.\n\n#create and fit model\ngbt = GBTRegressor(featuresCol = 'features', labelCol = 'price', maxIter=10)\ngbt_model = gbt.fit(train_df)\ngbt_predictions = 
gbt_model.transform(test_df)\n\n#create metrics\ngbt_r2 = get_SparkMetric(\"price\", \"prediction\", \"r2\", gbt_predictions)\ngbt_rmse = get_SparkMetric(\"price\", \"prediction\", \"rmse\", gbt_predictions)\ngbt_mae = get_SparkMetric(\"price\", \"prediction\", \"mae\", gbt_predictions)\ngbt_mse = get_SparkMetric(\"price\", \"prediction\", \"mse\", gbt_predictions)\n\n#log metrics\nexperiment.log_metric(\"gbt_r2\", gbt_r2, step=0)\nexperiment.log_metric(\"gbt_rmse\", gbt_rmse, step=0)\nexperiment.log_metric(\"gbt_mae\", gbt_mae, step=0)\nexperiment.log_metric(\"gbt_mse\", gbt_mse, step=0)\n\n#display metrics\nprint(\"r2: %f\" % gbt_r2)\nprint(\"MSE = %s\" % gbt_mse)\nprint(\"RMSE: %f\" % gbt_rmse)\nprint(\"MAE = %s\" % gbt_mae)<\/pre>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"4164\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">The model returns the following metrics:<\/p>\n<ul class=\"\">\n<li id=\"20b1\" class=\"nu nv fr be b gp op nx ny gs oq oa ob oc ou oe of og ov oi oj ok ow om on oo ox oy oz bj\" data-selectable-paragraph=\"\">r2: 0.892751<\/li>\n<li id=\"6848\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">MSE = 1707708.8<\/li>\n<li id=\"07b4\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">RMSE: 1306.793<\/li>\n<li id=\"03e7\" class=\"nu nv fr be b gp pa nx ny gs pb oa ob oc pc oe of og pd oi oj ok pe om on oo ox oy oz bj\" data-selectable-paragraph=\"\">MAE: not shown, because the original print statement reported the RMSE (1306.7933613787445) under this label<\/li>\n<\/ul>\n<p id=\"7ac9\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Once you have logged the metrics to the run, end the experiment by creating a cell with <em class=\"pi\">experiment.end() . 
<\/em>The end result should look like the picture below:<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<div class=\"qf qg ee qh bg qi\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*4uw5SMZZcNup7oh_z7Z0wg.png\" alt=\"\" width=\"700\" height=\"258\"><\/figure><div class=\"mh mi sl\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*4uw5SMZZcNup7oh_z7Z0wg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*4uw5SMZZcNup7oh_z7Z0wg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*4uw5SMZZcNup7oh_z7Z0wg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*4uw5SMZZcNup7oh_z7Z0wg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*4uw5SMZZcNup7oh_z7Z0wg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*4uw5SMZZcNup7oh_z7Z0wg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*4uw5SMZZcNup7oh_z7Z0wg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*4uw5SMZZcNup7oh_z7Z0wg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*4uw5SMZZcNup7oh_z7Z0wg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*4uw5SMZZcNup7oh_z7Z0wg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*4uw5SMZZcNup7oh_z7Z0wg.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*4uw5SMZZcNup7oh_z7Z0wg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*4uw5SMZZcNup7oh_z7Z0wg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*4uw5SMZZcNup7oh_z7Z0wg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Author\u2019s image<\/figcaption>\n<\/figure>\n<p id=\"9932\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">After that, let\u2019s check out the results in Comet. The URL displayed at the end of the run is the location of your experiment in your Comet workspace.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<h2 id=\"60c7\" class=\"my mz fr be na nb sa gr nd ne sb gu ng nh sc nj nk nl sd nn no np se nr ns nt bj\">Comparing the Results in Comet<\/h2>\n<p id=\"da38\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">Let\u2019s check how the experiment logged the data and how the models\u2019 metrics compare to each other. 
The data saved in each experiment can be used to check for subtle changes in the model, metrics, and even the dataset.<\/p>\n<p id=\"94e4\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Let\u2019s now look inside the Panels tab (i.e., Comet\u2019s concept for data visualization).<\/p>\n<h2 id=\"25bd\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Panels<\/strong><\/h2>\n<p id=\"7209\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">We separated the metrics into four distinct graphs comparing the r2, mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE). The gradient boosted regressor outperformed both the simple linear regression and the regression tree.<\/p>\n<\/div>\n<\/div>\n<div class=\"mp\">\n<div class=\"ab ca\">\n<div class=\"sm sn so sp sq sr ce ss cf st ch bg\">\n<figure class=\"mk ml mm mn mo mp sv sw paragraph-image\">\n<div class=\"qf qg ee qh bg qi\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1000\/1*ht_5IhcT_YkA9jUEeKzhdA.png\" alt=\"\" width=\"1000\" height=\"451\"><\/figure><div class=\"mh mi su\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*ht_5IhcT_YkA9jUEeKzhdA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*ht_5IhcT_YkA9jUEeKzhdA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*ht_5IhcT_YkA9jUEeKzhdA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*ht_5IhcT_YkA9jUEeKzhdA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*ht_5IhcT_YkA9jUEeKzhdA.png 828w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*ht_5IhcT_YkA9jUEeKzhdA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:2000\/format:webp\/1*ht_5IhcT_YkA9jUEeKzhdA.png 2000w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 1000px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*ht_5IhcT_YkA9jUEeKzhdA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*ht_5IhcT_YkA9jUEeKzhdA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*ht_5IhcT_YkA9jUEeKzhdA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*ht_5IhcT_YkA9jUEeKzhdA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*ht_5IhcT_YkA9jUEeKzhdA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*ht_5IhcT_YkA9jUEeKzhdA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:2000\/1*ht_5IhcT_YkA9jUEeKzhdA.png 2000w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 1000px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Experiment Run Charts Panel \u2014 author\u2019s 
image<\/figcaption>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<h2 id=\"00c4\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Metrics<\/strong><\/h2>\n<p id=\"9fda\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">You can also view the logged metrics in the metrics tab. The metrics tab also includes features that allow you to search by metric name and change the decimal precision.<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:550\/1*NQ_68tboD7arhfmWy7jGhw.png\" alt=\"\" width=\"550\" height=\"508\"><\/figure><div class=\"mh mi sx\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*NQ_68tboD7arhfmWy7jGhw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*NQ_68tboD7arhfmWy7jGhw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*NQ_68tboD7arhfmWy7jGhw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*NQ_68tboD7arhfmWy7jGhw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*NQ_68tboD7arhfmWy7jGhw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*NQ_68tboD7arhfmWy7jGhw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*NQ_68tboD7arhfmWy7jGhw.png 1100w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 
2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 550px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*NQ_68tboD7arhfmWy7jGhw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*NQ_68tboD7arhfmWy7jGhw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*NQ_68tboD7arhfmWy7jGhw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*NQ_68tboD7arhfmWy7jGhw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*NQ_68tboD7arhfmWy7jGhw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*NQ_68tboD7arhfmWy7jGhw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*NQ_68tboD7arhfmWy7jGhw.png 1100w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 550px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"a6e5\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Both the charts and the metric data show that the gradient boosting algorithm performed best, ahead of the simple linear regression and the regression tree.<\/p>\n<h2 id=\"fae7\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Charts<\/strong><\/h2>\n<p id=\"5e54\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">If we want to look at the charts we 
created from the EDA we need to click the graphics tab of the experiment. The Seaborn and matplotlib figures will be saved in the experiment. We can search for these figures by the name we assigned when we logged the figures.<\/p>\n<\/div>\n<\/div>\n<div class=\"mp\">\n<div class=\"ab ca\">\n<div class=\"sm sn so sp sq sr ce ss cf st ch bg\">\n<figure class=\"mk ml mm mn mo mp sv sw paragraph-image\">\n<div class=\"qf qg ee qh bg qi\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1000\/1*5RhDE_KFO8SBgCF7wdwVRA.png\" alt=\"\" width=\"1000\" height=\"337\"><\/figure><div class=\"mh mi sy\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*5RhDE_KFO8SBgCF7wdwVRA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*5RhDE_KFO8SBgCF7wdwVRA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*5RhDE_KFO8SBgCF7wdwVRA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*5RhDE_KFO8SBgCF7wdwVRA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*5RhDE_KFO8SBgCF7wdwVRA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*5RhDE_KFO8SBgCF7wdwVRA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:2000\/format:webp\/1*5RhDE_KFO8SBgCF7wdwVRA.png 2000w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 1000px\"><source 
srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*5RhDE_KFO8SBgCF7wdwVRA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*5RhDE_KFO8SBgCF7wdwVRA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*5RhDE_KFO8SBgCF7wdwVRA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*5RhDE_KFO8SBgCF7wdwVRA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*5RhDE_KFO8SBgCF7wdwVRA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*5RhDE_KFO8SBgCF7wdwVRA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:2000\/1*5RhDE_KFO8SBgCF7wdwVRA.png 2000w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 1000px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Comet\u2019s Figure Log, Author\u2019s image<\/figcaption>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"203e\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Saving these charts is really valuable, especially if you need to create reports or presentations for stakeholders. 
It\u2019s also quite useful if a teammate is attempting to replicate the results of your EDA on their own computer.<\/p>\n<p id=\"2aea\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Now, let\u2019s check out my favorite Comet feature: Code Logging.<\/p>\n<h2 id=\"5271\" class=\"pn mz fr be na po pp pq nd pr ps pt ng oc pu pv pw og px py pz ok qa qb qc qd bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Code Logging<\/strong><\/h2>\n<p id=\"b3ba\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">The Azure Databricks environment doesn\u2019t integrate with GitHub like a Jupyter Notebook would (sad, I know). However, Comet does log the code from your Databricks notebook if you set that up when you first create the experiment.<\/p>\n<p id=\"00da\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">The code logged with an experiment can be accessed from the Code tab in the Comet UI.<\/p>\n<figure class=\"mk ml mm mn mo mp mh mi paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mq mr c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:599\/1*YmKDEoBY0RWvhVN3G3C_fQ.png\" alt=\"\" width=\"599\" height=\"655\"><\/figure><div class=\"mh mi sz\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 828w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1198\/format:webp\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 1198w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 599px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1198\/1*YmKDEoBY0RWvhVN3G3C_fQ.png 1198w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 599px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"ms mt mu mh mi mv mw be b bf z dw\" data-selectable-paragraph=\"\">Code Saved in Experiment.<\/figcaption>\n<\/figure>\n<p 
data-selectable-paragraph=\"\">\n<\/p><p id=\"5ccd\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">All the code between the Experiment() call that creates the experiment and the experiment.end() call will be logged. Any code outside that span will not be logged. The tab also allows you to export your code via the download button in the upper right corner.<\/p>\n<h2 id=\"c32e\" class=\"my mz fr be na nb nc gr nd ne nf gu ng nh ni nj nk nl nm nn no np nq nr ns nt bj\">Conclusion<\/h2>\n<p id=\"1970\" class=\"pw-post-body-paragraph nu nv fr be b gp nw nx ny gs nz oa ob oc od oe of og oh oi oj ok ol om on oo fk bj\" data-selectable-paragraph=\"\">Well, that\u2019s it! You\u2019ve successfully logged your first experiment in Comet using Azure Databricks. Databricks is a different kind of environment, and combined with Comet it can really speed up your experiment tracking.<\/p>\n<p id=\"bf39\" class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Links to my GitHub repository and Comet experiment log are below:<\/p>\n<ul>\n<li class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\"><a href=\"https:\/\/github.com\/mattblasa\/azure-comet?source=post_page-----4ec97703a2fe--------------------------------\">GitHub Repository<\/a><\/li>\n<li class=\"pw-post-body-paragraph nu nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\"><a href=\"https:\/\/www.comet.com\/mattblasa\/azure-and-comet\/view\/7lU7JYsW7B7BDHK3Tpgo3T19M\/panels?source=post_page-----4ec97703a2fe--------------------------------\"><span style=\"font-family: var(--wpex-body-font-family, var(--wpex-font-sans)); font-size: var(--wpex-body-font-size, 13px);\">Comet Experiment Log<\/span><\/a><\/li>\n<\/ul>\n<p id=\"f532\" class=\"pw-post-body-paragraph nu 
nv fr be b gp op nx ny gs oq oa ob oc or oe of og os oi oj ok ot om on oo fk bj\" data-selectable-paragraph=\"\">Thank you for reading! Connect with me on <a class=\"af mx\" href=\"https:\/\/www.linkedin.com\/in\/mblasa\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">LinkedIn<\/a> for more on data science topics.<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Diamond, Deposit Photos Introduction Recently I made the switch to actively tracking my machine learning experiments. First with ML Flow and then with Comet. Tracking experiments is becoming more common in MLOps. As time goes on, data scientists and machine learning engineers have realized the importance of model versioning. This has gone beyond saving metrics [&hellip;]<\/p>\n","protected":false},"author":102,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[23,9],"tags":[],"coauthors":[200],"class_list":["post-8033","post","type-post","status-publish","format-standard","hentry","category-integrations","category-product"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Integrating Comet and Azure Databricks<\/title>\n<meta name=\"description\" content=\"Learn how to use Comet\u2019s experiment tracking within an Azure Databricks environment in this article. 
Read more.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Integrating Comet and Azure Databricks\" \/>\n<meta property=\"og:description\" content=\"Learn how to use Comet\u2019s experiment tracking within an Azure Databricks environment in this article. Read more.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-10-25T22:54:09+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:05:11+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*0gBid337Sj4L84_eEhHncg.jpeg\" \/>\n<meta name=\"author\" content=\"Matt Blasa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Matt Blasa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"16 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Integrating Comet and Azure Databricks","description":"Learn how to use Comet\u2019s experiment tracking within an Azure Databricks environment in this article. 
Read more.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks","og_locale":"en_US","og_type":"article","og_title":"Integrating Comet and Azure Databricks","og_description":"Learn how to use Comet\u2019s experiment tracking within an Azure Databricks environment in this article. Read more.","og_url":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-10-25T22:54:09+00:00","article_modified_time":"2025-04-24T17:05:11+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*0gBid337Sj4L84_eEhHncg.jpeg","type":"","width":"","height":""}],"author":"Matt Blasa","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Matt Blasa","Est. 
reading time":"16 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks\/"},"author":{"name":"Matt Blasa","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/b689e5bf064facf03364f36108b53a7f"},"headline":"Integrating Comet and Azure Databricks","datePublished":"2023-10-25T22:54:09+00:00","dateModified":"2025-04-24T17:05:11+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks\/"},"wordCount":2384,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*0gBid337Sj4L84_eEhHncg.jpeg","articleSection":["Integrations","Product"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks\/","url":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks","name":"Integrating Comet and Azure Databricks","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*0gBid337Sj4L84_eEhHncg.jpeg","datePublished":"2023-10-25T22:54:09+00:00","dateModified":"2025-04-24T17:05:11+00:00","description":"Learn how to use Comet\u2019s experiment tracking within an Azure Databricks environment in this article. 
Read more.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*0gBid337Sj4L84_eEhHncg.jpeg","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*0gBid337Sj4L84_eEhHncg.jpeg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/integrating-comet-and-azure-databricks#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Integrating Comet and Azure Databricks"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, 
Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/b689e5bf064facf03364f36108b53a7f","name":"Matt Blasa","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/ef901c28c6d95d1994d584481e592d11","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/10\/1691951148078-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/10\/1691951148078-96x96.jpg","caption":"Matt Blasa"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/blasa-matthewyahoo-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8033","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/102"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=8033"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8033\/revisions"}],"predecessor-version":[{"id":15484,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8033\/revisions\/15484"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=8033"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=8033"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=8033"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=8033"}],"curies":[{"
name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}