{"id":7439,"date":"2023-09-12T08:37:38","date_gmt":"2023-09-12T16:37:38","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=7439"},"modified":"2025-04-24T17:14:11","modified_gmt":"2025-04-24T17:14:11","slug":"building-a-text-classifier-app-with-hugging-face-bert-and-comet","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/","title":{"rendered":"Building a Text Classifier App with Hugging Face, BERT, and Comet"},"content":{"rendered":"\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*7BVBBKpZY3fmOnLgEfpoTQ.jpeg\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"af gt\" href=\"https:\/\/www.freepik.com\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Image by Freepik<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"27c6\">LLMs such as GPT, BERT, and Llama 2 are a game changer in AI. You can build AI tools like ChatGPT and Bard using these models. But you need to fine-tune these language models when performing your deep learning projects. This is where AI platforms come in.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"6bc1\">Today, I\u2019ll show you how to build an end-to-end text classification project. 
Here are the topics we\u2019ll cover in this article:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Fine-tuning the BERT model with the Transformers library for text classification.<\/li>\n\n\n\n<li>Building a web app with Gradio.<\/li>\n\n\n\n<li>Monitoring this app with Comet.<\/li>\n<\/ul>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"db3f\">Once built, the app will look like the one below:<\/p>\n\n\n\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*a9m9XX7Z5Buoe_qZLp-EqA.gif\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"af gt\" href=\"https:\/\/huggingface.co\/spaces\/Tirendaz\/Text-Classification\" target=\"_blank\" rel=\"noopener ugc nofollow\">Gradio App by Author<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"e5b0\">You can use <a class=\"af gt\" href=\"https:\/\/www.kaggle.com\/code\/tirendazacademy\/text-classification-with-transformers-and-comet\" target=\"_blank\" rel=\"noopener ugc nofollow\">this Kaggle notebook<\/a> to follow along with the code, and see <a class=\"af gt\" href=\"https:\/\/github.com\/TirendazAcademy\/Bert-Text-Classification-Gradio-App\" target=\"_blank\" rel=\"noopener ugc nofollow\">this repo<\/a> to review the project files.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"947d\">Let\u2019s start by installing the necessary libraries.<\/p>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"bb58\">Step 1. 
Installing Required Libraries<\/h1>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"2452\">The first thing we\u2019re going to do is install the necessary libraries. This is very easy to do with the pip package manager, as shown below:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"5cc5\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\">!pip install -q comet_ml transformers datasets gradio<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"318c\">After that, let\u2019s go ahead and initialize the platforms we will use.<\/p>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"3fe3\">Step 2. Initialize Comet and Hugging Face<\/h1>\n\n\n\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*CZz5MdZrXu5sSXs-uAwFpw.jpeg\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"af gt\" href=\"https:\/\/www.comet.com\/docs\/v2\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet is a platform to track, monitor, and optimize your models throughout the entire ML lifecycle.<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"2d18\">To track our hyperparameters and monitor our app, we\u2019ll use Comet. 
To do this, we first need to initialize it, as shown below:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"c8c1\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">import<\/span> comet_ml\n\n<span class=\"hljs-comment\"># Initializing the project<\/span>\ncomet_ml.login(project_name=<span class=\"hljs-string\">\"text-classification-with-transformers\"<\/span>)<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"7778\">When you run this snippet, you\u2019ll be asked to enter your Comet API key. Go to <a class=\"af gt\" href=\"\/signup\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet<\/a> and create a free account to get your API key.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"6224\">Plus, after training our model, we\u2019ll push it to the Hugging Face Hub and host the demo app on Hugging Face Spaces, which allows you to host your ML demo apps on your profile.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"68a7\">Logging in to Hugging Face with the <code class=\"ec abi abj abk zz b\">notebook_login<\/code> function is very easy. To do this, use your Hugging Face API key. 
If you don\u2019t have an API key yet, you can get one for free <a class=\"af gt\" href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">here<\/a>.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"d197\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span> huggingface_hub <span class=\"hljs-keyword\">import<\/span> notebook_login\n\n<span class=\"hljs-comment\"># Logging in to Hugging Face<\/span>\nnotebook_login()<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"f2e3\">Nice, we\u2019ve initialized the platforms we\u2019ll use. Let\u2019s move on to loading the dataset.<\/p>\n\n\n\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*CxK7-2h4stanujYhUWvYkQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">A classic pipeline for training transformer models (Image by Author)<\/figcaption><\/figure>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"8dee\">Step 3. Load Data<\/h1>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"7ec6\">The dataset we will use is a movie review dataset called <a class=\"af gt\" href=\"https:\/\/huggingface.co\/datasets\/rotten_tomatoes\" target=\"_blank\" rel=\"noopener ugc nofollow\">rotten tomatoes<\/a>. Fortunately, this dataset is available through the Datasets library. All we need to do is load this dataset with the <code class=\"ec abi abj abk zz b\">load_dataset<\/code> function. 
Let\u2019s do this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"dd9d\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span> datasets <span class=\"hljs-keyword\">import<\/span> load_dataset\n\n<span class=\"hljs-comment\"># Loading the dataset<\/span>\nraw_datasets = load_dataset(<span class=\"hljs-string\">\"rotten_tomatoes\"<\/span>)<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"cd8f\">Great, our data is loaded. Let\u2019s take a look at this data:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"bace\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\">raw_datasets\n\n<span class=\"hljs-comment\"># Output:<\/span>\n<span class=\"hljs-string\">\"\"\"\nDatasetDict({\n    train: Dataset({\n        features: ['text', 'label'],\n        num_rows: 8530\n    })\n    validation: Dataset({\n        features: ['text', 'label'],\n        num_rows: 1066\n    })\n    test: Dataset({\n        features: ['text', 'label'],\n        num_rows: 1066\n    })\n})\n\"\"\"<\/span><\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"c17a\">As you can see, the data is structured much like a Python dictionary, where each key corresponds to a different split. 
We can use the usual dictionary syntax to select a single split and then index into it:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"05e1\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\"># Looking at the first sample of the training set<\/span>\nraw_datasets[<span class=\"hljs-string\">\"train\"<\/span>][<span class=\"hljs-number\">0<\/span>]\n\n<span class=\"hljs-comment\"># Output:<\/span>\n<span class=\"hljs-string\">\"\"\"\n{'text': 'the rock is destined to be the 21st century\\'s new \" conan \" and that he\\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .',\n 'label': 1}\n\"\"\"<\/span>\n<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"0eb4\">Awesome, we\u2019ve seen the first sample of the training dataset. Now, to gain more insight into the data, let\u2019s convert it to a Pandas DataFrame.<\/p>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"5751\">Step 4. Understand Data<\/h1>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"2c69\">Understanding data is one of the most important stages of the data analysis lifecycle. 
To do this, there is no doubt that Pandas is king.<\/p>\n\n\n\n<h2 class=\"wp-block-heading abl za so be zb lv abm lw lz ma abn mb me mf abo mg mj mk abp ml mo mp abq mq mt abr bj\" id=\"bf27\">From Datasets to Pandas DataFrames<\/h2>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"c2c5\">First, let me convert the data into a Pandas DataFrame with the <code class=\"ec abi abj abk zz b\">set_format<\/code> method as follows:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"1997\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd\n\n<span class=\"hljs-comment\"># Converting the dataset into a Pandas DataFrame<\/span>\nraw_datasets.set_format(<span class=\"hljs-built_in\">type<\/span>=<span class=\"hljs-string\">\"pandas\"<\/span>)\ndf = raw_datasets[<span class=\"hljs-string\">\"train\"<\/span>][:]\ndf.head()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:542\/1*TcE6VqxzqP1gZsu93W1T2g.png\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"1e93\">As you can see, the data contains only two columns: text and label. Let\u2019s move on to exploring the class distribution.<\/p>\n\n\n\n<h2 class=\"wp-block-heading abl za so be zb lv abm lw lz ma abn mb me mf abo mg mj mk abp ml mo mp abq mq mt abr bj\" id=\"3855\">Looking at the Label Distribution<\/h2>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"7e41\">The simplest way to understand data is to visualize it. 
Let\u2019s draw a bar chart with Matplotlib to look at the label distribution.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"5d95\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">import<\/span> matplotlib.pyplot <span class=\"hljs-keyword\">as<\/span> plt\n\n<span class=\"hljs-comment\"># Adding a readable label column (0 = negative, 1 = positive)<\/span>\ndf[<span class=\"hljs-string\">\"label_name\"<\/span>] = df[<span class=\"hljs-string\">\"label\"<\/span>].<span class=\"hljs-built_in\">map<\/span>({<span class=\"hljs-number\">0<\/span>: <span class=\"hljs-string\">\"negative\"<\/span>, <span class=\"hljs-number\">1<\/span>: <span class=\"hljs-string\">\"positive\"<\/span>})\n\n<span class=\"hljs-comment\"># Visualizing the frequency of classes<\/span>\ndf[<span class=\"hljs-string\">\"label_name\"<\/span>].value_counts(ascending=<span class=\"hljs-literal\">True<\/span>).plot.barh()\nplt.title(<span class=\"hljs-string\">\"Frequency of Classes\"<\/span>)\nplt.show()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*BMzTx7FxmHP8HrgGKj6jJw.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">The Label Distribution<\/figcaption><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"2334\">As you can see, the distribution of labels is balanced.<\/p>\n\n\n\n<h2 class=\"wp-block-heading abl za so be zb lv abm lw lz ma abn mb me mf abo mg mj mk abp ml mo mp abq mq mt abr bj\" id=\"56fc\">How Long Are Our Texts?<\/h2>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"9ac0\">The model we\u2019ll use is DistilBERT. Like other transformer models, this model has a maximum input text length. 
For DistilBERT, this limit is 512 tokens.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"a2a1\">Let\u2019s take a look at the distribution of words per review:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"584b\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\"># Visualizing words per review<\/span>\ndf[<span class=\"hljs-string\">\"Words Per Review\"<\/span>] = df[<span class=\"hljs-string\">\"text\"<\/span>].<span class=\"hljs-built_in\">str<\/span>.split().apply(<span class=\"hljs-built_in\">len<\/span>)\ndf.boxplot(<span class=\"hljs-string\">\"Words Per Review\"<\/span>, by=<span class=\"hljs-string\">\"label_name\"<\/span>, grid=<span class=\"hljs-literal\">False<\/span>, showfliers=<span class=\"hljs-literal\">False<\/span>,\n           color=<span class=\"hljs-string\">\"black\"<\/span>)\nplt.suptitle(<span class=\"hljs-string\">\"\"<\/span>)\nplt.xlabel(<span class=\"hljs-string\">\"\"<\/span>)\nplt.show()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*RUJEkzSm3p5PvJ63wHuF2g.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Words Per Review<\/figcaption><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"07d1\">As you can see, most reviews are around 15 words long, and the longest reviews are well below DistilBERT\u2019s maximum sequence length.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"9c7d\">Nice, we\u2019ve examined our data. 
Since we no longer need the DataFrame format, let\u2019s reset the format of our dataset:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"2409\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\"># Resetting the dataset format<\/span>\nraw_datasets.reset_format()<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"8058\">Now, we\u2019re ready to preprocess the data. Let\u2019s do this.<\/p>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"b022\">Step 5. Data Preprocessing<\/h1>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"9b1a\">Deep learning models can\u2019t take raw strings as input. Instead, the text must be encoded as numerical representations. This is where tokenization comes in. Tokenization breaks sentences into smaller units called tokens, which are then mapped to integer IDs.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"4ad2\">We are lucky that Transformers contains an <a class=\"af gt\" href=\"https:\/\/huggingface.co\/docs\/transformers\/v4.32.0\/en\/model_doc\/auto#transformers.AutoTokenizer.from_pretrained\" target=\"_blank\" rel=\"noopener ugc nofollow\">AutoTokenizer<\/a> class. This class helps you quickly load the tokenizer associated with a pre-trained model. All you need to do is pass your model\u2019s checkpoint name to its <code class=\"ec abi abj abk zz b\">from_pretrained<\/code> method. 
In our case, let\u2019s start by loading the tokenizer for DistilBERT as follows:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"d817\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span> transformers <span class=\"hljs-keyword\">import<\/span> AutoTokenizer\n\n<span class=\"hljs-comment\"># Loading the DistilBERT tokenizer <\/span>\ncheckpoint = <span class=\"hljs-string\">\"distilbert-base-uncased\"<\/span>\ntokenizer = AutoTokenizer.from_pretrained(checkpoint)<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"1cbd\">Okay, our tokenizer is ready to be applied to the whole corpus. Let\u2019s create a preprocessing function that calls the tokenizer with the <code class=\"ec abi abj abk zz b\">truncation<\/code> parameter enabled. This parameter truncates any text that is longer than the model\u2019s maximum input size.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"6359\">After creating the function, let\u2019s tokenize our datasets using the <code class=\"ec abi abj abk zz b\">map<\/code> method with the <code class=\"ec abi abj abk zz b\">batched<\/code> parameter. 
This parameter speeds up the function by simultaneously processing multiple dataset elements.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"1d0c\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\"># Creating a function for tokenization<\/span>\n<span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title.function\">tokenize_function<\/span>(<span class=\"hljs-params\">examples<\/span>):\n    <span class=\"hljs-keyword\">return<\/span> tokenizer(examples[<span class=\"hljs-string\">\"text\"<\/span>], truncation=<span class=\"hljs-literal\">True<\/span>)\n\n<span class=\"hljs-comment\"># Applying the function to the entire dataset<\/span>\ntokenized_datasets = raw_datasets.<span class=\"hljs-built_in\">map<\/span>(tokenize_function, batched=<span class=\"hljs-literal\">True<\/span>)<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"4f79\">Nice, we tokenized our datasets. 
It\u2019s time to set up batching with <code class=\"ec abi abj abk zz b\">DataCollatorWithPadding<\/code>. This class dynamically pads the sentences in each batch to the length of the longest one, instead of padding the entire dataset to the maximum length.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"2c73\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span> transformers <span class=\"hljs-keyword\">import<\/span> DataCollatorWithPadding\n\n<span class=\"hljs-comment\"># Creating a data collator for dynamic padding<\/span>\ndata_collator = DataCollatorWithPadding(tokenizer=tokenizer)<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*r44u5uehgwtUXLO6OGEU7g.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"af gt\" href=\"https:\/\/github.com\/nlp-with-transformers\/notebooks\/blob\/main\/02_classification.ipynb\" target=\"_blank\" rel=\"noopener ugc nofollow\">Tokenization process for DistilBERT<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"a048\">Awesome, we\u2019ve preprocessed the datasets. Let\u2019s go ahead and create the evaluation function.<\/p>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"7073\">Step 6. Evaluation Function<\/h1>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"8b2e\">As you know, metrics help us evaluate the performance of the model. 
For this analysis, we\u2019ll compute the accuracy, precision, recall, and f1 metrics.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"001e\">Let\u2019s create a function named <code class=\"ec abi abj abk zz b\">compute_metrics<\/code> to track metrics during training. To do this, we will leverage the Scikit-Learn and Comet libraries.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"9498\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span> sklearn.metrics <span class=\"hljs-keyword\">import<\/span> accuracy_score, precision_recall_fscore_support\n\n<span class=\"hljs-comment\"># Indexing to example function<\/span>\n<span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title.function\">get_example<\/span>(<span class=\"hljs-params\">index<\/span>):\n    <span class=\"hljs-keyword\">return<\/span> tokenized_datasets[<span class=\"hljs-string\">\"test\"<\/span>][index][<span class=\"hljs-string\">\"text\"<\/span>]\n\n<span class=\"hljs-comment\"># Creating a function to compute metrics<\/span>\n<span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title.function\">compute_metrics<\/span>(<span class=\"hljs-params\">pred<\/span>):\n    experiment = comet_ml.get_global_experiment()\n\n    labels = pred.label_ids\n    preds = pred.predictions.argmax(-<span class=\"hljs-number\">1<\/span>)\n    precision, recall, f1, _ = precision_recall_fscore_support(\n        labels, preds, average=<span class=\"hljs-string\">\"macro\"<\/span>\n    )\n    acc = accuracy_score(labels, preds)\n\n    <span class=\"hljs-keyword\">if<\/span> experiment:\n        epoch = <span class=\"hljs-built_in\">int<\/span>(experiment.curr_epoch) <span class=\"hljs-keyword\">if<\/span> experiment.curr_epoch <span class=\"hljs-keyword\">is<\/span> <span class=\"hljs-keyword\">not<\/span> <span 
class=\"hljs-literal\">None<\/span> <span class=\"hljs-keyword\">else<\/span> <span class=\"hljs-number\">0<\/span>\n        experiment.set_epoch(epoch)\n        experiment.log_confusion_matrix(\n            y_true=labels,\n            y_predicted=preds,\n            file_name=<span class=\"hljs-string\">f\"confusion-matrix-epoch-<span class=\"hljs-subst\">{epoch}<\/span>.json\"<\/span>,\n            labels=[<span class=\"hljs-string\">\"negative\"<\/span>, <span class=\"hljs-string\">\"positive\"<\/span>],\n            index_to_example_function=get_example,\n        )\n\n    <span class=\"hljs-keyword\">return<\/span> {<span class=\"hljs-string\">\"accuracy\"<\/span>: acc, <span class=\"hljs-string\">\"f1\"<\/span>: f1, <span class=\"hljs-string\">\"precision\"<\/span>: precision, <span class=\"hljs-string\">\"recall\"<\/span>: recall}<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"2a6a\">Great, we\u2019ve defined the performance metrics. We\u2019ll use this function in the model training step. Let\u2019s move on to building the model.<\/p>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"2590\">Step 7. Transformer Model<\/h1>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"f91f\">Trust me, it\u2019s straightforward to fit a model using Transformers.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"114e\">First, instantiate your model using the <code class=\"ec abi abj abk zz b\">AutoModelForSequenceClassification<\/code> class, and size its classification head to match your data with the <code class=\"ec abi abj abk zz b\">num_labels<\/code> parameter. 
That\u2019s simple, right?<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"7003\">In our case, we\u2019ll pass 2 to this parameter because our dataset has two classes. Also, let\u2019s use <code class=\"ec abi abj abk zz b\">id2label<\/code> and <code class=\"ec abi abj abk zz b\">label2id<\/code> to map the class IDs to their label names and back.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"c270\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span> transformers <span class=\"hljs-keyword\">import<\/span> AutoModelForSequenceClassification\n\n<span class=\"hljs-comment\"># Mapping ids to labels <\/span>\nid2label = {<span class=\"hljs-number\">0<\/span>: <span class=\"hljs-string\">\"NEGATIVE\"<\/span>, <span class=\"hljs-number\">1<\/span>: <span class=\"hljs-string\">\"POSITIVE\"<\/span>}\nlabel2id = {<span class=\"hljs-string\">\"NEGATIVE\"<\/span>: <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-string\">\"POSITIVE\"<\/span>: <span class=\"hljs-number\">1<\/span>}\n\n<span class=\"hljs-comment\"># Building the model<\/span>\nmodel = AutoModelForSequenceClassification.from_pretrained(\n              checkpoint, num_labels=<span class=\"hljs-number\">2<\/span>, id2label=id2label, label2id=label2id)<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"e98a\">Cool, we\u2019ve loaded our pre-trained model. Let\u2019s go ahead and start training this model.<\/p>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"21e6\">Step 8. 
Run Training<\/h1>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"b3c8\">It\u2019s time to train the model. First, let\u2019s define the training parameters using the <code class=\"ec abi abj abk zz b\">TrainingArguments<\/code> class. In this step, we will set <code class=\"ec abi abj abk zz b\">push_to_hub=True<\/code> to push this model to the Hugging Face Hub and <code class=\"ec abi abj abk zz b\">report_to=[\"comet_ml\"]<\/code> to track our hyperparameters and metrics in the Comet dashboard.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"f3b7\">After that, we\u2019ll pass our model, datasets, data collator, and metrics function to a <code class=\"ec abi abj abk zz b\">Trainer<\/code>, which handles the fine-tuning.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"11f2\">Lastly, we\u2019re going to call the <code class=\"ec abi abj abk zz b\">train<\/code> method to start training. 
That\u2019s it.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"c0c9\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span> transformers <span class=\"hljs-keyword\">import<\/span> TrainingArguments, Trainer\n\n<span class=\"hljs-comment\"># Setting Comet environment variables<\/span>\n%env COMET_MODE=ONLINE\n%env COMET_LOG_ASSETS=TRUE\n\n<span class=\"hljs-comment\"># Setting training arguments<\/span>\ntraining_args = TrainingArguments(\n    output_dir=<span class=\"hljs-string\">\"my_distilbert_model\"<\/span>,\n    learning_rate=<span class=\"hljs-number\">2e-5<\/span>,\n    per_device_train_batch_size=<span class=\"hljs-number\">16<\/span>,\n    per_device_eval_batch_size=<span class=\"hljs-number\">16<\/span>,\n    num_train_epochs=<span class=\"hljs-number\">3<\/span>,\n    weight_decay=<span class=\"hljs-number\">0.01<\/span>,\n    evaluation_strategy=<span class=\"hljs-string\">\"epoch\"<\/span>,\n    save_strategy=<span class=\"hljs-string\">\"epoch\"<\/span>,\n    load_best_model_at_end=<span class=\"hljs-literal\">True<\/span>,\n    push_to_hub=<span class=\"hljs-literal\">True<\/span>,\n    report_to=[<span class=\"hljs-string\">\"comet_ml\"<\/span>],\n)\n\n<span class=\"hljs-comment\"># Creating a trainer object<\/span>\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    train_dataset=tokenized_datasets[<span class=\"hljs-string\">\"train\"<\/span>],\n    eval_dataset=tokenized_datasets[<span class=\"hljs-string\">\"test\"<\/span>],\n    compute_metrics=compute_metrics,\n    data_collator=data_collator,\n)\n\n<span class=\"hljs-comment\"># Training the model<\/span>\ntrainer.train()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*ZkV4kKbE8ZHNYX_v5Bo4Bw.png\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc 
xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"05ea\">Voil\u00e0, our model has been trained, and metrics were calculated at the end of each epoch. As you can see, the performance of our model is not bad.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"bfc1\">It\u2019s time to push our model to the Hub to share it with everyone, as shown below:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"7ff3\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\"># Pushing the model<\/span>\ntrainer.push_to_hub()<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"e7f6\">Our model looks like this on my Hub profile:<\/p>\n\n\n\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*y78ObJK2maJr5-nkSDTXdA.gif\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><a class=\"af gt\" href=\"https:\/\/huggingface.co\/Tirendaz\/my_distilbert_model\" target=\"_blank\" rel=\"noopener ugc nofollow\">Our trained model on my Hub profile<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"3c68\">Let\u2019s take a look at how to predict the label of a text that the model has not seen before.<\/p>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"4f08\">Step 9. Inference<\/h1>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"0f0d\">We now have a model on the Hugging Face Hub. 
It\u2019s time to make a prediction using this model. The easiest way to do this is to use a pipeline. All we have to do is pass our model to it. Let\u2019s do this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"1b6c\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span> transformers <span class=\"hljs-keyword\">import<\/span> pipeline\n\n<span class=\"hljs-comment\"># Creating a text<\/span>\ntext = <span class=\"hljs-string\">\"This is a great movie. It may be my favourite.\"<\/span>\n\n<span class=\"hljs-comment\"># Predicting the label<\/span>\nclassifier = pipeline(<span class=\"hljs-string\">\"sentiment-analysis\"<\/span>,\n                       model=<span class=\"hljs-string\">\"Tirendaz\/my_distilbert_model\"<\/span>)\nclassifier(text)\n\n<span class=\"hljs-comment\"># Output:<\/span>\n<span class=\"hljs-comment\"># [{'label': 'POSITIVE', 'score': 0.971620500087738}]<\/span><\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"9b06\">As you can see, the prediction was made, and the score for this prediction was calculated. Our model correctly predicted the label of the text.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"0b7a\">Let\u2019s move on to deploying our model with Gradio.<\/p>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"92be\">Step 10. Deploy<\/h1>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"0f33\">In the final step, we\u2019ll walk you through how to share our model with the community. 
<a class=\"af gt\" href=\"https:\/\/www.gradio.app\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Gradio<\/a> is king when it comes to sharing machine learning models.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"ae64\">You can host your app on the Hugging Face Hub. Alternatively, you can log it to the Comet dashboard and share it with your friends from there. All you need to do is use the <code class=\"ec abi abj abk zz b\">comet_ml.Experiment<\/code> object.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span id=\"51ac\" class=\"abc za so zz b bf abd abe l abf abg\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">import<\/span> comet_ml\n<span class=\"hljs-keyword\">import<\/span> gradio <span class=\"hljs-keyword\">as<\/span> gr\n<span class=\"hljs-keyword\">from<\/span> transformers <span class=\"hljs-keyword\">import<\/span> pipeline\n\n<span class=\"hljs-comment\"># Creating pipeline<\/span>\nclassifier = pipeline(<span class=\"hljs-string\">\"sentiment-analysis\"<\/span>,\n                      model=<span class=\"hljs-string\">\"Tirendaz\/my_distilbert_model\"<\/span>)\n\n<span class=\"hljs-comment\"># Creating a function for text classification<\/span>\n<span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title.function\">text_classification<\/span>(<span class=\"hljs-params\">text<\/span>):\n    result = classifier(text)\n    sentiment_label = result[<span class=\"hljs-number\">0<\/span>][<span class=\"hljs-string\">'label'<\/span>]\n    sentiment_score = result[<span class=\"hljs-number\">0<\/span>][<span class=\"hljs-string\">'score'<\/span>]\n    formatted_output = <span class=\"hljs-string\">f\"This sentiment is <span class=\"hljs-subst\">{sentiment_label}<\/span> with the probability <span class=\"hljs-subst\">{sentiment_score*<span class=\"hljs-number\">100<\/span>:<span class=\"hljs-number\">.2<\/span>f}<\/span>%\"<\/span>\n    <span 
class=\"hljs-keyword\">return<\/span> formatted_output\n\n<span class=\"hljs-comment\"># Getting examples<\/span>\nexamples = [<span class=\"hljs-string\">\"This is a wonderful movie!\"<\/span>, <span class=\"hljs-string\">\"The movie was really bad; I didn't like it.\"<\/span>]\n\n<span class=\"hljs-comment\"># Building a Gradio interface<\/span>\nio = gr.Interface(fn=text_classification,\n                  inputs=gr.Textbox(lines=<span class=\"hljs-number\">2<\/span>, label=<span class=\"hljs-string\">\"Text\"<\/span>, placeholder=<span class=\"hljs-string\">\"Enter text here...\"<\/span>),\n                  outputs=gr.Textbox(lines=<span class=\"hljs-number\">2<\/span>, label=<span class=\"hljs-string\">\"Text Classification Result\"<\/span>),\n                  title=<span class=\"hljs-string\">\"Text Classification\"<\/span>,\n                  description=<span class=\"hljs-string\">\"Enter a text and see the text classification result!\"<\/span>,\n                  examples=examples)\n\n<span class=\"hljs-comment\"># Launching the app<\/span>\nio.launch(inline=<span class=\"hljs-literal\">False<\/span>, share=<span class=\"hljs-literal\">True<\/span>)\n\n<span class=\"hljs-comment\"># Logging the app to the Comet Dashboard<\/span>\nexperiment = comet_ml.Experiment()\nexperiment.add_tag(<span class=\"hljs-string\">\"text-classifier\"<\/span>)\n\n<span class=\"hljs-comment\"># Integrating Comet<\/span>\nio.integrate(comet_ml=experiment)<\/span><\/pre>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"d9c0\">Great, we\u2019ve built our web app. 
It\u2019ll look like this in the Comet dashboard:<\/p>\n\n\n\n<figure class=\"wp-block-image xe xf xg xh xi xj le lf paragraph-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*_ejA0p7ET52qBoODyfXTtQ.gif\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Gradio app in my Comet dashboard<\/figcaption><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"5405\">As you can see, we logged our Gradio app to Comet. We can now interact with it using the Gradio Custom Panel, as shown above.<\/p>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"ea86\">Wrap-Up<\/h1>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc zt xy xz tf zu yb yc mf zv ye yf mk zw yh yi mp zx yk yl ym em bj wp-block-paragraph\" id=\"f2d3\">Congratulations, you now know how to build a BERT-based text classification app that predicts the label of a text. As you can see, platforms such as Hugging Face, Gradio, and Comet make this straightforward.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"1796\">In this article, we first fine-tuned a BERT model with Transformers, built a Gradio app using this model, and then showcased it in the Comet dashboard.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph xv xw so be b tc xx xy xz tf ya yb yc mf yd ye yf mk yg yh yi mp yj yk yl ym em bj wp-block-paragraph\" id=\"7227\">That\u2019s it. Thanks for reading. 
Let\u2019s connect <a class=\"af gt\" href=\"http:\/\/youtube.com\/tirendazacademy\" target=\"_blank\" rel=\"noopener ugc nofollow\">YouTube<\/a> | <a class=\"af gt\" href=\"http:\/\/twitter.com\/tirendazacademy\" target=\"_blank\" rel=\"noopener ugc nofollow\">Twitter<\/a> | <a class=\"af gt\" href=\"https:\/\/www.linkedin.com\/in\/tirendaz-academy\" target=\"_blank\" rel=\"noopener ugc nofollow\">LinkedIn<\/a><\/p>\n\n\n\n<div class=\"abx aby abz aca acb acc\">\n<div class=\"km ab fi\">\n<div class=\"acd ab ef cm ck ace\">\n<ul>\n<li class=\"be fs fj z ft fu fv fw fx fy fz fo bj\"><a href=\"https:\/\/www.comet.com\/site\/blog\/7-steps-to-become-a-machine-learning-engineer\/\">7 Steps to Become a Machine Learning Engineer<\/a><\/li>\n<li class=\"be fs fj z ft fu fv fw fx fy fz fo bj\"><a href=\"https:\/\/www.comet.com\/site\/blog\/end-to-end-deep-learning-project-with-pytorch-comet-ml\/\">End-to-End Deep Learning Project with PyTorch &amp; Comet<\/a><\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<h1 class=\"wp-block-heading yz za so be zb zc zd te lz ze zf th me zg zh zi zj zk zl zm zn zo zp zq zr zs bj\" id=\"89be\">Resources:<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a class=\"af gt\" href=\"https:\/\/huggingface.co\/docs\/transformers\/tasks\/sequence_classification\" target=\"_blank\" rel=\"noopener ugc nofollow\">Text Classification with Hugging Face<\/a><\/li>\n\n\n\n<li><a class=\"af gt\" href=\"https:\/\/www.comet.com\/docs\/v2\/integrations\/ml-frameworks\/huggingface\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Integrate with Hugging Face Transformers<\/a><\/li>\n\n\n\n<li><a class=\"af gt\" 
href=\"https:\/\/www.oreilly.com\/library\/view\/natural-language-processing\/9781098136789\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Natural Language Processing with Transformers<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>LLMs such as GPT, BERT, and Llama 2 are a game changer in AI. You can build AI tools like ChatGPT and Bard using these models. But you need to fine-tune these language models when performing your deep learning projects. This is where AI platforms come in. Today, I\u2019ll show you how to build an [&hellip;]<\/p>\n","protected":false},"author":70,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6,7],"tags":[],"coauthors":[168],"class_list":["post-7439","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Building a Text Classifier App with Hugging Face, BERT, and Comet - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building a Text Classifier App with Hugging Face, BERT, and Comet\" \/>\n<meta property=\"og:description\" content=\"LLMs such as GPT, BERT, and Llama 2 are a game changer in AI. You can build AI tools like ChatGPT and Bard using these models. 
But you need to fine-tune these language models when performing your deep learning projects. This is where AI platforms come in. Today, I\u2019ll show you how to build an [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-09-12T16:37:38+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:14:11+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*7BVBBKpZY3fmOnLgEfpoTQ.jpeg\" \/>\n<meta name=\"author\" content=\"Tirendaz AI\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tirendaz AI\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Building a Text Classifier App with Hugging Face, BERT, and Comet - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/","og_locale":"en_US","og_type":"article","og_title":"Building a Text Classifier App with Hugging Face, BERT, and Comet","og_description":"LLMs such as GPT, BERT, and Llama 2 are a game changer in AI. You can build AI tools like ChatGPT and Bard using these models. 
But you need to fine-tune these language models when performing your deep learning projects. This is where AI platforms come in. Today, I\u2019ll show you how to build an [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-09-12T16:37:38+00:00","article_modified_time":"2025-04-24T17:14:11+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*7BVBBKpZY3fmOnLgEfpoTQ.jpeg","type":"","width":"","height":""}],"author":"Tirendaz AI","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Tirendaz AI","Est. reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/"},"author":{"name":"Tirendaz AI","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/1afb8219a115db20d32be46c5f0de930"},"headline":"Building a Text Classifier App with Hugging Face, BERT, and Comet","datePublished":"2023-09-12T16:37:38+00:00","dateModified":"2025-04-24T17:14:11+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/"},"wordCount":1501,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*7BVBBKpZY3fmOnLgEfpoTQ.jpeg","articleSection":["Machine 
Learning","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/","url":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/","name":"Building a Text Classifier App with Hugging Face, BERT, and Comet - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*7BVBBKpZY3fmOnLgEfpoTQ.jpeg","datePublished":"2023-09-12T16:37:38+00:00","dateModified":"2025-04-24T17:14:11+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*7BVBBKpZY3fmOnLgEfpoTQ.jpeg","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*7BVBBKpZY3fmOnLgEfpoTQ.jpeg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/building-a-text-classifier-app-with-hugging-face-bert-and-comet\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Building a Text Classifier App with Hugging Face, BERT, and 
Comet"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/1afb8219a115db20d32be46c5f0de930","name":"Tirendaz AI","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/8186f103f79fbe423be05bc95bd135bc","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/1669361802236-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/1669361802236-96x96.jpg","caption":"Tirendaz 
AI"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/tirendazcontactgmail-com\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7439","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/70"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=7439"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7439\/revisions"}],"predecessor-version":[{"id":15548,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7439\/revisions\/15548"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=7439"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=7439"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=7439"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=7439"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}