{"id":7337,"date":"2023-08-29T13:23:08","date_gmt":"2023-08-29T21:23:08","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=7337"},"modified":"2025-04-24T17:14:31","modified_gmt":"2025-04-24T17:14:31","slug":"fine-tuning-bert-for-text-classification","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/","title":{"rendered":"Fine-tuning BERT for text classification"},"content":{"rendered":"\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\">\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*3nCTTnKHrnjTA-Gf\" alt=\"\" width=\"899\" height=\"600\"><\/figure><div class=\"mf mg mh\"><picture><\/picture><\/div>\n<\/div><figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\">Photo by <a class=\"af mz\" href=\"https:\/\/unsplash.com\/@pawel_czerwinski?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Pawel Czerwinski<\/a> on <a class=\"af mz\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/figcaption><\/figure>\n<p id=\"9fd2\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">BERT \u2014 Bidirectional Encoder Representations from Transformers \u2014 is a pre-trained language model for natural language processing tasks such as text classification and question and answering. This article will look at fine-tuning the BERT for text classification. 
In the end, the BERT model will learn to label whether a review from the <code class=\"cw nv nw nx ny b\">imdb<\/code> dataset is positive or negative.<\/p>\n<p id=\"7c99\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">To understand how the model is learning, we need to visualize histograms of the weights, biases, activations, and gradients. To achieve that, we use Comet to track the project. <a class=\"af mz\" href=\"\/signup?utm_source=heartbeat&amp;utm_medium=referral&amp;utm_campaign=AMS_US_EN_SNUP_heartbeat_CTA\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet<\/a> automatically tracks these and other items such as:<\/p>\n<ul class=\"\">\n<li id=\"74d1\" class=\"na nb fo be b gm nc nd ne gp nf ng nh nz nj nk nl oa nn no np ob nr ns nt nu oc od oe bj\" data-selectable-paragraph=\"\">Optimizer Parameters<\/li>\n<li id=\"573e\" class=\"na nb fo be b gm of nd ne gp og ng nh nz oh nk nl oa oi no np ob oj ns nt nu oc od oe bj\" data-selectable-paragraph=\"\">Code<\/li>\n<li id=\"bfaf\" class=\"na nb fo be b gm of nd ne gp og ng nh nz oh nk nl oa oi no np ob oj ns nt nu oc od oe bj\" data-selectable-paragraph=\"\">Metrics<\/li>\n<li id=\"6130\" class=\"na nb fo be b gm of nd ne gp og ng nh nz oh nk nl oa oi no np ob oj ns nt nu oc od oe bj\" data-selectable-paragraph=\"\">Weight histograms<\/li>\n<\/ul>\n<h2 id=\"e669\" class=\"ok ol fo be om on oo op oq or os ot ou ni ov ow ox nm oy oz pa nq pb pc pd pe bj\" data-selectable-paragraph=\"\">Getting started<\/h2>\n<p id=\"baf7\" class=\"pw-post-body-paragraph na nb fo be b gm pf nd ne gp pg ng nh ni ph nk nl nm pi no np nq pj ns nt nu fh bj\" data-selectable-paragraph=\"\">When using Comet, these items are logged by default, but you can 
manually configure what will be logged.<\/p>\n<pre>import comet_ml\n\nexperiment = comet_ml.Experiment(\n    api_key=\"YOUR_API_KEY\",\n    project_name=\"HF\",\n    log_code=True,\n    auto_metric_logging=True,\n    auto_param_logging=True,\n    auto_histogram_weight_logging=True,\n    auto_histogram_gradient_logging=True,\n    auto_histogram_activation_logging=True,\n)<\/pre>\n<h2 id=\"f666\" class=\"ok ol fo be om on oo op oq or os ot ou ni ov ow ox nm oy oz pa nq pb pc pd pe bj\" data-selectable-paragraph=\"\">Log parameters<\/h2>\n<p id=\"559f\" class=\"pw-post-body-paragraph na nb fo be b gm pf nd ne gp pg ng nh ni ph nk nl nm pi no np nq pj ns nt nu fh bj\" data-selectable-paragraph=\"\">Logging various parameters makes it easy to update them and compare how they affect the model&#8217;s performance. You can easily change a parameter when all parameters are saved in one dictionary. The <code class=\"cw nv nw nx ny b\">log_parameters<\/code> function is used for logging a dictionary of parameters in Comet.<\/p>\n<pre># these will all get logged\nparams = {\n    \"bert\": \"bert-base-uncased\",\n    \"num_labels\": 2,\n    \"return_tensors\": \"tf\",\n    \"batch_size\": 8,\n    \"epochs\": 3,\n    \"padding\": \"max_length\",\n    \"truncation\": True,\n    \"dataset\": \"imdb\",\n}\n\nexperiment.log_parameters(params)<\/pre>\n<h2 id=\"1224\" class=\"ok ol fo be om on oo op oq or os ot ou ni ov ow ox nm oy oz pa nq pb pc pd pe bj\" data-selectable-paragraph=\"\">Tokenize text data<\/h2>\n<p id=\"9ebe\" class=\"pw-post-body-paragraph na nb fo be b gm pf nd ne gp pg ng nh ni ph nk nl nm pi no np nq pj ns nt nu fh bj\" data-selectable-paragraph=\"\">We\u2019ll use the <code class=\"cw nv nw nx ny b\"><a class=\"af mz\" href=\"https:\/\/huggingface.co\/datasets\/imdb\" target=\"_blank\" rel=\"noopener ugc nofollow\">imdb<\/a><\/code><a class=\"af mz\" href=\"https:\/\/huggingface.co\/datasets\/imdb\" target=\"_blank\" rel=\"noopener ugc nofollow\"> dataset<\/a> to 
fine-tune BERT. Because the data is raw text, you can\u2019t pass it to the model directly; you first need to convert it to a numerical representation. Use the <code class=\"cw nv nw nx ny b\">BertTokenizer<\/code> since you are fine-tuning a BERT model. This ensures that the data is in the form that BERT requires. Next, we define a function that tokenizes the data, applying padding to a maximum length and truncation so that all sequences end up the same length.<\/p>\n<pre>from transformers import BertTokenizer\n\n# load the tokenizer once, instead of on every call\ntokenizer = BertTokenizer.from_pretrained(params['bert'])\n\ndef tokenize_function(examples):\n    return tokenizer(examples[\"text\"], padding=params[\"padding\"], truncation=params[\"truncation\"])<\/pre>\n<p id=\"63cb\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Next, apply the function to the dataset. The map function applies the tokenization function to all the sentences. 
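To make padding and truncation concrete, here is a toy sketch of what a tokenizer produces. The miniature vocabulary and the `toy_tokenize` helper are hypothetical, for illustration only; the real `BertTokenizer` uses WordPiece subwords and a vocabulary of roughly 30,000 tokens.

```python
# Toy sketch of padding and truncation (NOT the real BertTokenizer,
# which uses WordPiece subwords): map words to ids, then force every
# sequence to the same fixed length.
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102,
         "the": 1, "movie": 2, "was": 3, "great": 4, "slow": 5}

def toy_tokenize(text, max_length=8):
    ids = [vocab["[CLS]"]] + [vocab.get(w, 0) for w in text.lower().split()] + [vocab["[SEP]"]]
    ids = ids[:max_length]                                  # truncation
    attention_mask = [1] * len(ids)                         # 1 = real token
    ids += [vocab["[PAD]"]] * (max_length - len(ids))       # padding
    attention_mask += [0] * (max_length - len(attention_mask))
    return {"input_ids": ids, "attention_mask": attention_mask}

enc = toy_tokenize("the movie was great")
# enc["input_ids"] -> [101, 1, 2, 3, 4, 102, 0, 0]
```

Every sequence comes out exactly `max_length` ids long, with the attention mask marking which positions are real tokens and which are padding.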
Next, shuffle the data and select the number of data points you would like to use.<\/p>\n<pre>from datasets import load_dataset\nfrom transformers import AutoTokenizer\n\ndataset = load_dataset(params['dataset'])\n\ntokenizer = AutoTokenizer.from_pretrained(params['bert'])\ntokenized_datasets = dataset.map(tokenize_function, batched=True)\n\nsmall_train_dataset = tokenized_datasets[\"train\"].shuffle(seed=42).select(range(1000))\nsmall_eval_dataset = tokenized_datasets[\"test\"].shuffle(seed=42).select(range(1000))<\/pre>\n<h2 id=\"11c0\" class=\"ok ol fo be om on oo op oq or os ot ou ni ov ow ox nm oy oz pa nq pb pc pd pe bj\" data-selectable-paragraph=\"\">Create TensorFlow dataset<\/h2>\n<p id=\"0dfc\" class=\"pw-post-body-paragraph na nb fo be b gm pf nd ne gp pg ng nh ni ph nk nl nm pi no np nq pj ns nt nu fh bj\" data-selectable-paragraph=\"\">We\u2019ll fine-tune the BERT model in <a class=\"af mz\" href=\"https:\/\/www.machinelearningnuggets.com\/tag\/tensorflow\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">TensorFlow<\/a>. Let\u2019s convert the dataset to a TensorFlow dataset format. Hugging Face provides the <code class=\"cw nv nw nx ny b\">DefaultDataCollator<\/code> class to batch the dataset. After that, use the <code class=\"cw nv nw nx ny b\">to_tf_dataset<\/code> function to convert the dataset to TensorFlow format.<\/p>\n<p id=\"6e96\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">The <code class=\"cw nv nw nx ny b\">to_tf_dataset<\/code> method allows you to define the columns and labels included in the dataset. 
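As a rough illustration of what a default collator does, here is a minimal pure-Python sketch. The `collate` helper is hypothetical; the real `DefaultDataCollator` returns framework tensors (for example `tf.Tensor`) rather than Python lists.

```python
# Rough sketch of what a default data collator does: gather the same
# feature across examples into one batch (real collators return
# framework tensors rather than Python lists).
def collate(examples):
    return {key: [ex[key] for ex in examples] for key in examples[0]}

samples = [
    {"input_ids": [101, 7, 102], "labels": 1},
    {"input_ids": [101, 9, 102], "labels": 0},
]
batch = collate(samples)
# batch["labels"] -> [1, 0]
```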
Converting the data to TensorFlow makes it possible to train the model using the <code class=\"cw nv nw nx ny b\">fit<\/code> method and later evaluate it using the <code class=\"cw nv nw nx ny b\">evaluate<\/code> method.<\/p>\n<pre>from transformers import DefaultDataCollator\n\ndata_collator = DefaultDataCollator(return_tensors=params['return_tensors'])\n\ntf_train_dataset = small_train_dataset.to_tf_dataset(\n    columns=[\"attention_mask\", \"input_ids\", \"token_type_ids\"],\n    label_cols=[\"labels\"],\n    shuffle=True,\n    collate_fn=data_collator,\n    batch_size=params['batch_size'],\n)\n\ntf_validation_dataset = small_eval_dataset.to_tf_dataset(\n    columns=[\"attention_mask\", \"input_ids\", \"token_type_ids\"],\n    label_cols=[\"labels\"],\n    shuffle=False,\n    collate_fn=data_collator,\n    batch_size=params['batch_size'],\n)<\/pre>\n<h2 id=\"f451\" class=\"ok ol fo be om on oo op oq or os ot ou ni ov ow ox nm oy oz pa nq pb pc pd pe bj\" data-selectable-paragraph=\"\">Train BERT model<\/h2>\n<p id=\"fe69\" class=\"pw-post-body-paragraph na nb fo be b gm pf nd ne gp pg ng nh ni ph nk nl nm pi no np nq pj ns nt nu fh bj\" data-selectable-paragraph=\"\">The <code class=\"cw nv nw nx ny b\">TFAutoModelForSequenceClassification<\/code> is a model class with a sequence classification head. We can use it to initialize a pre-trained BERT classification model. Next, compile the model with a low learning rate and fit it to the data. 
Using a low learning rate is important in <a class=\"af mz\" href=\"https:\/\/www.machinelearningnuggets.com\/transfer-learning-guide\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">transfer learning<\/a> so that large weight updates don\u2019t destroy the knowledge already stored in the pre-trained weights.<\/p>\n<pre>import tensorflow as tf\nfrom transformers import TFAutoModelForSequenceClassification\n\nbert = TFAutoModelForSequenceClassification.from_pretrained(params['bert'], num_labels=params['num_labels'])\nbert.compile(\n    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),\n    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n    metrics=tf.metrics.SparseCategoricalAccuracy(),\n)\nbert.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=params['epochs'])<\/pre>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<blockquote class=\"pv\"><p id=\"0621\" class=\"pw px fo be py pz qa qb qc qd qe nu dv\" data-selectable-paragraph=\"\">Innovation and academia go hand-in-hand.<a class=\"af mz\" href=\"https:\/\/www.youtube.com\/watch?v=7XCsi64HLQ8\" target=\"_blank\" rel=\"noopener ugc nofollow\"> Listen to our own CEO Gideon Mendels chat with the Stanford MLSys Seminar Series team<\/a> about the future of MLOps and give the <a class=\"af mz\" href=\"\/signup?utm_source=heartbeat&amp;utm_medium=referral&amp;utm_campaign=AMS_US_EN_SNUP_heartbeat_CTA\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet platform<\/a> a try for free!<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h2 id=\"bce3\" class=\"ok ol fo be om on oo op oq or os ot ou ni ov ow ox nm oy oz pa nq pb pc pd pe bj\" data-selectable-paragraph=\"\">Evaluate model performance<\/h2>\n<p id=\"c234\" class=\"pw-post-body-paragraph na nb fo be b gm pf nd ne gp pg ng nh ni ph nk nl nm pi no np nq pj ns nt nu fh bj\" data-selectable-paragraph=\"\">Since auto-logging 
is active, you will see live results of the model training on Comet. On the charts panel, you will see graphs for the:<\/p>\n<ul class=\"\">\n<li id=\"803b\" class=\"na nb fo be b gm nc nd ne gp nf ng nh nz nj nk nl oa nn no np ob nr ns nt nu oc od oe bj\" data-selectable-paragraph=\"\">Loss<\/li>\n<li id=\"d4a0\" class=\"na nb fo be b gm of nd ne gp og ng nh nz oh nk nl oa oi no np ob oj ns nt nu oc od oe bj\" data-selectable-paragraph=\"\">Accuracy<\/li>\n<li id=\"e436\" class=\"na nb fo be b gm of nd ne gp og ng nh nz oh nk nl oa oi no np ob oj ns nt nu oc od oe bj\" data-selectable-paragraph=\"\">Epoch duration<\/li>\n<\/ul>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*DoEhNrX7caigNd7r8RCYEQ.gif\" alt=\"\" width=\"600\" height=\"346\"><\/figure>\n<\/figure>\n<p id=\"2c12\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">The <strong class=\"be qg\">Code<\/strong> tab will show the code used in this experiment. 
On the hyperparameters tab, you will see all the logged parameters.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Cve29aIFsdchEuKxzHhZ2A.png\" alt=\"\" width=\"700\" height=\"377\"><\/figure>\n<\/div>\n<\/figure>\n<p id=\"775f\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">All model metrics can be viewed from the <strong class=\"be qg\">Metrics<\/strong> tab.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*TGnD1Vdh4TxfLuQEbyI_ow.png\" alt=\"\" width=\"700\" height=\"377\"><\/figure>\n<\/div>\n<\/figure>\n<p id=\"31e3\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" 
data-selectable-paragraph=\"\">Click the <strong class=\"be qg\">System Metrics<\/strong> tab to see the Memory Usage and CPU Utilization for the model training process.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*ENqKVpYPnc8n-aQ4ibj0sA.png\" alt=\"\" width=\"700\" height=\"377\"><\/figure>\n<\/div>\n<\/figure>\n<p id=\"48cb\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Click the <strong class=\"be qg\">Histograms<\/strong> tab to see histograms for the weights and biases, activations, and gradients.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*iliafQDV49Ct7OCU9ndLBQ.gif\" alt=\"\" width=\"600\" height=\"347\"><\/figure>\n<\/figure>\n<h2 id=\"aa1f\" class=\"ok ol fo be om on oo op oq or os ot ou ni ov ow ox nm oy oz pa nq pb 
pc pd pe bj\" data-selectable-paragraph=\"\">Test model on new data<\/h2>\n<p id=\"ccf3\" class=\"pw-post-body-paragraph na nb fo be b gm pf nd ne gp pg ng nh ni ph nk nl nm pi no np nq pj ns nt nu fh bj\" data-selectable-paragraph=\"\">Check how the BERT model performs on new data. You can also log the test sentence to Comet. First, tokenize the input data, then pass it to the BERT model. The model outputs logits, which you will need to decode into a label.<\/p>\n<pre>input_sequence = \"I hated that movie, it was too slow\"\nexperiment.log_text(input_sequence)\n# tokenize the input sentence\ninput_ids = tokenizer.encode(input_sequence, return_tensors='tf')\noutput = bert(input_ids)\nlogits = output.logits<\/pre>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*5zCSyOGTPmDDDM5PapJezg.png\" alt=\"\" width=\"700\" height=\"176\"><\/figure>\n<\/div>\n<\/figure>\n<p id=\"2097\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Let\u2019s interpret the prediction and log it as well. You can get the predicted class by passing the logits to <code class=\"cw nv nw nx ny b\">tf.math.argmax<\/code>. 
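To see how logits map to a class, here is a minimal sketch with made-up logit values. The `predict_from_logits` helper is hypothetical; it mirrors what `tf.math.argmax` does on the model's logits, with a softmax added to expose the class probabilities.

```python
import math

# Sketch of decoding a two-class logit pair: softmax turns the logits
# into probabilities, and argmax picks the predicted class id,
# mirroring tf.math.argmax on the model's logits.
def predict_from_logits(logits):
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    return probs.index(max(probs)), probs

pred, probs = predict_from_logits([2.0, -1.0])
# pred -> 0, i.e. the first label of the classification head
```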
Passing the predicted class to <code class=\"cw nv nw nx ny b\">bert.config.id2label<\/code> will give you the predicted label.<\/p>\n<pre>predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])\nprediction = bert.config.id2label[predicted_class_id]\nexperiment.log_text(prediction)\nprediction<\/pre>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*hzvk4p2Ul-6-WkPRuzLFRQ.png\" alt=\"\" width=\"700\" height=\"134\"><\/figure>\n<\/div>\n<\/figure>\n<p id=\"9477\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">End the experiment to make sure all items are logged as expected.<\/p>\n<pre>experiment.end()<\/pre>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h1 id=\"af81\" class=\"qk ol fo be om ql qm go oq qn qo gr ou qp qq qr qs qt qu qv qw qx qy qz ra rb bj\" data-selectable-paragraph=\"\">Final thoughts<\/h1>\n<p id=\"588c\" class=\"pw-post-body-paragraph na nb fo be b gm pf nd ne gp pg ng nh ni ph nk nl nm pi no np nq pj ns nt nu fh bj\" data-selectable-paragraph=\"\">This article has shown you how to fine-tune a BERT model for text classification while tracking the model using <a class=\"af mz\" href=\"http:\/\/comet.com\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet<\/a>. 
You can improve this model by increasing the amount of training data. You can also swap BERT for another <a class=\"af mz\" href=\"https:\/\/huggingface.co\/docs\/transformers\/index\" target=\"_blank\" rel=\"noopener ugc nofollow\">Hugging Face transformer<\/a> model and compare performance.<\/p>\n<p id=\"e82a\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"https:\/\/www.linkedin.com\/in\/mwitiderrick\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Follow me on LinkedIn<\/a> for more technical resources.<\/p>\n<h2 id=\"e10b\" class=\"ok ol fo be om on oo op oq or os ot ou ni ov ow ox nm oy oz pa nq pb pc pd pe bj\" data-selectable-paragraph=\"\">Resources<\/h2>\n<p id=\"0864\" class=\"pw-post-body-paragraph na nb fo be b gm pf nd ne gp pg ng nh ni ph nk nl nm pi no np nq pj ns nt nu fh bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"https:\/\/www.comet.com\/mwitiderrick\/hf\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet experiment<\/a><\/p>\n<p id=\"7663\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"https:\/\/colab.research.google.com\/drive\/1vv6Jb2XxW9AJdKETkBpujCfiIzOytzUH?usp=sharing\" target=\"_blank\" rel=\"noopener ugc nofollow\">Notebook<\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Photo by Pawel Czerwinski on Unsplash BERT \u2014 Bidirectional Encoder Representations from Transformers \u2014 is a pre-trained language model for natural language processing tasks such as text classification and question and answering. This article will look at fine-tuning the BERT for text classification. 
In the end, the BERT model will learn to label if a [&hellip;]<\/p>\n","protected":false},"author":63,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[163],"class_list":["post-7337","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Fine-tuning BERT for text classification - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Fine-tuning BERT for text classification\" \/>\n<meta property=\"og:description\" content=\"Photo by Pawel Czerwinski on Unsplash BERT \u2014 Bidirectional Encoder Representations from Transformers \u2014 is a pre-trained language model for natural language processing tasks such as text classification and question and answering. This article will look at fine-tuning the BERT for text classification. 
In the end, the BERT model will learn to label if a [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-29T21:23:08+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:14:31+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*3nCTTnKHrnjTA-Gf\" \/>\n<meta name=\"author\" content=\"Derrick Mwiti\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Derrick Mwiti\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Fine-tuning BERT for text classification - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/","og_locale":"en_US","og_type":"article","og_title":"Fine-tuning BERT for text classification","og_description":"Photo by Pawel Czerwinski on Unsplash BERT \u2014 Bidirectional Encoder Representations from Transformers \u2014 is a pre-trained language model for natural language processing tasks such as text classification and question and answering. This article will look at fine-tuning the BERT for text classification. 
In the end, the BERT model will learn to label if a [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-08-29T21:23:08+00:00","article_modified_time":"2025-04-24T17:14:31+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*3nCTTnKHrnjTA-Gf","type":"","width":"","height":""}],"author":"Derrick Mwiti","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Derrick Mwiti","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/"},"author":{"name":"Derrick Mwiti","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/9808205cca68ec95b6fbd918d195cea6"},"headline":"Fine-tuning BERT for text classification","datePublished":"2023-08-29T21:23:08+00:00","dateModified":"2025-04-24T17:14:31+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/"},"wordCount":736,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*3nCTTnKHrnjTA-Gf","articleSection":["Machine Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/","url":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/","name":"Fine-tuning BERT for text classification - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*3nCTTnKHrnjTA-Gf","datePublished":"2023-08-29T21:23:08+00:00","dateModified":"2025-04-24T17:14:31+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*3nCTTnKHrnjTA-Gf","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*3nCTTnKHrnjTA-Gf"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/fine-tuning-bert-for-text-classification\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Fine-tuning BERT for text classification"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, 
Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/9808205cca68ec95b6fbd918d195cea6","name":"Derrick Mwiti","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/b7db96aa11f77239bbde5eb79ede1493","url":"https:\/\/secure.gravatar.com\/avatar\/d52d009e8d0a72c0dcd785caadeefbb3fb7aa64567e9f5a1e65f5faad18f2426?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d52d009e8d0a72c0dcd785caadeefbb3fb7aa64567e9f5a1e65f5faad18f2426?s=96&d=mm&r=g","caption":"Derrick 
Mwiti"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/mwitiderrickgmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7337","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/63"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=7337"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7337\/revisions"}],"predecessor-version":[{"id":15567,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7337\/revisions\/15567"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=7337"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=7337"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=7337"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=7337"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}