{"id":7014,"date":"2023-08-01T06:10:36","date_gmt":"2023-08-01T14:10:36","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=7014"},"modified":"2025-04-24T17:15:01","modified_gmt":"2025-04-24T17:15:01","slug":"causal-language-modeling-with-gpt","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/causal-language-modeling-with-gpt\/","title":{"rendered":"Causal Language Modeling with GPT"},"content":{"rendered":"\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/causal-language-modeling-with-gpt\">\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<figure class=\"mj mk ml mm mn mo mg mh paragraph-image\">\n<div class=\"mp mq eb mr bg ms\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mt mu c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*RSDWtTJcbLkD-NOW\" alt=\"\" width=\"700\" height=\"467\"><\/figure><div class=\"mg mh mi\"><picture><\/picture><\/div>\n<\/div><figcaption class=\"mv mw mx mg mh my mz be b bf z dv\" data-selectable-paragraph=\"\">Photo by <a class=\"af na\" href=\"https:\/\/unsplash.com\/@jayphoto?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Justin W<\/a> on <a class=\"af na\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/figcaption><\/figure>\n<p id=\"ad69\" class=\"pw-post-body-paragraph nb nc fo be b gm nd ne nf gp ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv fh bj\" data-selectable-paragraph=\"\">Text generation is the task of producing new text. An example of text generation using machine learning is <a class=\"af na\" href=\"https:\/\/github.com\/features\/copilot\" target=\"_blank\" rel=\"noopener ugc nofollow\">GitHub\u2019s Copilot<\/a>, which can generate code. Apart from code generation, text generation models can:<\/p>\n<ul class=\"\">\n<li id=\"48a0\" class=\"nb nc fo be b gm nd ne nf gp ng nh ni nw nk nl nm nx no np nq ny ns nt nu nv nz oa ob bj\" data-selectable-paragraph=\"\">Generate stories, for example, by passing \u201cOnce upon a time &#8221; as input to a GPT-2 model.<\/li>\n<li id=\"b774\" class=\"nb nc fo be b gm oc ne nf gp od nh ni nw oe nl nm nx of np nq ny og nt nu nv nz oa ob bj\" data-selectable-paragraph=\"\">Generate music lyrics.<\/li>\n<li id=\"a7ad\" class=\"nb nc fo be b gm oc ne nf gp od nh ni nw oe nl nm nx of np nq ny og nt nu nv nz oa ob bj\" data-selectable-paragraph=\"\">Generate an entire article.<\/li>\n<li id=\"db26\" class=\"nb nc fo be b gm oc ne nf gp od nh ni nw oe nl nm nx of np nq ny og nt nu nv nz oa ob bj\" data-selectable-paragraph=\"\">Completing incomplete sentences.<\/li>\n<li id=\"74cc\" class=\"nb nc fo be b gm oc ne nf gp od nh ni nw oe nl nm nx of np nq ny og nt nu nv nz oa ob bj\" data-selectable-paragraph=\"\">Summarize long documents.<\/li>\n<li id=\"e29c\" class=\"nb nc fo be b gm oc ne nf gp od nh ni nw oe nl nm nx of np nq ny og nt nu nv nz oa ob bj\" data-selectable-paragraph=\"\">Translate from one language to another.<\/li>\n<\/ul>\n<p id=\"e55a\" class=\"pw-post-body-paragraph nb nc fo be b gm nd ne nf gp ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv fh bj\" data-selectable-paragraph=\"\">Text generation can be achieved using causal language models such as GPT-2. 
This article will look at how to fine-tune a causal language model on custom data so that it can generate text from a few words of input.

## Causal Language Modeling vs. Masked Language Modeling

Given a sequence of tokens, **causal language modeling** is the task of predicting the next token. It differs from **masked language modeling**, where certain words in a sentence are masked and the model is trained to predict them.

![Causal language modeling](https://miro.medium.com/v2/resize:fit:383/0*4qYkQYogkls9hbxY.png)
class=\"af na\" href=\"https:\/\/towardsdatascience.com\/understanding-masked-language-models-mlm-and-causal-language-models-clm-in-nlp-194c15f56a5\" target=\"_blank\" rel=\"noopener\">source<\/a><\/figcaption>\n<\/figure>\n<p id=\"cac0\" class=\"pw-post-body-paragraph nb nc fo be b gm nd ne nf gp ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv fh bj\" data-selectable-paragraph=\"\">In <strong class=\"be pp\">Causal Language Modeling,<\/strong> the model only considers words to the left, while <strong class=\"be pp\">Masked Language Modeling<\/strong> considers words to the left and right. Therefore, Causal Language Modeling is <strong class=\"be pp\">unidirectional,<\/strong> while Masked Language Modeling is <strong class=\"be pp\">bidirectional<\/strong>.<\/p>\n<figure class=\"mj mk ml mm mn mo mg mh paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mt mu c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:382\/0*ZyMpLDw1P7ulg4i0.png\" alt=\"\" width=\"382\" height=\"285\"><\/figure><div class=\"mg mh pr\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/0*ZyMpLDw1P7ulg4i0.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/0*ZyMpLDw1P7ulg4i0.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/0*ZyMpLDw1P7ulg4i0.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/0*ZyMpLDw1P7ulg4i0.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/0*ZyMpLDw1P7ulg4i0.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/0*ZyMpLDw1P7ulg4i0.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:764\/format:webp\/0*ZyMpLDw1P7ulg4i0.png 764w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 382px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/0*ZyMpLDw1P7ulg4i0.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/0*ZyMpLDw1P7ulg4i0.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/0*ZyMpLDw1P7ulg4i0.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/0*ZyMpLDw1P7ulg4i0.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/0*ZyMpLDw1P7ulg4i0.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/0*ZyMpLDw1P7ulg4i0.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:764\/0*ZyMpLDw1P7ulg4i0.png 764w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 382px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mv mw mx mg mh my mz be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af na\" 
href=\"https:\/\/towardsdatascience.com\/understanding-masked-language-models-mlm-and-causal-language-models-clm-in-nlp-194c15f56a5\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/figcaption>\n<\/figure>\n<p id=\"c90b\" class=\"pw-post-body-paragraph nb nc fo be b gm nd ne nf gp ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv fh bj\" data-selectable-paragraph=\"\"><strong class=\"be pp\">GPT<\/strong> is an example of a pre-trained Causal Language Model, while <strong class=\"be pp\">BERT<\/strong> is an example of a Masked Language Model.<\/p>\n<h2 id=\"cabc\" class=\"op oq fo be or os ot ou ov ow ox oy oz nj pa pb pc nn pd pe pf nr pg ph pi pj bj\" data-selectable-paragraph=\"\">Getting started<\/h2>\n<p id=\"3e18\" class=\"pw-post-body-paragraph nb nc fo be b gm pk ne nf gp pl nh ni nj pm nl nm nn pn np nq nr po nt nu nv fh bj\" data-selectable-paragraph=\"\">In this example, we\u2019ll train a Causal Language Model on the wikitext using GPT. Let\u2019s start by setting up an experiment to track the training process.<\/p>\n<pre>import comet_ml\n\nexperiment = comet_ml.Experiment(\n    api_key=\"YOUR_API_KEY\",\n     project_name=\"clm\", log_code=True,\n    auto_metric_logging=True,\n    auto_param_logging=True,\n    auto_histogram_weight_logging=True,\n    auto_histogram_gradient_logging=True,\n    auto_histogram_activation_logging=True,\n)<\/pre>\n<p id=\"1775\" class=\"pw-post-body-paragraph nb nc fo be b gm nd ne nf gp ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv fh bj\" data-selectable-paragraph=\"\">Set up a <a class=\"af na\" href=\"\/signup?utm_source=heartbeat&amp;utm_medium=referral&amp;utm_campaign=AMS_US_EN_SNUP_heartbeat_CTA\" target=\"_blank\" rel=\"noopener ugc nofollow\">Comet<\/a> project to track our experiments.<\/p>\n<pre>import comet_ml\n\nexperiment = comet_ml.Experiment(\n    api_key=\"your_API_KEY\",\n     project_name=\"clm\", log_code=True,\n    auto_metric_logging=True,\n    auto_param_logging=True,\n    auto_histogram_weight_logging=True,\n    auto_histogram_gradient_logging=True,\n    auto_histogram_activation_logging=True,\n)<\/pre>\n<p id=\"d356\" class=\"pw-post-body-paragraph nb nc fo be b gm nd ne nf gp ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv fh bj\" data-selectable-paragraph=\"\">Next, log model parameters to Comet.<\/p>\n<pre># these will all get logged\nparams = {\n    \"model\": \"gpt2\",\n    \"epochs\": 50,\n    \"batch_size\": 32,\n    \"learning_rate\": 2e-5,\n    \"weight_decay\": 0.01,\n\n}\n\nexperiment.log_parameters(params)<\/pre>\n<h2 id=\"6715\" class=\"op oq fo be or os ot ou ov ow ox oy oz nj pa pb pc nn pd pe pf nr pg ph pi pj bj\" data-selectable-paragraph=\"\">Tokenize dataset<\/h2>\n<p id=\"d37b\" class=\"pw-post-body-paragraph nb nc fo be b gm pk ne nf gp pl nh ni nj pm nl nm nn pn np nq nr po nt nu nv fh bj\" data-selectable-paragraph=\"\">The text dataset we will be using needs to be converted into a numerical representation. It needs to be converted to a representation that the GPT model expects. 
Let's define a function that tokenizes the data using the `GPT2Tokenizer`.

```python
def tokenize_function(examples):
    from transformers import GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(params["model"])
    return tokenizer(examples["text"])
```

Next, define a function that groups the text data and returns it in small chunks that are easier to train on, as they consume less memory.

```python
def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the small remainder; we could add padding instead of dropping
    # if the model supported it. You can customize this part to your needs.
    # block_size = tokenizer.model_max_length
    block_size = 128
    total_length = (total_length // block_size) * block_size
    # Split into chunks of block_size.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    # For causal language modeling, the labels are the inputs themselves.
    result["labels"] = result["input_ids"].copy()
    return result
```
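To see what `group_texts` does, here is a toy walk-through (hypothetical token IDs, with a toy block size of 4 instead of the 128 used above): the batch's lists are concatenated, split into fixed-size blocks, and the leftover tokens are dropped.

```python
# Hypothetical batch of two tokenized "documents" (the real data uses
# GPT-2 token IDs and block_size = 128).
toy = {"input_ids": [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]}
block_size = 4

concatenated = sum(toy["input_ids"], [])  # [1, 2, ..., 11]
total_length = (len(concatenated) // block_size) * block_size  # 8
chunks = [concatenated[i : i + block_size] for i in range(0, total_length, block_size)]
print(chunks)  # [[1, 2, 3, 4], [5, 6, 7, 8]] -- the last 3 tokens are dropped
```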
This is done in the following steps:

- Load the dataset using Hugging Face.
- Process the dataset using the `tokenize_function`.
- Apply the `group_texts` function.
- Create a Hugging Face data collator to build batches from the dataset.
- Create training and validation datasets using the `to_tf_dataset` function.

```python
from datasets import load_dataset
from transformers import DefaultDataCollator

datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
tokenized_datasets = datasets.map(
    tokenize_function, batched=True, num_proc=4, remove_columns=["text"]
)
lm_datasets = tokenized_datasets.map(
    group_texts,
    batched=True,
    batch_size=1000,
    num_proc=4,
)
data_collator = DefaultDataCollator(return_tensors="tf")

train_set = lm_datasets["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "labels"],
    shuffle=True,
    batch_size=params["batch_size"],
    collate_fn=data_collator,
)

validation_set = lm_datasets["validation"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "labels"],
    shuffle=False,
    batch_size=params["batch_size"],
    collate_fn=data_collator,
)
```

When you print the tokenized dataset you will see:

- the input IDs
- the attention mask
- the labels
- the number of rows
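For example, printing the processed training split (a sketch; the exact row count depends on the tokenization) shows the features listed above, as in the screenshot below:

```python
print(lm_datasets["train"])
# Dataset({
#     features: ['attention_mask', 'input_ids', 'labels'],
#     num_rows: ...
# })
```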
![Printed dataset showing input_ids, attention_mask, labels, and num_rows](https://miro.medium.com/v2/resize:fit:700/1*dOi052ZH3Fzz7bBaarrWdQ.png)

Since some sentences are longer than others, we have to ensure that they are all the same length.
In this case, the model's maximum length is 1024 tokens.

![Checking the model's maximum sequence length](https://miro.medium.com/v2/resize:fit:700/1*WN-AdACRLc5qZaXWgHrVRA.png)

To ensure that all sequences have the same length, we truncate longer sentences and pad shorter ones with a **padding token**. We then need a way to tell the model to ignore the padding tokens, which is done using an **attention mask**. The attention mask is a tensor with the same shape as the `input_ids`, which are the numerical representation of the sentences. It contains ones and zeros indicating whether each token should be attended to; as a result, the padding tokens are ignored.
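As a quick illustration (a sketch, not part of the original notebook), you can inspect the maximum length and the attention mask directly. Note that GPT-2 ships without a padding token, so a common workaround is to reuse its end-of-text token; the example sentences are arbitrary.

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
print(tokenizer.model_max_length)  # 1024 for GPT-2

# GPT-2 has no padding token by default; reuse the end-of-text token.
tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(["The ship sailed away", "Hello"], padding=True)
print(batch["input_ids"])       # the shorter sentence is padded to the same length
print(batch["attention_mask"])  # 1 for real tokens, 0 for padding tokens
```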
> Comet is completely free for academics! [Sign up today](https://go.comet.ml/webinar-recommender-systems-for-business-impact.html) and get started with just two lines of code.

## Define causal language model

With the dataset in the correct format, you can now define the causal language model in TensorFlow. This is done by instantiating `TFAutoModelForCausalLM` and passing the desired config. In this case, we use the configuration of the pre-trained `gpt2` checkpoint.

```python
from transformers import AutoTokenizer
from transformers import AutoConfig, TFAutoModelForCausalLM
from transformers import AdamWeightDecay

model_checkpoint = params["model"]
config = AutoConfig.from_pretrained(model_checkpoint)
gpt2 = TFAutoModelForCausalLM.from_config(config)
```
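Note that `from_config` builds the GPT-2 architecture with freshly initialized weights, so the network is trained from scratch on WikiText. If you instead want to fine-tune the published pre-trained weights, one option is to load the checkpoint directly:

```python
# Load the pre-trained gpt2 weights instead of a fresh initialization.
gpt2 = TFAutoModelForCausalLM.from_pretrained(model_checkpoint)
```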
## Train causal language model

We train at a low learning rate to avoid overfitting. Training is done by calling the `fit` method while passing the training and validation sets.

Once training is complete, we evaluate the model on the validation set and log the loss. Causal language models are evaluated using the cross-entropy loss and the perplexity. In this case, we use the perplexity, which is simply the exponential of the cross-entropy.

```python
import math

learning_rate = params["learning_rate"]
weight_decay = params["weight_decay"]
optimizer = AdamWeightDecay(learning_rate=learning_rate, weight_decay_rate=weight_decay)

gpt2.compile(optimizer=optimizer)
gpt2.fit(train_set, validation_data=validation_set, epochs=params["epochs"])

eval_loss = gpt2.evaluate(validation_set)
experiment.log_metrics({"eval_loss": eval_loss})
print(f"Perplexity: {math.exp(eval_loss):.2f}")
```

Comet will log all TensorFlow metrics and hyperparameters automatically.

![Metrics logged automatically in Comet](https://miro.medium.com/v2/resize:fit:700/1*V6BYHoChHkWJxUSzkLWHZA.png)
![Hyperparameters logged automatically in Comet](https://miro.medium.com/v2/resize:fit:700/1*cGybQmCCFGfxeT4wCYfQ_A.png)

Comet also creates histograms of the training process.
From these visuals, you can see:

- histograms for weights and biases
- histograms for activations
- histograms for gradients

![Training histograms in Comet](https://miro.medium.com/v2/resize:fit:700/1*aXOEkPUKUQRy63bobJ6RFQ.png)
role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*A0qvhL2JYobmuGq35aCIsg.png\" alt=\"\" width=\"700\" height=\"247\"><\/figure><div class=\"mg mh qn\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*A0qvhL2JYobmuGq35aCIsg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*A0qvhL2JYobmuGq35aCIsg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*A0qvhL2JYobmuGq35aCIsg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*A0qvhL2JYobmuGq35aCIsg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*A0qvhL2JYobmuGq35aCIsg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*A0qvhL2JYobmuGq35aCIsg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*A0qvhL2JYobmuGq35aCIsg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*A0qvhL2JYobmuGq35aCIsg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*A0qvhL2JYobmuGq35aCIsg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*A0qvhL2JYobmuGq35aCIsg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*A0qvhL2JYobmuGq35aCIsg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*A0qvhL2JYobmuGq35aCIsg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*A0qvhL2JYobmuGq35aCIsg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*A0qvhL2JYobmuGq35aCIsg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<figure class=\"mj mk ml mm mn mo mg mh paragraph-image\">\n<div class=\"mp mq eb mr bg ms\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mt mu c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*l0QUSfwYcFvZjtT2akM2EQ.png\" alt=\"\" width=\"700\" height=\"247\"><\/figure><div class=\"mg mh qn\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*l0QUSfwYcFvZjtT2akM2EQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*l0QUSfwYcFvZjtT2akM2EQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*l0QUSfwYcFvZjtT2akM2EQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*l0QUSfwYcFvZjtT2akM2EQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*l0QUSfwYcFvZjtT2akM2EQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*l0QUSfwYcFvZjtT2akM2EQ.png 1100w, 
## Test causal language model

Let's now try to generate some tokens from the trained causal model.
This is done by:

- Creating an input sequence.
- Tokenizing the sequence using the same tokenizer that was used for training.
- Calling the `generate` method while passing the tokenized sequence.
- Decoding the generated sequence to see the sentence.

We also log the input and generated sentences to Comet and finally end the experiment.

```python
input_sequence = "The ship"
experiment.log_text(input_sequence)

tokenizer_checkpoint = params["model"]
tokenizer = AutoTokenizer.from_pretrained(tokenizer_checkpoint)

# Encode the context the generation is conditioned on.
input_ids = tokenizer.encode(input_sequence, return_tensors="tf")
output = gpt2.generate(input_ids, min_length=20)

print("Output:\n" + 100 * "-")
result = tokenizer.decode(output[0], skip_special_tokens=True)
experiment.log_text(result)
experiment.end()
result
```
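Greedy decoding like this can get repetitive. As a variation (a sketch using standard `generate` arguments, not from the original notebook), you can enable sampling for more varied output:

```python
# Sampled generation often produces more varied text than greedy decoding.
output = gpt2.generate(
    input_ids,
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```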
![Generated text output](https://miro.medium.com/v2/resize:fit:700/1*dTZjKXQmOC7VapM3rruWsA.png)

## Final thoughts

In this article, you have seen how to create a text generation model using Hugging Face and Comet. You can tweak the project by trying other types of language models.

[Follow me on LinkedIn](https://www.linkedin.com/in/mwitiderrick/) for more technical resources.

## Resources

- [Comet experiment](https://www.comet.com/mwitiderrick/clm/)
- [Notebook](https://www.kaggle.com/derrickmwiti/causal-language-modelling)
- [Training a causal language model from scratch](https://huggingface.co/course/chapter7/6)