{"id":7258,"date":"2023-08-21T09:01:47","date_gmt":"2023-08-21T17:01:47","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=7258"},"modified":"2025-04-24T17:14:41","modified_gmt":"2025-04-24T17:14:41","slug":"research-guide-model-distillation-techniques-for-deep-learning","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/","title":{"rendered":"Research Guide: Model Distillation Techniques for Deep Learning"},"content":{"rendered":"\n<div class=\"fh fi fj fk fl\">\n<div class=\"mf bg\">\n<figure class=\"mg mh mi mj mk mf bg paragraph-image\"><picture><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:2500\/1*Xoll0HT4YUfF_DhJZqiiuA.jpeg\" alt=\"\" width=\"2400\" height=\"1667\"><\/picture><figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/unsplash.com\/photos\/_FilM6g6DEQ\" target=\"_blank\" rel=\"noopener ugc nofollow\">Image Source<\/a><\/figcaption><\/figure>\n<\/div>\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"73cf\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Knowledge distillation is a model compression technique whereby a small network (student) is taught by a larger trained neural network (teacher). The smaller network is trained to behave like the large neural network. This enables the deployment of such models on small devices such as mobile phones or other edge devices. 
In this guide, we\u2019ll look at several papers that attempt to tackle this challenge.<\/p>\n<h2 id=\"a447\" class=\"nq nr fo be ns nt nu go nv nw nx gr ny nz oa ob oc od oe of og oh oi oj ok ol bj\">Distilling the Knowledge in a Neural Network (NIPS, 2014)<\/h2>\n<p id=\"25e0\" class=\"pw-post-body-paragraph mv mw fo be b gm om my mz gp on nb nc nd oo nf ng nh op nj nk nl oq nn no np fh bj\" data-selectable-paragraph=\"\">In this paper, a small model is trained to generalize in the same way as the larger teacher model. This generalization ability is transferred by using the class probabilities produced by the large model as soft targets while training the smaller model. If the large model is an ensemble of simpler models, the geometric or arithmetic mean of their predictive distributions is used as the target.<\/p>\n<div class=\"or os ot ou ov ow\">\n<div class=\"ox ab ik\">\n<div class=\"oy ab cn ca oz pa\">\n<h6 class=\"be fp ia z is pb iu iv pc ix iz fn bj\"><a href=\"https:\/\/arxiv.org\/abs\/1503.02531\">Distilling the Knowledge in a Neural Network<\/a><\/h6>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"b5c4\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">To test distillation on MNIST, the authors trained a single large neural net with two hidden layers of 1200 rectified linear hidden units on all 60,000 training cases. The network was regularized using dropout and weight constraints. The input images were jittered by up to two pixels in any direction. This network made 67 test errors. A smaller network with two hidden layers of 800 rectified linear units and no regularization made 146 errors. 
When the smaller network was regularized solely by adding the task of matching the soft targets produced by the large net, it made 74 test errors.<\/p>\n<p id=\"8014\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The following results were obtained when the technique was used on speech recognition.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"pg ph eb pi bg pj\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*V8HART7mTUfVm95QCVnL2A.png\" alt=\"\" width=\"700\" height=\"179\"><\/figure><div class=\"mq mr pf\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*V8HART7mTUfVm95QCVnL2A.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*V8HART7mTUfVm95QCVnL2A.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*V8HART7mTUfVm95QCVnL2A.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*V8HART7mTUfVm95QCVnL2A.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*V8HART7mTUfVm95QCVnL2A.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*V8HART7mTUfVm95QCVnL2A.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*V8HART7mTUfVm95QCVnL2A.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source 
srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*V8HART7mTUfVm95QCVnL2A.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*V8HART7mTUfVm95QCVnL2A.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*V8HART7mTUfVm95QCVnL2A.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*V8HART7mTUfVm95QCVnL2A.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*V8HART7mTUfVm95QCVnL2A.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*V8HART7mTUfVm95QCVnL2A.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*V8HART7mTUfVm95QCVnL2A.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/pdf\/1503.02531.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h2 id=\"7c25\" class=\"nq nr fo be ns nt qc go nv nw qd gr ny nz qe ob oc od qf of og oh qg oj ok ol bj\">Contrastive Representation Distillation (2019)<\/h2>\n<p id=\"7dc8\" class=\"pw-post-body-paragraph mv mw fo be b gm om my mz gp on nb nc nd oo nf ng nh op nj nk nl oq nn no np fh bj\" data-selectable-paragraph=\"\">This paper leverages the family of contrastive objectives to capture correlations and higher-order output dependencies. 
These objectives are adapted in this paper for knowledge distillation from one neural network to another.<\/p>\n<div class=\"or os ot ou ov ow\">\n<div class=\"ox ab ik\">\n<div class=\"oy ab cn ca oz pa\">\n<h6 class=\"be fp ia z is pb iu iv pc ix iz fn bj\"><a href=\"https:\/\/arxiv.org\/abs\/1910.10699\">Contrastive Representation Distillation<\/a><\/h6>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"51ca\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">As shown below, the paper considers three distillation tasks:<\/p>\n<ul class=\"\">\n<li id=\"7bd4\" class=\"mv mw fo be b gm mx my mz gp na nb nc qh ne nf ng qi ni nj nk qj nm nn no np qk ql qm bj\" data-selectable-paragraph=\"\">model compression<\/li>\n<li id=\"2599\" class=\"mv mw fo be b gm qn my mz gp qo nb nc qh qp nf ng qi qq nj nk qj qr nn no np qk ql qm bj\" data-selectable-paragraph=\"\">transferring knowledge from one modality (e.g., RGB) to another (e.g., depth)<\/li>\n<li id=\"c3f9\" class=\"mv mw fo be b gm qn my mz gp qo nb nc qh qp nf ng qi qq nj nk qj qr nn no np qk ql qm bj\" data-selectable-paragraph=\"\">distilling an ensemble of networks into a single network<\/li>\n<\/ul>\n<p id=\"fda0\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The main idea in contrastive learning is to learn a representation that is close in some metric space for positive pairs while pushing apart the representations of negative pairs.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"pg ph eb pi bg pj\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*K3smDf1qq9MxHVDScxCRyg.png\" alt=\"\" width=\"700\" height=\"331\"><\/figure><div class=\"mq mr 
qs\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*K3smDf1qq9MxHVDScxCRyg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*K3smDf1qq9MxHVDScxCRyg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*K3smDf1qq9MxHVDScxCRyg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*K3smDf1qq9MxHVDScxCRyg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*K3smDf1qq9MxHVDScxCRyg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*K3smDf1qq9MxHVDScxCRyg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*K3smDf1qq9MxHVDScxCRyg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*K3smDf1qq9MxHVDScxCRyg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*K3smDf1qq9MxHVDScxCRyg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*K3smDf1qq9MxHVDScxCRyg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*K3smDf1qq9MxHVDScxCRyg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*K3smDf1qq9MxHVDScxCRyg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*K3smDf1qq9MxHVDScxCRyg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*K3smDf1qq9MxHVDScxCRyg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, 
(-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/pdf\/1910.10699.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"57b5\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The contrastive representation distillation (CRD) framework is tested on:<\/p>\n<ul class=\"\">\n<li id=\"6a6a\" class=\"mv mw fo be b gm mx my mz gp na nb nc qh ne nf ng qi ni nj nk qj nm nn no np qk ql qm bj\" data-selectable-paragraph=\"\">model compression of a large network to a smaller one<\/li>\n<li id=\"4848\" class=\"mv mw fo be b gm qn my mz gp qo nb nc qh qp nf ng qi qq nj nk qj qr nn no np qk ql qm bj\" data-selectable-paragraph=\"\">cross-modal knowledge transfer<\/li>\n<li id=\"e161\" class=\"mv mw fo be b gm qn my mz gp qo nb nc qh qp nf ng qi qq nj nk qj qr nn no np qk ql qm bj\" data-selectable-paragraph=\"\">ensemble distillation from a group of teachers to a single student network<\/li>\n<\/ul>\n<p id=\"77e9\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The technique was tested on CIFAR-100, ImageNet, STL-10, TinyImageNet, and NYU-Depth V2. 
Some of the results obtained are shown below.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:496\/1*2MF0F1EynML3tPCi022IeQ.png\" alt=\"\" width=\"496\" height=\"622\"><\/figure><div class=\"mq mr qt\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*2MF0F1EynML3tPCi022IeQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*2MF0F1EynML3tPCi022IeQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*2MF0F1EynML3tPCi022IeQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*2MF0F1EynML3tPCi022IeQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*2MF0F1EynML3tPCi022IeQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*2MF0F1EynML3tPCi022IeQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:992\/format:webp\/1*2MF0F1EynML3tPCi022IeQ.png 992w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 496px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*2MF0F1EynML3tPCi022IeQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*2MF0F1EynML3tPCi022IeQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*2MF0F1EynML3tPCi022IeQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*2MF0F1EynML3tPCi022IeQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*2MF0F1EynML3tPCi022IeQ.png 828w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*2MF0F1EynML3tPCi022IeQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:992\/1*2MF0F1EynML3tPCi022IeQ.png 992w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 496px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/abs\/1910.10699\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h2 id=\"6021\" class=\"nq nr fo be ns nt qc go nv nw qd gr ny nz qe ob oc od qf of og oh qg oj ok ol bj\">Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework (2019)<\/h2>\n<p id=\"f872\" class=\"pw-post-body-paragraph mv mw fo be b gm om my mz gp on nb nc nd oo nf ng nh op nj nk nl oq nn no np fh bj\" data-selectable-paragraph=\"\">The approach proposed in this paper is known as Variational Student. It combines the compressibility of the knowledge distillation (KD) framework with the sparsity-inducing abilities of variational inference (VI) techniques. The authors build a sparse student network. The sparsity of this network is induced by the variational parameters found by optimizing a loss function based on VI. 
The optimization takes advantage of the knowledge learned by the teacher network.<\/p>\n<div class=\"or os ot ou ov ow\">\n<div class=\"ox ab ik\">\n<div class=\"oy ab cn ca oz pa\">\n<h6 class=\"be fp ia z is pb iu iv pc ix iz fn bj\"><a href=\"https:\/\/arxiv.org\/abs\/1910.12061\">Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework<\/a><\/h6>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"4ef2\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">This paper considers a Bayesian neural network (BNN) in a vanilla KD framework, whereby the student employs a variational penalized least-squares objective function. This ensures that the student network is compact compared with the teacher network by virtue of KD. It also allows the integration of sparsity techniques, such as sparse variational dropout (SVD) and variational Bayesian dropout (VBD). 
This leads to the achievement of a sparse student.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"pg ph eb pi bg pj\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*LOyJ7xI2ubEpqAlnBYXvsA.png\" alt=\"\" width=\"700\" height=\"418\"><\/figure><div class=\"mq mr qu\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*LOyJ7xI2ubEpqAlnBYXvsA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/abs\/1910.12061\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"5949\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Some of the results obtained with this method are shown below.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"pg ph eb pi bg pj\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*bCfTnjcbO-YUxq_CQyOh6g.png\" alt=\"\" width=\"700\" height=\"212\"><\/figure><div class=\"mq mr qv\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*bCfTnjcbO-YUxq_CQyOh6g.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*bCfTnjcbO-YUxq_CQyOh6g.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*bCfTnjcbO-YUxq_CQyOh6g.png 750w, 
https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*bCfTnjcbO-YUxq_CQyOh6g.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*bCfTnjcbO-YUxq_CQyOh6g.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*bCfTnjcbO-YUxq_CQyOh6g.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*bCfTnjcbO-YUxq_CQyOh6g.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*bCfTnjcbO-YUxq_CQyOh6g.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*bCfTnjcbO-YUxq_CQyOh6g.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*bCfTnjcbO-YUxq_CQyOh6g.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*bCfTnjcbO-YUxq_CQyOh6g.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*bCfTnjcbO-YUxq_CQyOh6g.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*bCfTnjcbO-YUxq_CQyOh6g.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*bCfTnjcbO-YUxq_CQyOh6g.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" 
data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/abs\/1910.12061\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h2 id=\"8d3b\" class=\"nq nr fo be ns nt qc go nv nw qd gr ny nz qe ob oc od qf of og oh qg oj ok ol bj\">Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher (2019)<\/h2>\n<p id=\"6afb\" class=\"pw-post-body-paragraph mv mw fo be b gm om my mz gp on nb nc nd oo nf ng nh op nj nk nl oq nn no np fh bj\" data-selectable-paragraph=\"\">This paper shows that the performance of the student network degrades when the size gap between the teacher and the student is large. To bridge that gap, it introduces multi-step knowledge distillation that uses an intermediate-sized network, the teacher assistant. The approach is tested on the CIFAR-10 and CIFAR-100 datasets.<\/p>\n<div class=\"or os ot ou ov ow\">\n<div class=\"ox ab ik\">\n<div class=\"oy ab cn ca oz pa\">\n<h6 class=\"be fp ia z is pb iu iv pc ix iz fn bj\"><a href=\"https:\/\/arxiv.org\/abs\/1902.03393\">Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher<\/a><\/h6>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"8199\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The method is called Teacher Assistant Knowledge Distillation (TAKD), and its intermediate models are known as teacher assistants (TAs). 
The TA models are distilled from the teacher, and the student is only distilled from the TAs.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:621\/1*LAGuIW5T-5BgG2mRkk9sYg.png\" alt=\"\" width=\"621\" height=\"301\"><\/figure><div class=\"mq mr qw\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*LAGuIW5T-5BgG2mRkk9sYg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*LAGuIW5T-5BgG2mRkk9sYg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*LAGuIW5T-5BgG2mRkk9sYg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*LAGuIW5T-5BgG2mRkk9sYg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*LAGuIW5T-5BgG2mRkk9sYg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*LAGuIW5T-5BgG2mRkk9sYg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1242\/format:webp\/1*LAGuIW5T-5BgG2mRkk9sYg.png 1242w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 621px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*LAGuIW5T-5BgG2mRkk9sYg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*LAGuIW5T-5BgG2mRkk9sYg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*LAGuIW5T-5BgG2mRkk9sYg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*LAGuIW5T-5BgG2mRkk9sYg.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*LAGuIW5T-5BgG2mRkk9sYg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*LAGuIW5T-5BgG2mRkk9sYg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1242\/1*LAGuIW5T-5BgG2mRkk9sYg.png 1242w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 621px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/pdf\/1902.03393.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"1796\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Figure 2 below shows the distillation performance as the teacher size increases. 
Figure 3 shows how much the distilled student improves over the same student trained from scratch as the student size varies.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"pg ph eb pi bg pj\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*6KgACRX6W76WMGs0AmGX2A.png\" alt=\"\" width=\"700\" height=\"176\"><\/figure><div class=\"mq mr qx\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*6KgACRX6W76WMGs0AmGX2A.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*6KgACRX6W76WMGs0AmGX2A.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*6KgACRX6W76WMGs0AmGX2A.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*6KgACRX6W76WMGs0AmGX2A.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*6KgACRX6W76WMGs0AmGX2A.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*6KgACRX6W76WMGs0AmGX2A.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*6KgACRX6W76WMGs0AmGX2A.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*6KgACRX6W76WMGs0AmGX2A.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*6KgACRX6W76WMGs0AmGX2A.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*6KgACRX6W76WMGs0AmGX2A.png 750w, 
https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*6KgACRX6W76WMGs0AmGX2A.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*6KgACRX6W76WMGs0AmGX2A.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*6KgACRX6W76WMGs0AmGX2A.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*6KgACRX6W76WMGs0AmGX2A.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/pdf\/1902.03393.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"bbc5\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The approach is evaluated using plain CNN and ResNet architectures. 
Here are some of the accuracies obtained with different TA sizes:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:454\/1*DvStcoanxP8S8Gj8HNvOfQ.png\" alt=\"\" width=\"454\" height=\"517\"><\/figure><div class=\"mq mr qy\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*DvStcoanxP8S8Gj8HNvOfQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*DvStcoanxP8S8Gj8HNvOfQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*DvStcoanxP8S8Gj8HNvOfQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*DvStcoanxP8S8Gj8HNvOfQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*DvStcoanxP8S8Gj8HNvOfQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*DvStcoanxP8S8Gj8HNvOfQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:908\/format:webp\/1*DvStcoanxP8S8Gj8HNvOfQ.png 908w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 454px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*DvStcoanxP8S8Gj8HNvOfQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*DvStcoanxP8S8Gj8HNvOfQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*DvStcoanxP8S8Gj8HNvOfQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*DvStcoanxP8S8Gj8HNvOfQ.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*DvStcoanxP8S8Gj8HNvOfQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*DvStcoanxP8S8Gj8HNvOfQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:908\/1*DvStcoanxP8S8Gj8HNvOfQ.png 908w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 454px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/pdf\/1902.03393.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h2 id=\"f552\" class=\"nq nr fo be ns nt qc go nv nw qd gr ny nz qe ob oc od qf of og oh qg oj ok ol bj\">On the Efficacy of Knowledge Distillation (ICCV 2019)<\/h2>\n<p id=\"5329\" class=\"pw-post-body-paragraph mv mw fo be b gm om my mz gp on nb nc nd oo nf ng nh op nj nk nl oq nn no np fh bj\" data-selectable-paragraph=\"\">This paper examines how well knowledge distillation generalizes when training the student network. According to the authors&#8217; findings, higher accuracy on the teacher network doesn\u2019t necessarily translate into higher accuracy for the student network. 
The network architectures used in this paper are ResNet, WideResNet, and DenseNet.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:559\/1*II6njeoWehDHo1vafLIf9Q.png\" alt=\"\" width=\"559\" height=\"387\"><\/figure><div class=\"mq mr qz\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*II6njeoWehDHo1vafLIf9Q.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*II6njeoWehDHo1vafLIf9Q.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*II6njeoWehDHo1vafLIf9Q.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*II6njeoWehDHo1vafLIf9Q.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*II6njeoWehDHo1vafLIf9Q.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*II6njeoWehDHo1vafLIf9Q.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1118\/format:webp\/1*II6njeoWehDHo1vafLIf9Q.png 1118w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 559px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*II6njeoWehDHo1vafLIf9Q.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*II6njeoWehDHo1vafLIf9Q.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*II6njeoWehDHo1vafLIf9Q.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*II6njeoWehDHo1vafLIf9Q.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*II6njeoWehDHo1vafLIf9Q.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*II6njeoWehDHo1vafLIf9Q.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1118\/1*II6njeoWehDHo1vafLIf9Q.png 1118w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 559px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/pdf\/1910.01348.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<div class=\"or os ot ou ov ow\">\n<div class=\"ox ab ik\">\n<div class=\"oy ab cn ca oz pa\">\n<h6 class=\"be fp ia z is pb iu iv pc ix iz fn bj\"><a href=\"https:\/\/arxiv.org\/abs\/1910.01348\">On the Efficacy of Knowledge Distillation<\/a><\/h6>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"fccb\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The figure below shows the error plot of student networks distilled from different teachers on CIFAR10.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"pg ph eb pi bg pj\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*crh8XurNHrYwmOXiMNhoFQ.png\" alt=\"\" width=\"700\" height=\"287\"><\/figure><div class=\"mq mr ra\"><picture><source 
srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*crh8XurNHrYwmOXiMNhoFQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*crh8XurNHrYwmOXiMNhoFQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*crh8XurNHrYwmOXiMNhoFQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*crh8XurNHrYwmOXiMNhoFQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*crh8XurNHrYwmOXiMNhoFQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*crh8XurNHrYwmOXiMNhoFQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*crh8XurNHrYwmOXiMNhoFQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*crh8XurNHrYwmOXiMNhoFQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*crh8XurNHrYwmOXiMNhoFQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*crh8XurNHrYwmOXiMNhoFQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*crh8XurNHrYwmOXiMNhoFQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*crh8XurNHrYwmOXiMNhoFQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*crh8XurNHrYwmOXiMNhoFQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*crh8XurNHrYwmOXiMNhoFQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, 
(-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/pdf\/1910.01348.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"c249\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The experiment was also conducted on ImageNet, with ResNet18 as the student and ResNet18, ResNet34, ResNet50, and ResNet152 as teachers. The experiments indicate that bigger models aren\u2019t necessarily better teachers.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:571\/1*W1Vet3fBdi0NDNSHkqbaaw.png\" alt=\"\" width=\"571\" height=\"640\"><\/figure><div class=\"mq mr rb\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*W1Vet3fBdi0NDNSHkqbaaw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*W1Vet3fBdi0NDNSHkqbaaw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*W1Vet3fBdi0NDNSHkqbaaw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*W1Vet3fBdi0NDNSHkqbaaw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*W1Vet3fBdi0NDNSHkqbaaw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*W1Vet3fBdi0NDNSHkqbaaw.png 1100w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1142\/format:webp\/1*W1Vet3fBdi0NDNSHkqbaaw.png 1142w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 571px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*W1Vet3fBdi0NDNSHkqbaaw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*W1Vet3fBdi0NDNSHkqbaaw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*W1Vet3fBdi0NDNSHkqbaaw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*W1Vet3fBdi0NDNSHkqbaaw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*W1Vet3fBdi0NDNSHkqbaaw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*W1Vet3fBdi0NDNSHkqbaaw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1142\/1*W1Vet3fBdi0NDNSHkqbaaw.png 1142w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 571px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/pdf\/1910.01348.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<p 
data-selectable-paragraph=\"\">\n<\/p><p id=\"e198\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The figure below shows that the reason bigger models are not better teachers is that the student network is unable to mimic the large teachers.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:553\/1*xHd4cXTJZIh5_jxOiVUGSA.png\" alt=\"\" width=\"553\" height=\"351\"><\/figure><div class=\"mq mr rc\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*xHd4cXTJZIh5_jxOiVUGSA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*xHd4cXTJZIh5_jxOiVUGSA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*xHd4cXTJZIh5_jxOiVUGSA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*xHd4cXTJZIh5_jxOiVUGSA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*xHd4cXTJZIh5_jxOiVUGSA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*xHd4cXTJZIh5_jxOiVUGSA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1106\/format:webp\/1*xHd4cXTJZIh5_jxOiVUGSA.png 1106w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 553px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*xHd4cXTJZIh5_jxOiVUGSA.png 640w, 
https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*xHd4cXTJZIh5_jxOiVUGSA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*xHd4cXTJZIh5_jxOiVUGSA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*xHd4cXTJZIh5_jxOiVUGSA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*xHd4cXTJZIh5_jxOiVUGSA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*xHd4cXTJZIh5_jxOiVUGSA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1106\/1*xHd4cXTJZIh5_jxOiVUGSA.png 1106w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 553px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/pdf\/1910.01348.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"83d8\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">A solution proposed in this paper is to stop the teacher training early in order to obtain a solution that\u2019s more amenable to the student.<\/p>\n<h2 id=\"d871\" class=\"nq nr fo be ns nt nu go nv nw nx gr ny nz oa ob oc od oe of og oh oi oj ok ol bj\">Dynamic Kernel Distillation for Efficient Pose Estimation in Videos (ICCV 2019)<\/h2>\n<p id=\"8e2e\" class=\"pw-post-body-paragraph mv mw fo be b gm om my mz gp on nb nc nd oo nf ng nh op nj nk nl oq nn no np fh bj\" data-selectable-paragraph=\"\">Localization of 
body joints in <a class=\"af mu\" href=\"https:\/\/heartbeat.comet.ml\/a-2019-guide-to-human-pose-estimation-c10b79b64b73\" target=\"_blank\" rel=\"noopener ugc nofollow\">human pose estimation<\/a> typically applies large networks to every frame in a video, which incurs high computational costs. The authors of this paper propose Dynamic Kernel Distillation (DKD) to address this challenge.<\/p>\n<p id=\"fd90\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">DKD introduces a lightweight distillator that distills pose kernels online by leveraging temporal cues from the previous frame in a one-shot feed-forward manner. DKD simplifies body joint localization into a matching procedure between the pose kernels and the current frame. It transfers pose knowledge from one frame to guide body joint localization in the following frame, which enables the use of small networks in video-based pose estimation.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:596\/1*c8RAMjjxFLgZFPkqdMbsRg.png\" alt=\"\" width=\"596\" height=\"488\"><\/figure><div class=\"mq mr rd\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*c8RAMjjxFLgZFPkqdMbsRg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*c8RAMjjxFLgZFPkqdMbsRg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*c8RAMjjxFLgZFPkqdMbsRg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*c8RAMjjxFLgZFPkqdMbsRg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*c8RAMjjxFLgZFPkqdMbsRg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*c8RAMjjxFLgZFPkqdMbsRg.png 1100w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1192\/format:webp\/1*c8RAMjjxFLgZFPkqdMbsRg.png 1192w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 596px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*c8RAMjjxFLgZFPkqdMbsRg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*c8RAMjjxFLgZFPkqdMbsRg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*c8RAMjjxFLgZFPkqdMbsRg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*c8RAMjjxFLgZFPkqdMbsRg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*c8RAMjjxFLgZFPkqdMbsRg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*c8RAMjjxFLgZFPkqdMbsRg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1192\/1*c8RAMjjxFLgZFPkqdMbsRg.png 1192w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 596px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/abs\/1908.09216\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<h6><a 
href=\"https:\/\/arxiv.org\/abs\/1908.09216\">Dynamic Kernel Distillation for Efficient Pose Estimation in Videos<\/a><\/h6>\n<p id=\"dc36\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">DKD is trained with a temporally adversarial training strategy. This strategy introduces a temporal discriminator to help generate temporally coherent pose kernels and pose estimation results over a long range. The approach is tested on the Penn Action and Sub-JHMDB benchmarks.<\/p>\n<p id=\"c5c4\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The architecture of this approach is shown below. It\u2019s made up of a pose initializer, a frame encoder, a pose kernel distillator, and a temporally adversarial discriminator. DKD uses the pose initializer to estimate its confidence maps. The frame encoder extracts high-level features to match against the pose kernels from the pose kernel distillator. The pose kernel distillator takes the temporal information as input and distills the pose kernels in a one-shot feed-forward manner. 
And the temporally adversarial discriminator is used to enhance the learning process of the pose kernel distillator, with confidence map variations as auxiliary temporal supervision.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"pg ph eb pi bg pj\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*zQ9kUQJW2OM3Qi8KevIlog.png\" alt=\"\" width=\"700\" height=\"258\"><\/figure><div class=\"mq mr re\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*zQ9kUQJW2OM3Qi8KevIlog.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*zQ9kUQJW2OM3Qi8KevIlog.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*zQ9kUQJW2OM3Qi8KevIlog.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*zQ9kUQJW2OM3Qi8KevIlog.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*zQ9kUQJW2OM3Qi8KevIlog.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*zQ9kUQJW2OM3Qi8KevIlog.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*zQ9kUQJW2OM3Qi8KevIlog.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*zQ9kUQJW2OM3Qi8KevIlog.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*zQ9kUQJW2OM3Qi8KevIlog.png 720w, 
https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*zQ9kUQJW2OM3Qi8KevIlog.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*zQ9kUQJW2OM3Qi8KevIlog.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*zQ9kUQJW2OM3Qi8KevIlog.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*zQ9kUQJW2OM3Qi8KevIlog.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*zQ9kUQJW2OM3Qi8KevIlog.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"3afb\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Some of the results obtained with the Penn Action dataset are shown below:<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:392\/1*qDoXqbfyJDW9XbRlcF_8bA.png\" alt=\"\" width=\"555\" height=\"842\"><\/figure><div class=\"mq mr rf\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*qDoXqbfyJDW9XbRlcF_8bA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*qDoXqbfyJDW9XbRlcF_8bA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*qDoXqbfyJDW9XbRlcF_8bA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*qDoXqbfyJDW9XbRlcF_8bA.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*qDoXqbfyJDW9XbRlcF_8bA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*qDoXqbfyJDW9XbRlcF_8bA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:784\/format:webp\/1*qDoXqbfyJDW9XbRlcF_8bA.png 784w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 392px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*qDoXqbfyJDW9XbRlcF_8bA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*qDoXqbfyJDW9XbRlcF_8bA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*qDoXqbfyJDW9XbRlcF_8bA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*qDoXqbfyJDW9XbRlcF_8bA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*qDoXqbfyJDW9XbRlcF_8bA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*qDoXqbfyJDW9XbRlcF_8bA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:784\/1*qDoXqbfyJDW9XbRlcF_8bA.png 784w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 392px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" 
data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/abs\/1908.09216\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"58df\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">Here\u2019s a comparison of the results obtained on the Penn Action and Sub-JHMDB datasets.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"pg ph eb pi bg pj\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*kJCi_RVoNn7UNaLS55b4Bw.png\" alt=\"\" width=\"700\" height=\"283\"><\/figure><div class=\"mq mr rg\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*kJCi_RVoNn7UNaLS55b4Bw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*kJCi_RVoNn7UNaLS55b4Bw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*kJCi_RVoNn7UNaLS55b4Bw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*kJCi_RVoNn7UNaLS55b4Bw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*kJCi_RVoNn7UNaLS55b4Bw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*kJCi_RVoNn7UNaLS55b4Bw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*kJCi_RVoNn7UNaLS55b4Bw.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, 
(-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*kJCi_RVoNn7UNaLS55b4Bw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*kJCi_RVoNn7UNaLS55b4Bw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*kJCi_RVoNn7UNaLS55b4Bw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*kJCi_RVoNn7UNaLS55b4Bw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*kJCi_RVoNn7UNaLS55b4Bw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*kJCi_RVoNn7UNaLS55b4Bw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*kJCi_RVoNn7UNaLS55b4Bw.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/abs\/1908.09216\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h2 id=\"9679\" class=\"nq nr fo be ns nt qc go nv nw qd gr ny nz qe ob oc od qf of og oh qg oj ok ol bj\">DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS 2019)<\/h2>\n<p id=\"a147\" class=\"pw-post-body-paragraph mv mw fo be b gm om my mz gp on nb nc nd oo nf ng nh op nj nk nl oq nn no np fh bj\" data-selectable-paragraph=\"\"><a href=\"https:\/\/arxiv.org\/abs\/1910.01108\">This 
paper<\/a> proposes a way to pre-train a smaller general-purpose language representation model, known as DistilBERT, a distilled version of BERT. DistilBERT has the same general architecture as BERT, but with half the number of layers and with the token-type embeddings and the pooler removed.<\/p>\n<p id=\"41fd\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The performance of this approach compared to BERT is shown below.<\/p>\n<figure class=\"mg mh mi mj mk mf mq mr paragraph-image\">\n<div class=\"pg ph eb pi bg pj\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png\" alt=\"\" width=\"700\" height=\"398\"><\/figure><div class=\"mq mr rh\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) 
and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*DZ9xLvGBcxQ_HVQtyc7fGg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mn mo mp mq mr ms mt be b bf z dv\" data-selectable-paragraph=\"\"><a class=\"af mu\" href=\"https:\/\/arxiv.org\/pdf\/1910.01108.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">source<\/a><\/figcaption>\n<\/figure>\n<p id=\"7ce0\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">DistilBERT is distilled on very large batches leveraging gradient accumulation, using dynamic masking and without the next sentence prediction objective. It\u2019s trained on the original corpus of the BERT model and was assessed on the General Language Understanding Evaluation (GLUE) benchmark. 
DistilBERT retains 97% of BERT\u2019s performance while being 40% smaller and 60% faster.<\/p>\n<h1 id=\"bb39\" class=\"nq nr fo be ns nt nu go nv nw nx gr ny nz oa ob oc od oe of og oh oi oj ok ol bj\" data-selectable-paragraph=\"\">Conclusion<\/h1>\n<p id=\"209e\" class=\"pw-post-body-paragraph mv mw fo be b gm om my mz gp on nb nc nd oo nf ng nh op nj nk nl oq nn no np fh bj\" data-selectable-paragraph=\"\">We should now be up to speed on some of the most common \u2014 and a couple of very recent \u2014 model distillation methods.<\/p>\n<p id=\"34d0\" class=\"pw-post-body-paragraph mv mw fo be b gm mx my mz gp na nb nc nd ne nf ng nh ni nj nk nl nm nn no np fh bj\" data-selectable-paragraph=\"\">The papers\/abstracts mentioned and linked to above also contain links to their code implementations. We\u2019d be happy to see the results you obtain after testing them.<\/p>\n<p data-selectable-paragraph=\"\"><a href=\"https:\/\/www.udemy.com\/course\/data-science-bootcamp-in-python\/?referralCode=9F6DFBC3F92C44E8C7F4&amp;source=post_page-----4a100801c0eb--------------------------------\">The Data Science Bootcamp in Python<\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Image Source Knowledge distillation is a model compression technique whereby a small network (student) is taught by a larger trained neural network (teacher). The smaller network is trained to behave like the large neural network. This enables the deployment of such models on small devices such as mobile phones or other edge devices. 
In this [&hellip;]<\/p>\n","protected":false},"author":63,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6],"tags":[],"coauthors":[163],"class_list":["post-7258","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Model Distillation Techniques for Deep Learning<\/title>\n<meta name=\"description\" content=\"In this guide, look at a couple of papers that discuss model distillation and knowledge distillation. Read more.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Research Guide: Model Distillation Techniques for Deep Learning\" \/>\n<meta property=\"og:description\" content=\"In this guide, look at a couple of papers that discuss model distillation and knowledge distillation. 
Read more.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-21T17:01:47+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:14:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:2500\/1*Xoll0HT4YUfF_DhJZqiiuA.jpeg\" \/>\n<meta name=\"author\" content=\"Derrick Mwiti\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Derrick Mwiti\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Model Distillation Techniques for Deep Learning","description":"In this guide, look at a couple of papers that discuss model distillation and knowledge distillation. Read more.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/","og_locale":"en_US","og_type":"article","og_title":"Research Guide: Model Distillation Techniques for Deep Learning","og_description":"In this guide, look at a couple of papers that discuss model distillation and knowledge distillation. 
Read more.","og_url":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-08-21T17:01:47+00:00","article_modified_time":"2025-04-24T17:14:41+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:2500\/1*Xoll0HT4YUfF_DhJZqiiuA.jpeg","type":"","width":"","height":""}],"author":"Derrick Mwiti","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Derrick Mwiti","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/"},"author":{"name":"Derrick Mwiti","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/9808205cca68ec95b6fbd918d195cea6"},"headline":"Research Guide: Model Distillation Techniques for Deep Learning","datePublished":"2023-08-21T17:01:47+00:00","dateModified":"2025-04-24T17:14:41+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/"},"wordCount":1395,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:2500\/1*Xoll0HT4YUfF_DhJZqiiuA.jpeg","articleSection":["Machine 
Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/","url":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/","name":"Model Distillation Techniques for Deep Learning","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:2500\/1*Xoll0HT4YUfF_DhJZqiiuA.jpeg","datePublished":"2023-08-21T17:01:47+00:00","dateModified":"2025-04-24T17:14:41+00:00","description":"In this guide, look at a couple of papers that discuss model distillation and knowledge distillation. Read more.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:2500\/1*Xoll0HT4YUfF_DhJZqiiuA.jpeg","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:2500\/1*Xoll0HT4YUfF_DhJZqiiuA.jpeg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/research-guide-model-distillation-techniques-for-deep-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Research Guide: Model Distillation Techniques for Deep 
Learning"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/9808205cca68ec95b6fbd918d195cea6","name":"Derrick Mwiti","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/b7db96aa11f77239bbde5eb79ede1493","url":"https:\/\/secure.gravatar.com\/avatar\/d52d009e8d0a72c0dcd785caadeefbb3fb7aa64567e9f5a1e65f5faad18f2426?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d52d009e8d0a72c0dcd785caadeefbb3fb7aa64567e9f5a1e65f5faad18f2426?s=96&d=mm&r=g","caption":"Derrick 
Mwiti"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/mwitiderrickgmail-com\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7258","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/63"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=7258"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7258\/revisions"}],"predecessor-version":[{"id":15576,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7258\/revisions\/15576"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=7258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=7258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=7258"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=7258"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}