{"id":1854,"date":"2019-04-15T19:36:06","date_gmt":"2019-04-16T03:36:06","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/blog\/approach-pre-trained-deep-learning-models-with-caution\/"},"modified":"2019-04-15T19:36:06","modified_gmt":"2019-04-16T03:36:06","slug":"approach-pre-trained-deep-learning-models-with-caution","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/","title":{"rendered":"Approach pre-trained deep learning models with caution"},"content":{"rendered":"\n<p>&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Pre-trained models are easy to use, but are you glossing over details that could impact your model performance?<\/h4>\n\n\n\n<p>How many times have you run the following snippets:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import torchvision.models as models\ninception = models.inception_v3(pretrained=True)<\/code><\/pre>\n\n\n\n<p>or<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from keras.applications.inception_v3 import InceptionV3\nbase_model = InceptionV3(weights='imagenet', include_top=False)<\/code><\/pre>\n\n\n\n<p>It seems like using these pre-trained models have become a new standard for industry best practices. 
After all, why\u00a0<em>wouldn\u2019t<\/em>\u00a0you take advantage of a model that\u2019s been trained on more data and compute than you could ever muster by yourself?<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>See the discussion on\u00a0<\/em><a href=\"https:\/\/www.reddit.com\/r\/MachineLearning\/comments\/bdjxf2\/discussion_be_careful_when_using_pretrained_deep\/\" target=\"_blank\" rel=\"noreferrer noopener\">Reddit<\/a><em>\u00a0and\u00a0<\/em><a href=\"https:\/\/news.ycombinator.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">HackerNews<\/a><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">Long live pre-trained models!<\/h2>\n\n\n\n<p>There are several substantial benefits to leveraging pre-trained models:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>super simple to incorporate<\/li>\n<li>achieve solid (same or even better) model performance quickly<\/li>\n<li>there\u2019s not as much labeled data required<\/li>\n<li>versatile use cases spanning transfer learning, prediction, and feature extraction<\/li>\n<\/ul>\n\n\n\n<p>Advances within the NLP space have also encouraged the use of pre-trained language models like\u00a0<a href=\"https:\/\/github.com\/openai\/gpt-2\" target=\"_blank\" rel=\"noreferrer noopener\">GPT and GPT-2<\/a>, AllenNLP\u2019s\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1802.05365\" target=\"_blank\" rel=\"noreferrer noopener\">ELMo<\/a>, Google\u2019s\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1810.04805.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">BERT<\/a>, and Sebastian Ruder and Jeremy Howard\u2019s\u00a0<a href=\"http:\/\/nlp.fast.ai\/classification\/2018\/05\/15\/introducting-ulmfit.html\" target=\"_blank\" rel=\"noreferrer noopener\">ULMFiT<\/a>\u00a0(for an excellent overview of these models, see\u00a0<a href=\"https:\/\/www.topbots.com\/ai-nlp-research-pretrained-language-models\/\" target=\"_blank\" rel=\"noreferrer noopener\">this TOPBOTs 
post<\/a>).<\/p>\n\n\n\n<p>One common technique for leveraging pretrained models is feature extraction, where you\u2019re retrieving intermediate representations produced by the pretrained model and using those representations as inputs for a new model. The representations produced by these final fully-connected layers are generally assumed to capture information that is relevant for solving a new task.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Everyone\u2019s in on the game<\/h3>\n\n\n\n<p>Every major framework (TensorFlow, Keras, PyTorch, MXNet, etc.) offers pre-trained models like Inception V3, ResNet, and AlexNet with weights:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/keras.io\/applications\/\" target=\"_blank\" rel=\"noreferrer noopener\">Keras Applications<\/a><\/li>\n<li><a href=\"https:\/\/pytorch.org\/docs\/stable\/torchvision\/models.html\" target=\"_blank\" rel=\"noreferrer noopener\">PyTorch torchvision.models<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/tensorflow\/models\/tree\/master\/official\" target=\"_blank\" rel=\"noreferrer noopener\">Tensorflow Official Models\u00a0<\/a>(and now\u00a0<a href=\"https:\/\/www.tensorflow.org\/hub\" target=\"_blank\" rel=\"noreferrer noopener\">TensorFlow Hub<\/a>)<\/li>\n<li><a href=\"https:\/\/mxnet.apache.org\/model_zoo\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">MXNet Model Zoo<\/a><\/li>\n<li><a href=\"https:\/\/docs.fast.ai\/applications.html\" target=\"_blank\" rel=\"noreferrer noopener\">Fast.ai Applications<\/a><\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img decoding=\"async\" class=\"wp-image-1082\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/easy-1-1024x576-1.jpg\" alt=\"\" \/>\n<figcaption>Easy, right?<\/figcaption>\n<\/figure>\n<\/div>\n\n\n<hr class=\"wp-block-separator is-style-dots\" \/>\n\n\n<h2 class=\"wp-block-heading\">But are these benchmarks reproducible?<\/h2>\n\n\n\n<p>The article that inspired this post came 
from\u00a0<a href=\"http:\/\/www.curtisnorthcutt.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Curtis Northcutt<\/a>, a computer science PhD candidate at MIT.\u00a0<strong>His article \u2018<\/strong><a href=\"http:\/\/l7.curtisnorthcutt.com\/towards-reproducibility-benchmarking-keras-pytorch\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Towards Reproducibility: Benchmarking Keras and PyTorch<\/strong><\/a><strong>\u2019 made several interesting claims\u00a0<\/strong>\u2014<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><code>resnet<\/code>\u00a0architectures perform better in PyTorch and\u00a0<code>inception<\/code>\u00a0architectures perform better in Keras<\/li>\n<li>The\u00a0<a href=\"https:\/\/keras.io\/applications\/#documentation-for-individual-models\" target=\"_blank\" rel=\"noreferrer noopener\">published benchmarks<\/a>\u00a0on\u00a0<a href=\"https:\/\/keras.io\/applications\/\" target=\"_blank\" rel=\"noreferrer noopener\">Keras Applications<\/a>\u00a0cannot be reproduced, even when exactly copying the example code. In fact, their reported accuracies (as of Feb. 2019) are usually higher than the actual accuracies (citing\u00a0<a href=\"https:\/\/github.com\/keras-team\/keras\/issues\/10040\" target=\"_blank\" rel=\"noreferrer noopener\">1<\/a>\u00a0and\u00a0<a href=\"https:\/\/github.com\/keras-team\/keras\/issues\/8672\" target=\"_blank\" rel=\"noreferrer noopener\">2<\/a>)<\/li>\n<li>Some pre-trained Keras models yield inconsistent or lower accuracies when deployed on a server (<a href=\"https:\/\/github.com\/keras-team\/keras\/issues\/7848\" target=\"_blank\" rel=\"noreferrer noopener\">3<\/a>) or run in sequence with other Keras models (<a href=\"https:\/\/github.com\/keras-team\/keras\/issues\/10979\" target=\"_blank\" rel=\"noreferrer noopener\">4<\/a>)<\/li>\n<li>Keras models using batch normalization can be unreliable. 
For some models, forward-pass evaluations (with gradients supposedly off) still result in weights changing at inference time. (See\u00a0<a href=\"http:\/\/blog.datumbox.com\/the-batch-normalization-layer-of-keras-is-broken\/\" target=\"_blank\" rel=\"noreferrer noopener\">5<\/a>)<\/li>\n<\/ol>\n\n\n\n<p>You might be wondering:\u00a0<strong>How is that possible? Aren\u2019t these the same model and shouldn\u2019t they have the same performance if trained with the same conditions?<\/strong><\/p>\n\n\n\n<p>Well, you\u2019re not alone. Curtis\u2019 article also sparked some reactions on Twitter:<\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter\">\n<div class=\"wp-block-embed__wrapper\">https:\/\/twitter.com\/yoavgo\/status\/1116582046145531909?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwgr%5E363937393b70726f64756374696f6e&amp;ref_url=https%3A%2F%2Fcdn.embedly.com%2Fwidgets%2Fmedia.html%3Ftype%3Dtext%252Fhtml%26key%3Da19fcc184b9711e1b4764040d3dc5c07%26schema%3Dtwitter%26url%3Dhttps%253A%2F%2Ftwitter.com%2Fyoavgo%2Fstatus%2F1116582046145531909%26image%3Dhttps%253A%2F%2Fi.embed.ly%2F1%2Fimage%253Furl%253Dhttps%25253A%25252F%25252Fpbs.twimg.com%25252Fprofile_images%25252F1431395997%25252Fprofile_400x400.jpg%2526key%253Da19fcc184b9711e1b4764040d3dc5c07<\/div>\n<\/figure>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter\">\n<div 
class=\"wp-block-embed__wrapper\">https:\/\/twitter.com\/deliprao\/status\/1116545913558724609?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwgr%5E363937393b70726f64756374696f6e&amp;ref_url=https%3A%2F%2Fcdn.embedly.com%2Fwidgets%2Fmedia.html%3Ftype%3Dtext%252Fhtml%26key%3Da19fcc184b9711e1b4764040d3dc5c07%26schema%3Dtwitter%26url%3Dhttps%253A%2F%2Ftwitter.com%2Fdeliprao%2Fstatus%2F1116545913558724609%26image%3Dhttps%253A%2F%2Fi.embed.ly%2F1%2Fimage%253Furl%253Dhttps%25253A%25252F%25252Fpbs.twimg.com%25252Fprofile_images%25252F2252894279%25252Fimage_400x400.jpg%2526key%253Da19fcc184b9711e1b4764040d3dc5c07<\/div>\n<\/figure>\n\n\n\n<p>and some interesting insights into the reason for these differences:<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\">\n<div class=\"wp-block-embed__wrapper\">https:\/\/twitter.com\/abursuc\/status\/1116639605569269760?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwgr%5E363937393b70726f64756374696f6e&amp;ref_url=https%3A%2F%2Fcdn.embedly.com%2Fwidgets%2Fmedia.html%3Ftype%3Dtext%252Fhtml%26key%3Da19fcc184b9711e1b4764040d3dc5c07%26schema%3Dtwitter%26url%3Dhttps%253A%2F%2Ftwitter.com%2Fabursuc%2Fstatus%2F1116639605569269760%26image%3Dhttps%253A%2F%2Fi.embed.ly%2F1%2Fimage%253Furl%253Dhttps%25253A%25252F%25252Fpbs.twimg.com%25252Fprofile_images%25252F458905216025255936%25252FXsMRlSXz_400x400.jpeg%2526key%253Da19fcc184b9711e1b4764040d3dc5c07<\/div>\n<\/figure>\n\n\n\n<p>Knowing (and trusting) these benchmarks are important because they allow you to make informed decisions around which framework to use and are often used as baselines for research and implementation.<\/p>\n\n\n\n<p>So what are some things to look out for when you\u2019re leveraging these pre-trained models?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Considerations for using pre-trained models<\/h2>\n\n\n\n<h4 class=\"wp-block-heading\">1. <strong>How similar is your task? 
How similar is your data?<\/strong><\/h4>\n\n\n\n<p>Are you expecting the cited 0.945 validation accuracy for the Keras Xception model you\u2019re using with your new dataset of x-rays? First, you need to check how similar your data is to the original dataset that the model was trained on (in this case: ImageNet). You also need to be aware of where the features have been transferred from (the bottom, middle, or top of the network) because that will impact model performance depending on task similarity.<\/p>\n\n\n\n<p>Read\u00a0<a href=\"http:\/\/cs231n.github.io\/transfer-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">CS231n \u2014 Transfer Learning<\/a>\u00a0and \u2018<a href=\"https:\/\/papers.nips.cc\/paper\/5347-how-transferable-are-features-in-deep-neural-networks.pdf%20\/\" target=\"_blank\" rel=\"noreferrer noopener\">How transferable are features in deep neural networks?<\/a>\u2019<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">2. <strong>How did you preprocess the data?<\/strong><\/h4>\n\n\n\n<p>Your model\u2019s pre-processing should be the same as the original model\u2019s training. Almost all torchvision models use the same pre-processing values. For\u00a0<a href=\"https:\/\/keras.io\/applications\/\" target=\"_blank\" rel=\"noreferrer noopener\">Keras models<\/a>, you should always use the\u00a0<code>preprocess_input<\/code>\u00a0function for the corresponding model-level module. For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># VGG16\nkeras.applications.vgg16.preprocess_input\n\n# InceptionV3\nkeras.applications.inception_v3.preprocess_input\n\n# ResNet50\nkeras.applications.resnet50.preprocess_input<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>3. 
What\u2019s your backend?<\/strong><\/h4>\n\n\n\n<p>There were\u00a0<a href=\"https:\/\/news.ycombinator.com\/item?id=14470967\" target=\"_blank\" rel=\"noreferrer noopener\">some rumblings on HackerNews<\/a>\u00a0that changing the Keras backend from TensorFlow to CNTK (Microsoft Cognitive Toolkit) improved the performance. Since Keras is a model-level library, it does not handle lower-level operations such as tensor products, convolutions, etc., so it relies on\u00a0<a href=\"http:\/\/faroit.com\/keras-docs\/1.2.0\/backend\/\" target=\"_blank\" rel=\"noreferrer noopener\">other tensor manipulation frameworks<\/a>\u00a0like the TensorFlow backend and the Theano backend.<\/p>\n\n\n\n<p>Max Woolf provided\u00a0<a href=\"https:\/\/minimaxir.com\/2017\/06\/keras-cntk\/\" target=\"_blank\" rel=\"noreferrer noopener\">an excellent benchmarking project<\/a>\u00a0that found that while accuracy was the same between CNTK and TensorFlow, CNTK was faster at LSTMs and Multilayer Perceptrons (MLPs) while TensorFlow was faster at CNNs and embeddings.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Woolf\u2019s post is from 2017, so it\u2019d be interesting to get an updated comparison that also includes Theano and\u00a0<\/em><a href=\"https:\/\/medium.com\/apache-mxnet\/keras-gets-a-speedy-new-backend-with-keras-mxnet-3a853efc1d75\" target=\"_blank\" rel=\"noreferrer noopener\">MXNet as a backend\u00a0<\/a><em>(although Theano is now deprecated).<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>There are also claims that certain versions of Theano may ignore your seed (for a relevant post from Keras, see\u00a0<a href=\"https:\/\/keras.io\/getting-started\/faq\/#how-can-i-obtain-reproducible-results-using-keras-during-development\" target=\"_blank\" rel=\"noreferrer noopener\">this<\/a>).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>4. 
What\u2019s your hardware?<\/strong><\/h4>\n\n\n\n<p>Are you using an Amazon EC2 NVIDIA Tesla K80 or a Google Compute NVIDIA Tesla P100? Maybe even a TPU? Check out these useful benchmarking resources for run times of these different pretrained models.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/medium.com\/u\/480c87811f42?source=post_page-----9f0ff739010c----------------------\" target=\"_blank\" rel=\"noreferrer noopener\">Apache MXNet<\/a>\u2019s\u00a0<a href=\"https:\/\/medium.com\/apache-mxnet\/gluon-nlp-bert-6a489bdd3340\" target=\"_blank\" rel=\"noreferrer noopener\">GluonNLP 0.6: Closing the Gap in Reproducible Research with BERT<\/a><\/li>\n<li>Caleb Robinson\u2019s \u2018<a href=\"http:\/\/calebrob.com\/ml\/imagenet\/ilsvrc2012\/2018\/10\/22\/imagenet-benchmarking.html\" target=\"_blank\" rel=\"noreferrer noopener\">How to reproduce ImageNet validation results<\/a>\u2019 (and of course, again, Curtis\u2019\u00a0<a href=\"http:\/\/l7.curtisnorthcutt.com\/towards-reproducibility-benchmarking-keras-pytorch\" target=\"_blank\" rel=\"noreferrer noopener\">benchmarking post<\/a>)<\/li>\n<li><a href=\"http:\/\/dlbench.comp.hkbu.edu.hk\/\" target=\"_blank\" rel=\"noreferrer noopener\">DL Bench<\/a><\/li>\n<li><a href=\"https:\/\/dawn.cs.stanford.edu\/2018\/06\/19\/dawnbench-analysis\/\" target=\"_blank\" rel=\"noreferrer noopener\">Stanford DAWNBench<\/a><\/li>\n<li><a href=\"https:\/\/www.tensorflow.org\/guide\/performance\/benchmarks\" target=\"_blank\" rel=\"noreferrer noopener\">TensorFlow\u2019s performance benchmarks<\/a><\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>5. What\u2019s your learning rate?<\/strong><\/h4>\n\n\n\n<p>In practice, you should either keep the pre-trained parameters fixed (i.e. use the pre-trained model as a feature extractor) or fine-tune them with a fairly small learning rate in order to not unlearn everything in the original model.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>6. 
Is there a difference in how you use optimizations like batch normalization or dropout, especially between training mode and inference mode?<\/strong><\/h4>\n\n\n\n<p>As Curtis\u2019 post claims:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Keras models using batch normalization can be unreliable. For some models, forward-pass evaluations (with gradients supposedly off) still result in weights changing at inference time. (See\u00a0<\/em><a href=\"http:\/\/blog.datumbox.com\/the-batch-normalization-layer-of-keras-is-broken\/\" target=\"_blank\" rel=\"noreferrer noopener\">5<\/a><em>)<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>But\u00a0<em>why\u00a0<\/em>is this the case?<\/p>\n\n\n\n<p>According to Vasilis Vryniotis, Principal Data Scientist at Expedia, who first identified the issue with the frozen batch normalization layer in Keras (see Vasilis\u2019 PR\u00a0<a href=\"https:\/\/github.com\/keras-team\/keras\/pull\/9965\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>\u00a0and detailed blog post\u00a0<a href=\"http:\/\/blog.datumbox.com\/the-batch-normalization-layer-of-keras-is-broken\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>):<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>The problem with the current implementation of Keras is that when a batch normalization (BN) layer is frozen, it continues to use the mini-batch statistics during training. I believe a better approach when the BN is frozen is to use the moving mean and variance that it learned during training. Why? 
For the same reasons why the mini-batch statistics should not be updated when the layer is frozen: it can lead to poor results because the next layers are not trained properly.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>Vasilis also cited instances where this discrepancy led to significant drops in model performance (from 100% down to 50% accuracy) when the Keras model is switched from train mode to test mode.<\/p>\n\n\n<hr class=\"wp-block-separator is-style-dots\" \/>\n\n\n<p>Use these questions to guide how you interact with pre-trained models for your next project. Have comments, questions, or additions? Comment below!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Pre-trained models are easy to use, but are you glossing over details that could impact your model performance? How many times have you run the following snippets: or It seems like using these pre-trained models have become a new standard for industry best practices. After all, why\u00a0wouldn\u2019t\u00a0you take advantage of a model that\u2019s been [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1856,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[7],"tags":[],"coauthors":[107],"class_list":["post-1854","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Approach pre-trained deep learning models with caution - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Approach pre-trained deep learning models with caution\" \/>\n<meta property=\"og:description\" content=\"&nbsp; Pre-trained models are easy to use, but are you glossing over details that could impact your model performance? How many times have you run the following snippets: or It seems like using these pre-trained models have become a new standard for industry best practices. After all, why\u00a0wouldn\u2019t\u00a0you take advantage of a model that\u2019s been [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2019-04-16T03:36:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/easy.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"720\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Gideon Mendels\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Gideon Mendels\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Approach pre-trained deep learning models with caution - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/","og_locale":"en_US","og_type":"article","og_title":"Approach pre-trained deep learning models with caution","og_description":"&nbsp; Pre-trained models are easy to use, but are you glossing over details that could impact your model performance? How many times have you run the following snippets: or It seems like using these pre-trained models have become a new standard for industry best practices. After all, why\u00a0wouldn\u2019t\u00a0you take advantage of a model that\u2019s been [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2019-04-16T03:36:06+00:00","og_image":[{"width":1280,"height":720,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/easy.jpg","type":"image\/jpeg"}],"author":"Gideon Mendels","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Gideon Mendels","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/"},"author":{"name":"engineering@atre.net","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b"},"headline":"Approach pre-trained deep learning models with caution","datePublished":"2019-04-16T03:36:06+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/"},"wordCount":1379,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/easy.jpg","articleSection":["Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/","url":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/","name":"Approach pre-trained deep learning models with caution - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/easy.jpg","datePublished":"2019-04-16T03:36:06+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/easy.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/easy.jpg","width":1280,"height":720,"caption":"Easy Button"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/approach-pre-trained-deep-learning-models-with-caution\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Approach pre-trained deep learning models with caution"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b","name":"engineering@atre.net","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/027c18177377edf459980f0cfb83706c","url":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","caption":"engineering@atre.net"},"sameAs":["https:\/\/live-cometml.pantheonsite.io"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/engineeringatre-net\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/1854","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/p
osts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=1854"}],"version-history":[{"count":0,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/1854\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/1856"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=1854"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=1854"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=1854"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=1854"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}