{"id":7147,"date":"2023-08-18T11:57:56","date_gmt":"2023-08-18T19:57:56","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=7147"},"modified":"2025-04-29T15:20:13","modified_gmt":"2025-04-29T15:20:13","slug":"image-inpainting-for-sdxl-1-0-base-refiner","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/","title":{"rendered":"Image Inpainting for SDXL 1.0 Base Model + Refiner"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">In this article, we\u2019ll compare the results of SDXL 1.0 with its predecessor, Stable Diffusion 2.0. We\u2019ll also take a look at the role of the refiner model in the new SDXL ensemble-of-experts pipeline and compare outputs using dilated and un-dilated segmentation masks. Finally, we\u2019ll use Comet to organize all of our data and metrics. Feel free to follow along in the <\/span><strong><a href=\"https:\/\/colab.research.google.com\/drive\/17HTh_A-NWCVpPdxw8KJVLpgko8FZ6OQh\">full-code tutorial here<\/a><\/strong><span style=\"font-weight: 400;\">, or, if you can\u2019t wait to see the final product, check out <\/span><strong><a href=\"https:\/\/www.comet.com\/examples\/demo-inpainting-sdxl-refiner\/view\/xjFTjQ4C3jIzmIwrhVC10vurE\/panels\">the public project here<\/a><\/strong><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7192\"><img loading=\"lazy\" decoding=\"async\" width=\"1894\" height=\"1760\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-9.13.09-PM.png\" alt=\"Realistic picture of an American astronaut in outerspace with planets and stars and meteorites and meteors in the background and a light halo around the astronaut. In the astronaut's helmet is the reflection of a fiery orange comet. 
&quot;SDXL&quot; is outlined in white in the background.\" class=\"wp-image-7192\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-9.13.09-PM.png 1894w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-9.13.09-PM-300x279.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-9.13.09-PM-1024x952.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-9.13.09-PM-768x714.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-9.13.09-PM-1536x1427.png 1536w\" sizes=\"auto, (max-width: 1894px) 100vw, 1894px\" \/><figcaption class=\"wp-element-caption\">Image created by author with SDXL 1.0 base + refiner and edited in Canva; seed = -732, prompt = &#8220;astronaut in outerspace, photorealistic.&#8221;<\/figcaption><\/figure>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/colab.research.google.com\/drive\/17HTh_A-NWCVpPdxw8KJVLpgko8FZ6OQh\" target=\"_blank\" rel=\"noreferrer noopener\">Follow along with the Colab!<\/a><\/div>\n\n\n\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"\/signup\/?utm_source=SDXL_blog&amp;utm_medium=referral&amp;utm_content=Medium\" target=\"_blank\" rel=\"noreferrer noopener\">Create a free Comet account!<\/a><\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">What is SDXL 1.0 and why should I care?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">SDXL 1.0 is the new foundational model from Stability AI that\u2019s making waves as a <\/span><a href=\"https:\/\/arxiv.org\/pdf\/2307.01952.pdf\"><span style=\"font-weight: 400;\">drastically-improved<\/span><\/a><span style=\"font-weight: 400;\"> version of Stable 
Diffusion, a latent diffusion model (LDM) for text-to-image synthesis. As the newest evolution of Stable Diffusion, it\u2019s blowing its <\/span><a href=\"https:\/\/stability.ai\/blog\/sdxl-09-stable-diffusion\"><span style=\"font-weight: 400;\">predecessors<\/span><\/a><span style=\"font-weight: 400;\"> out of the water and producing images that are competitive with black-box SOTA image generators like Midjourney.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7177\"><img loading=\"lazy\" decoding=\"async\" width=\"900\" height=\"600\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/SDXL-bar-chart2.png\" alt=\"A vertical bar chart showing the preference rates of users between SDXL 1.0 (base + refiner), SDXL 1.0 (base only) Stable Diffusion 1.5 and Stable Diffusion 2.0\" class=\"wp-image-7177\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/SDXL-bar-chart2.png 900w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/SDXL-bar-chart2-300x200.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/SDXL-bar-chart2-768x512.png 768w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><figcaption class=\"wp-element-caption\">The results are in and practitioners prefer the SDXL 1.0 pipeline by a wide margin!<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">The improvements are the result of a series of intentional design choices, including a 3x larger UNet-backbone, more powerful pre-trained text encoders, and the introduction of a separate, diffusion-based refinement model. The refiner model improves the visual fidelity of samples using a post hoc image-to-image diffusion technique first proposed in <\/span><a href=\"https:\/\/arxiv.org\/abs\/2108.01073\"><span style=\"font-weight: 400;\">SDEdit<\/span><\/a><span style=\"font-weight: 400;\">. 
In this tutorial, we\u2019ll use SDXL with and without this refinement model to get a better understanding of its role in the pipeline. We\u2019ll also compare these results with outputs from Stable Diffusion 2.0 to get a broader picture of the improvements introduced in SDXL.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7180\"><img loading=\"lazy\" decoding=\"async\" width=\"2478\" height=\"1264\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/fixed-alignment-sdxl-graphic.png\" alt=\"Graphic of the SDXL 1.0 diffusion model architecture, including the base model and refiner model.\" class=\"wp-image-7180\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/fixed-alignment-sdxl-graphic.png 2478w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/fixed-alignment-sdxl-graphic-300x153.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/fixed-alignment-sdxl-graphic-1024x522.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/fixed-alignment-sdxl-graphic-768x392.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/fixed-alignment-sdxl-graphic-1536x783.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/fixed-alignment-sdxl-graphic-2048x1045.png 2048w\" sizes=\"auto, (max-width: 2478px) 100vw, 2478px\" \/><figcaption class=\"wp-element-caption\">The SDXL 1.0 mixture-of-experts pipeline includes both a base model and a refinement model.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">But these improvements do come at a cost: SDXL 1.0 involves an impressive 3.5B parameter base model and a 6.6B parameter ensemble pipeline (base model plus refiner), making it one of the largest open image generators today. 
This increase in size is mainly due to more attention blocks and a larger cross-attention context, since SDXL conditions on two text encoders (<a href=\"https:\/\/huggingface.co\/laion\/CLIP-ViT-bigG-14-laion2B-39B-b160k\">OpenCLIP ViT-bigG<\/a> alongside <a href=\"https:\/\/huggingface.co\/openai\/clip-vit-large-patch14\">CLIP ViT-L<\/a>).&nbsp;<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">SDXL 1.0 and the future<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Still, the announcement is an exciting one! In the last year alone, Stable Diffusion has served as a foundational model in fields spanning <a href=\"https:\/\/arxiv.org\/abs\/2305.15957\">3D classification<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/2302.05543\">controllable image editing<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/2208.01618\">image personalization<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/2211.01777\">synthetic data augmentation<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/2306.06233\">graphical user interface prototyping<\/a>, <a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2022.11.18.517004v3\">reconstructing images from fMRI brain scans<\/a>, and <a href=\"https:\/\/www.riffusion.com\/about\">music generation<\/a>. SDXL 1.0 promises to continue Stable Diffusion\u2019s tradition of widening the realm of generative AI possibilities.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7235\"><img loading=\"lazy\" decoding=\"async\" width=\"2670\" height=\"854\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-19-at-8.32.56-PM.png\" alt=\"In one incredible application of Stable Diffusion, researchers showed volunteers the top row of images above. 
They then fed fMRI scans of their brain activity to a Stable Diffusion model, which successfully created the bottom row of reconstruction images.\" class=\"wp-image-7235\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-19-at-8.32.56-PM.png 2670w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-19-at-8.32.56-PM-300x96.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-19-at-8.32.56-PM-1024x328.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-19-at-8.32.56-PM-768x246.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-19-at-8.32.56-PM-1536x491.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-19-at-8.32.56-PM-2048x655.png 2048w\" sizes=\"auto, (max-width: 2670px) 100vw, 2670px\" \/><figcaption class=\"wp-element-caption\">In <a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2022.11.18.517004v3.full.pdf\">one incredible application of a previous version of Stable Diffusion<\/a>, researchers showed volunteers the top row of images above. They then fed fMRI scans of their brain activity to the model, which successfully created the bottom row of reconstruction images. What new advancements will SDXL bring?<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">SDXL 1.0 vs. the world<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">So, how does SDXL 1.0 fare in the wider world of text-to-image generative AI tools? According to Stability AI, very well. In fact, it\u2019s now considered the world\u2019s best open image generation model. 
And while Midjourney still seems to have an edge as the crowd favorite, SDXL is certainly giving it a run for its money as a free, open-source alternative.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">SDXL 1.0 is both open-source and open-access, meaning it\u2019s free to use, as long as you have the computational resources to do so. But it doesn\u2019t require much; all of the images in this article were generated using Google Colab\u2019s A100 GPU. And according to Stability AI, SDXL 1.0 will even work effectively on consumer GPUs with just 8GB of VRAM, making generative text-to-image models more accessible than ever before.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">But what makes SDXL image outputs better than ever? According to Stability AI, SDXL offers:<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\">Better contrast, lighting and shadows<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">More vibrant and accurate colors<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Native 1024 x 1024 resolution<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Capacity to create legible text<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Better human anatomy (hands, feet, limbs and faces)<\/span><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">We\u2019ll explore some of these points in more detail below. <\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">SDXL 1.0 for Model Explainability<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Generative AI continues to find itself at the forefront of debates surrounding model explainability, transparency, and reproducibility. As AI becomes more advanced, model decisions can become nearly impossible to interpret, even by the engineers and researchers who create them. 
This is a particular concern with many state-of-the-art (SOTA) generative AI models, whose opacity limits our ability to wholly assess their performance, potential biases, and inherent limitations. So it comes as a commendable move towards model explainability and transparency that Stability AI has made SDXL an open model.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7233\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/277-_-machine-learning-model-explainability-in-the-style-of-a-medical-poster.png\" alt=\"Image created using the SDXL 1.0 Stable Diffusion pipeline (base + refiner) of machine learning model explainability\" class=\"wp-image-7233\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/277-_-machine-learning-model-explainability-in-the-style-of-a-medical-poster.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/277-_-machine-learning-model-explainability-in-the-style-of-a-medical-poster-300x300.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/277-_-machine-learning-model-explainability-in-the-style-of-a-medical-poster-150x150.png 150w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/277-_-machine-learning-model-explainability-in-the-style-of-a-medical-poster-768x768.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Image created by author with SDXL base + refiner; seed = 277, prompt = &#8220;machine learning model explainability, in the style of a medical poster&#8221;<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">A lack of model explainability can lead to a whole host of unintended consequences, like perpetuation of bias and stereotypes, distrust in organizational decision-making, and even legal ramifications. 
What\u2019s more, it hinders reproducibility, discourages collaboration, and restricts further progress. The decision to make Stable Diffusion models open source and open access follows a growing trend in the industry towards open artificial intelligence, which encourages practitioners to build upon existing work and contribute new insights.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Now, let\u2019s try it for ourselves!<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">SDXL 1.0 in action<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">If you aren\u2019t already, you can follow along with the full code in <a href=\"https:\/\/colab.research.google.com\/drive\/17HTh_A-NWCVpPdxw8KJVLpgko8FZ6OQh\">this Colab here<\/a>. I should note that the code in this tutorial extends a pipeline I built in a <\/span><a href=\"https:\/\/www.comet.com\/site\/blog\/sam-stable-diffusion-for-text-to-image-inpainting\/\"><span style=\"font-weight: 400;\">previous article<\/span><\/a><span style=\"font-weight: 400;\"> (which you can find <\/span><a href=\"https:\/\/colab.research.google.com\/drive\/1B7L4cork9UFTtIB02EntjiZRLYuqJS2b#scrollTo=LtZghyHoJabf\"><span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">) but will also run as a standalone project if you\u2019re eager to get started. To compare the performance of SDXL with Stable Diffusion 2.0, I\u2019ll use the same images used in that tutorial. 
Because this article is not focused on image segmentation, I\u2019ll also use the binary masks and image metadata generated in that project, which I uploaded to Comet as an Artifact for public use.<\/span><\/p>\n\n\n\n<p class=\"graf graf--p wp-block-paragraph\">Although this tutorial functions as a standalone project, if you\u2019re interested in how we created our initial segmentation masks and metadata, you can check out the first part of this project here:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><a href=\"https:\/\/www.comet.com\/site\/blog\/sam-stable-diffusion-for-text-to-image-inpainting\/\"><img loading=\"lazy\" decoding=\"async\" width=\"2294\" height=\"530\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-20-at-9.47.38-AM.png\" alt=\"Preceding article, &quot;SAM + Stable Diffusion for Text-to-Image-Inpainting&quot; from Comet ML\" class=\"wp-image-7239\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-20-at-9.47.38-AM.png 2294w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-20-at-9.47.38-AM-300x69.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-20-at-9.47.38-AM-1024x237.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-20-at-9.47.38-AM-768x177.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-20-at-9.47.38-AM-1536x355.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-20-at-9.47.38-AM-2048x473.png 2048w\" sizes=\"auto, (max-width: 2294px) 100vw, 2294px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">After we download the Artifact, we\u2019ll perform image inpainting and outpainting using the <\/span><a 
href=\"https:\/\/colab.research.google.com\/drive\/1B7L4cork9UFTtIB02EntjiZRLYuqJS2b#scrollTo=LtZghyHoJabf\"><span style=\"font-weight: 400;\">SDXL Inpainting Pipeline<\/span><\/a><span style=\"font-weight: 400;\"> from Hugging Face. We\u2019ll use prompts nearly identical to those used in the first part of this tutorial (with a few very minor exceptions).<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7182 size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1738\" height=\"796\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.30.19-PM.png\" alt=\"A graphic showing an original image, the segmentation mask of a frog, and the resulting inpainted image from the SDXL 1.0 diffusion pipeline.\" class=\"wp-image-7182\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.30.19-PM.png 1738w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.30.19-PM-300x137.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.30.19-PM-1024x469.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.30.19-PM-768x352.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.30.19-PM-1536x703.png 1536w\" sizes=\"auto, (max-width: 1738px) 100vw, 1738px\" \/><figcaption class=\"wp-element-caption\">Image inpainting refers to the process of filling in missing data in a designated region of an image; graphic by author.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">As a reminder, <\/span><a href=\"https:\/\/paperswithcode.com\/task\/image-inpainting\"><span style=\"font-weight: 400;\">image inpainting<\/span><\/a><span style=\"font-weight: 400;\"> is the process of filling in missing data in a designated region of an image. 
<\/span><a href=\"https:\/\/openai.com\/blog\/dall-e-introducing-outpainting\"><span style=\"font-weight: 400;\">Outpainting<\/span><\/a><span style=\"font-weight: 400;\"> is the process of extending an image beyond its original borders, which we&#8217;ll effectively do by inpainting the background masks of our images. The inpainting pipeline accepts both a positive and negative prompt, and we\u2019ll set random seeds so you can produce the same results in your local environment.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7183\"><img loading=\"lazy\" decoding=\"async\" width=\"2632\" height=\"750\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.45.53-PM.png\" alt=\"Graphic of image outpainting showing an original input image of a panda, a segmentation mask, and the inpainted image resulting from the SDXL 1.0 diffusion model.\" class=\"wp-image-7183\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.45.53-PM.png 2632w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.45.53-PM-300x85.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.45.53-PM-1024x292.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.45.53-PM-768x219.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.45.53-PM-1536x438.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.45.53-PM-2048x584.png 2048w\" sizes=\"auto, (max-width: 2632px) 100vw, 2632px\" \/><figcaption class=\"wp-element-caption\">Image outpainting is the process of using generative AI to extend images beyond their original borders, thereby generating parts of the image that didn\u2019t exist before; graphic by author.<\/figcaption><\/figure>\n\n\n\n<h3 
class=\"wp-block-heading\">Tracking our experiment<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">We\u2019ll start off by instantiating a Comet Experiment so we can track our inputs, outputs, code, and other system metrics. You\u2019ll need to grab your API key from your <\/span><a href=\"https:\/\/www.comet.com\/account-settings\/apiKeys\"><span style=\"font-weight: 400;\">account settings<\/span><\/a><span style=\"font-weight: 400;\">. If you don\u2019t already have an account, you can <\/span><a href=\"\/signup\/?utm_source=SDXL_blog&amp;utm_medium=referral&amp;utm_content=Medium\"><span style=\"font-weight: 400;\">create one here for free<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<script src=\"https:\/\/gist.github.com\/anmorgan24\/2a6f493294d58b82b99552064a7e8cc2.js\"><\/script>\n\n\n\n<h3 class=\"wp-block-heading\">Comet Artifacts<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">An <\/span><a href=\"https:\/\/www.comet.com\/docs\/v2\/guides\/data-management\/using-artifacts\/\"><span style=\"font-weight: 400;\">Artifact<\/span><\/a><span style=\"font-weight: 400;\"> is any versioned object arranged in a folder-like structure. In this way, Comet allows you to keep track of any data associated with the machine learning lifecycle. 
Our Artifact will be structured as follows:<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7199\"><img loading=\"lazy\" decoding=\"async\" width=\"1792\" height=\"650\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-3.37.20-PM.png\" alt=\"The file folder structure of our image information (including segmentation masks) that we will download as a Comet Artifact.\" class=\"wp-image-7199\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-3.37.20-PM.png 1792w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-3.37.20-PM-300x109.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-3.37.20-PM-1024x371.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-3.37.20-PM-768x279.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-3.37.20-PM-1536x557.png 1536w\" sizes=\"auto, (max-width: 1792px) 100vw, 1792px\" \/><figcaption class=\"wp-element-caption\">By logging our Artifact to Comet and downloading to our local environment, we preserve its file structure; graphic by author.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Downloading an Artifact to your local environment is as simple as running the following code. If an Artifact isn\u2019t in your personal workspace, make sure the owner of the Artifact has shared it with you or made it public (as in our example). 
Below we download the Artifact to our working directory, preserving its original file structure without any additional parsing.<\/span><\/p>\n\n\n\n<script src=\"https:\/\/gist.github.com\/anmorgan24\/7fa2fd37a94890b69c5f0ef409d0440b.js\"><\/script>\n\n\n\n<h3 class=\"wp-block-heading\">Loading SDXL with Hugging Face<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">We\u2019ll load the SDXL base model, refiner model, and inpainting pipelines from <\/span><a href=\"https:\/\/huggingface.co\/docs\/diffusers\/api\/pipelines\/stable_diffusion\/stable_diffusion_xl#stable-diffusion-xl\"><span style=\"font-weight: 400;\">HuggingFace<\/span><\/a><span style=\"font-weight: 400;\">. We can do so with the following code:<\/span><\/p>\n\n\n\n<script src=\"https:\/\/gist.github.com\/anmorgan24\/a432681d85bff0002cd136d219629151.js\"><\/script>\n\n\n\n<h3 class=\"wp-block-heading\">Hyperparameters<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Like any traditional machine learning model, SDXL has a variety of tunable hyperparameters that affect the quality of the output. We\u2019ll cover some of the important ones here and keep track of which sets of values produced which outputs using Comet.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Unlike many traditional machine learning models, however, \u201cquality\u201d in this experiment is mostly subjective. 
And while there are some metrics to \u201cobjectively\u201d assess the quality of images created by a generative model (see <\/span><a href=\"https:\/\/openaccess.thecvf.com\/content_WACV_2020\/papers\/Black_Evaluation_of_Image_Inpainting_for_Classification_and_Retrieval_WACV_2020_paper.pdf\"><span style=\"font-weight: 400;\">NRMSE, PSNR<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E102.D\/7\/E102.D_2018EDL8206\/_pdf\"><span style=\"font-weight: 400;\">SROCC and KROCC<\/span><\/a><span style=\"font-weight: 400;\">), each of these metrics has issues for inpainting specifically. So, for this tutorial, we\u2019ll act as our own human assessors. Because of this, we\u2019ll rely heavily on our experiment tracking tool to trace which values led to which image outputs.&nbsp;<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Let\u2019s take a look at some of the important hyperparameters we\u2019ll be setting.<\/span><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Guidance scale<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">The guidance scale (also known as the <\/span><a href=\"https:\/\/arxiv.org\/abs\/2207.12598\"><span style=\"font-weight: 400;\">classifier-free guidance<\/span><\/a><span style=\"font-weight: 400;\"> scale or CFG scale) controls how closely the generated image follows the prompt. A lower guidance scale value allows the model more \u201ccreativity,\u201d but images may become unrecognizable if it is set too low. A higher guidance scale value forces the generator to more closely match the prompt, but sometimes at the cost of image quality or diversity. 
<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7201\"><img loading=\"lazy\" decoding=\"async\" width=\"2518\" height=\"952\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/guidance_scale-scale.png\" alt=\"Image of koala and octopus inpainted by SDXL over a range of guidance scale values\" class=\"wp-image-7201\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/guidance_scale-scale.png 2518w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/guidance_scale-scale-300x113.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/guidance_scale-scale-1024x387.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/guidance_scale-scale-768x290.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/guidance_scale-scale-1536x581.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/guidance_scale-scale-2048x774.png 2048w\" sizes=\"auto, (max-width: 2518px) 100vw, 2518px\" \/><figcaption class=\"wp-element-caption\">The guidance scale controls how closely the generated image follows the prompt; image by author.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Guidance scale values typically fall in the range of 7-15, though <\/span><a href=\"https:\/\/huggingface.co\/blog\/stable_diffusion#how-does-stable-diffusion-work\"><span style=\"font-weight: 400;\">Hugging Face suggests values between 7.5 and 8<\/span><\/a><span style=\"font-weight: 400;\">. 
For most images generated in this tutorial, we will stick with the default value of 7.5.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7202\"><img loading=\"lazy\" decoding=\"async\" width=\"2682\" height=\"678\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/gs-happy-with-labels.png\" alt=\"A picture of an old man with curly gray hair, on the left, slightly happy, becoming happier as you move right and the guidance scale increases.\" class=\"wp-image-7202\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/gs-happy-with-labels.png 2682w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/gs-happy-with-labels-300x76.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/gs-happy-with-labels-1024x259.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/gs-happy-with-labels-768x194.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/gs-happy-with-labels-1536x388.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/gs-happy-with-labels-2048x518.png 2048w\" sizes=\"auto, (max-width: 2682px) 100vw, 2682px\" \/><figcaption class=\"wp-element-caption\">The role of the guidance scale value becomes even clearer once we start to introduce subjective words, like feelings. How happy should the \u201chappy man with curly hair\u201d be? Graphic by author.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Most of our prompts are pretty objective (\u201ca purple octopus,\u201d \u201ca red gummy bear\u201d). But the role of the guidance scale value becomes even clearer once we start to introduce subjective words, like feelings. How happy should the \u201chappy man with curly hair\u201d be? We can see the model takes more liberty with this prompt as the guidance scale increases. 
<\/span><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Number of inference steps<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">In general, the quality of the image increases with the number of inference steps, but as we can see in the image below, at a certain point the improvements become negligible. More inference steps also means the model takes longer to generate the image, which could become an issue for certain use cases. Stable Diffusion works very well with relatively few inference steps, so in this tutorial we\u2019ll use around 70 inference steps per image.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7204\"><img loading=\"lazy\" decoding=\"async\" width=\"1808\" height=\"1362\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/num-inference-steps.png\" alt=\"Running SDXL 1.0 on different numbers of inference steps produces different results\" class=\"wp-image-7204\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/num-inference-steps.png 1808w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/num-inference-steps-300x226.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/num-inference-steps-1024x771.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/num-inference-steps-768x579.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/num-inference-steps-1536x1157.png 1536w\" sizes=\"auto, (max-width: 1808px) 100vw, 1808px\" \/><figcaption class=\"wp-element-caption\">The quality of the image generally increases with the number of inference steps, but more inference steps also means the model takes longer; graphic by author.<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">High noise fraction<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">When using SDXL as a mixture-of-experts pipeline, as we are here, we\u2019ll also need to specify the high 
noise fraction. The high noise fraction is the fraction of the total inference steps that the base model runs; the refiner model runs the remainder. The base model will always serve as the expert for the high-noise diffusion stage and the refiner as the expert for the low-noise diffusion stage.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">We set these inference intervals using the <code>denoising_end<\/code> parameter of the base model and the <code>denoising_start<\/code> parameter of the refiner model. In the Diffusers ensemble-of-experts setup, both are set to the same value, so the refiner picks up exactly where the base model stops. Each accepts a float between 0 and 1, representing the point in the denoising process, as a fraction of total inference steps, at which the base model hands off to the refiner.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7206\"><img loading=\"lazy\" decoding=\"async\" width=\"2012\" height=\"1290\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.26.15-PM.png\" alt=\"Graphic showing the SDXL 1.0 mixture-of-experts 2-stage architecture, demonstrating how the process is broken down into denoising start and denoising end.\" class=\"wp-image-7206\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.26.15-PM.png 2012w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.26.15-PM-300x192.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.26.15-PM-1024x657.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.26.15-PM-768x492.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-4.26.15-PM-1536x985.png 1536w\" sizes=\"auto, (max-width: 2012px) 100vw, 2012px\" \/><figcaption class=\"wp-element-caption\">The <code>high_noise_frac<\/code> or <code>denoising_start<\/code> is the percentage of inference steps that the model will run through the 
high-noise denoising stage (i.e., the base model); graphic by author.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">For example, if we were to specify 100 total inference steps with a <code>denoising_end<\/code> of 0.7, then our input would iterate 70 steps in the base model and 30 steps in the refiner model. If we were to specify a <code>denoising_end<\/code> value of 0.9, on the other hand (with the same total inference steps), our input would iterate for 90 steps in the base model and 10 in the refiner model.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7208\"><img loading=\"lazy\" decoding=\"async\" width=\"2756\" height=\"592\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/high_noise_frac.png\" alt=\"Image showing the effect of different high noise fraction or denoising values have on the output of the SDXL 1.0 pipeline\" class=\"wp-image-7208\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/high_noise_frac.png 2756w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/high_noise_frac-300x64.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/high_noise_frac-1024x220.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/high_noise_frac-768x165.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/high_noise_frac-1536x330.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/high_noise_frac-2048x440.png 2048w\" sizes=\"auto, (max-width: 2756px) 100vw, 2756px\" \/><figcaption class=\"wp-element-caption\">The high noise fraction, or <a href=\"https:\/\/huggingface.co\/docs\/diffusers\/main\/en\/api\/pipelines\/stable_diffusion\/stable_diffusion_xl#diffusers.StableDiffusionXLImg2ImgPipeline.__call__.denoising_start\"><code>denoising_start<\/code><\/a>, values range from 0.0 to 1.0, but usually a value between 0.7 and 0.9 is appropriate; graphic by 
author.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">This two-stage architecture makes SDXL especially robust, without sacrificing compute resources or inference time. <\/span><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Kernel size and iterations<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">If we use masks that perfectly align with the object we are replacing, we might notice some awkward pixel transitions between the original image and where it was inpainted. By dilating the mask slightly and giving the inpainting pipeline access to background pixels near the inpainted object, the refiner can make a more seamless integration between the two.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7210\"><img loading=\"lazy\" decoding=\"async\" width=\"2328\" height=\"990\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-4.48.16-PM.png\" alt=\"Graphic showing the difference between a dilated segmentation mask and an un-dilated segmentation mask\" class=\"wp-image-7210\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-4.48.16-PM.png 2328w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-4.48.16-PM-300x128.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-4.48.16-PM-1024x435.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-4.48.16-PM-768x327.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-4.48.16-PM-1536x653.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-14-at-4.48.16-PM-2048x871.png 2048w\" sizes=\"auto, (max-width: 2328px) 100vw, 2328px\" \/><figcaption class=\"wp-element-caption\">Dilating the inpainting masks makes for a more 
seamless transition with the original background image; graphic by author.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">In order to dilate the mask, we\u2019ll need to set a kernel size and number of kernel iterations. These values may change depending on the image you are using. For more information on this process, see <\/span><a href=\"https:\/\/docs.opencv.org\/3.4\/db\/df6\/tutorial_erosion_dilatation.html\"><span style=\"font-weight: 400;\">the docs from OpenCV<\/span><\/a><span style=\"font-weight: 400;\">. We\u2019ll also define a function to simplify the dilation process:<\/span><\/p>\n\n\n\n<script src=\"https:\/\/gist.github.com\/anmorgan24\/78a6ecb828a7c6abb8e4210b0f7fdaed.js\"><\/script>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Finally, how did I choose the random seeds? Well, randomly. I used <\/span><code><span style=\"font-weight: 400;\">random.randint(-1000,1000)<\/span><\/code><span style=\"font-weight: 400;\"> and regenerated the images until I found an image I liked and wanted to work with. Then I kept tweaking hyperparameters using the same seed. That\u2019s all! <\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Nested hyperparameters<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">In addition to the hyperparameters outlined above, we\u2019ll also be logging a few others, including our prompts and negative prompts. Because we\u2019ll be setting the same hyperparameters for our inpainting and outpainting pipelines, we\u2019ll define our hyperparameters in nested dictionaries. 
For example:<\/span><\/p>\n\n\n\n<script src=\"https:\/\/gist.github.com\/anmorgan24\/31b83a59f8f7048cfcdfc0911a0dbafc.js\"><\/script>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">These will be logged accordingly as nested hyperparameters in Comet, making them easier to access and organize, and helping to avoid duplication and confusion.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7168 size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1497\" height=\"703\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/nested_hyperparameters.gif\" alt=\"A GIF showing how nested hyperparameters are logged within the Comet ML UI\" class=\"wp-image-7168\"\/><figcaption class=\"wp-element-caption\">We use the same set of hyperparameters for our inpainting and outpainting pipelines, only with different values for each set. To avoid duplication or confusion, we log them as a nested dictionary and preserve this structure within the Comet UI.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Our SDXL 1.0 inpainting-outpainting pipeline<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">To stay consistent with the first part of this tutorial, for each of our five original input images, we generate both an inpainted, and an outpainted, image. For each of those examples, we\u2019ll generate a sample using:<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\">SDXL (base only)<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">SDXL (base + refiner)<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">SDXL (base + refiner + dilated masks)<\/span><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">We\u2019ll then compare the results of these different methods to better understand the role of the refinement model and of dilating the segmentation masks. 
Once we\u2019ve selected our best outputs, we\u2019ll compare these with the best outputs from Stable Diffusion 2.0.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">After writing a <\/span><a href=\"https:\/\/colab.research.google.com\/drive\/17HTh_A-NWCVpPdxw8KJVLpgko8FZ6OQh#scrollTo=zYWY1NEleYya\"><span style=\"font-weight: 400;\">couple of functions<\/span><\/a><span style=\"font-weight: 400;\"> to abstract away some of the noise, generating these samples (and logging them to Comet) will be as simple as the following lines of code.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><script src=\"https:\/\/gist.github.com\/anmorgan24\/8660c687bcdc0e44798764f64072ae75.js\"><\/script><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">The refiner<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Even at a quick glance, it\u2019s pretty easy to see the role the refiner had in improving image quality. <\/span><span style=\"font-weight: 400;\">But the power of the SDXL refiner is most noticeable when you examine finer details like lines, textures, and faces. 
For this, it helps to take a closer look:<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7211\"><img loading=\"lazy\" decoding=\"async\" width=\"2242\" height=\"876\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/base-v-refiner-details.png\" alt=\"A graphic showing the same generated output images using only the SDXL 1.0 base model (top) and the SDXL 1.0 pipeline of base model + refiner\" class=\"wp-image-7211\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/base-v-refiner-details.png 2242w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/base-v-refiner-details-300x117.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/base-v-refiner-details-1024x400.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/base-v-refiner-details-768x300.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/base-v-refiner-details-1536x600.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/base-v-refiner-details-2048x800.png 2048w\" sizes=\"auto, (max-width: 2242px) 100vw, 2242px\" \/><figcaption class=\"wp-element-caption\">The power of the SDXL refiner is most noticeable when you examine finer details like lines, textures, and faces; graphic by author.<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Dilated masks<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Zooming in also helps us see the difference that dilating the masks made. 
The transitions between original background pixels and generated image are much smoother where masks have been dilated.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7212\"><img loading=\"lazy\" decoding=\"async\" width=\"2182\" height=\"888\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-5.20.25-PM.png\" alt=\"\" class=\"wp-image-7212\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-5.20.25-PM.png 2182w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-5.20.25-PM-300x122.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-5.20.25-PM-1024x417.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-5.20.25-PM-768x313.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-5.20.25-PM-1536x625.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-16-at-5.20.25-PM-2048x833.png 2048w\" sizes=\"auto, (max-width: 2182px) 100vw, 2182px\" \/><figcaption class=\"wp-element-caption\">By zooming in, we can also see how dilating the segmentation masks helped smooth the transitions between original background pixels and generated image; graphic by author.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">SDXL 1.0 results<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">After all that hard work, how did SDXL 1.0 compare to Stable Diffusion 2.0? Let\u2019s check out the results! 
First, our inpainted images:<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7229 size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1414\" height=\"2000\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-3.png\" alt=\"Comparing the results of Stable Diffusion 2.0 and SDXL 1.0 for image inpainting with the base + refiner models.\" class=\"wp-image-7229\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-3.png 1414w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-3-212x300.png 212w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-3-724x1024.png 724w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-3-768x1086.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-3-1086x1536.png 1086w\" sizes=\"auto, (max-width: 1414px) 100vw, 1414px\" \/><figcaption class=\"wp-element-caption\">On the left, our original input images. In the center, the results of inpainting with Stable Diffusion 2.0. On the right, the results of inpainting with SDXL 1.0. I encourage you to check out the <a href=\"https:\/\/www.comet.com\/examples\/demo-inpainting-sdxl-refiner\/view\/xjFTjQ4C3jIzmIwrhVC10vurE\/panels\">public project<\/a>, where you can zoom in and appreciate the finer differences; graphic by author.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Clearly, SDXL 1.0 is a drastic improvement over Stable Diffusion 2.0! 
Now let\u2019s check out our outpainting images (I encourage you to zoom in on the results to really see the finer differences too):<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7231\"><img loading=\"lazy\" decoding=\"async\" width=\"1414\" height=\"2000\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-5.png\" alt=\"Comparing the results of inpainting with Stable Diffusion 2.0 with SDXL 1.0 base model + refiner.\" class=\"wp-image-7231\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-5.png 1414w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-5-212x300.png 212w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-5-724x1024.png 724w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-5-768x1086.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Original-image-5-1086x1536.png 1086w\" sizes=\"auto, (max-width: 1414px) 100vw, 1414px\" \/><figcaption class=\"wp-element-caption\">On the left, our original input images. In the center, the results of outpainting with Stable Diffusion 2.0, and on the right the results of outpainting with SDXL 1.0. I encourage you to check out the <a href=\"https:\/\/www.comet.com\/examples\/demo-inpainting-sdxl-refiner\/view\/xjFTjQ4C3jIzmIwrhVC10vurE\/panels\">public project<\/a>, where you can zoom in and appreciate the finer differences; graphic by author.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Comparing our SDXL 1.0 results in Comet<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">As you can probably imagine, keeping track of which input images, prompts, masks, and random seeds were used to create which output images can get confusing, fast! 
That\u2019s why we logged all of our images to Comet as we went.&nbsp;<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Let\u2019s head on over to the Comet UI now and take a look at each of our input images and the resulting output images after inpainting and outpainting:<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7172 size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1497\" height=\"789\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/SDXL-full-UI.gif\" alt=\"Viewing our SDXL results in Comet with the Image Panel and Data Panel.\" class=\"wp-image-7172\"\/><figcaption class=\"wp-element-caption\">Viewing our results in Comet with the <a href=\"https:\/\/www.comet.com\/site\/blog\/introducing-comets-new-image-panel\/\">Image Panel<\/a> and <a href=\"https:\/\/www.comet.com\/site\/blog\/credit-card-fraud-detection-with-autoencoders\/\">Data Panel.<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">We can also select individual experiments for comparison. This might be especially useful if we\u2019re trying to reproduce an image that we experimented with multiple times, or if we\u2019re trying to debug a particular experiment run.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7170\"><img loading=\"lazy\" decoding=\"async\" width=\"1497\" height=\"822\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/comparingdiffing.gif\" alt=\"Diffing our SDXL Stable Diffusion experiments in Comet\" class=\"wp-image-7170\"\/><figcaption class=\"wp-element-caption\">By <a href=\"https:\/\/www.comet.com\/docs\/v2\/guides\/comet-dashboard\/comparing-experiments\/\">diffing our experiments in Comet<\/a>, we can visualize differences in code, hyperparameters, and other metrics. 
Also, we can view input and output data side-by-side.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Tracking our image prompts with Comet<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">We\u2019ll also want to make sure to keep track of how we created each output so we can reproduce any of the results later on. Maybe we\u2019ve run different versions of the same prompt multiple times. Or maybe we\u2019ve tried different random seeds and want to pick our favorite result. By logging our prompts to Comet\u2019s Data Panel, we can easily retrieve all the relevant information to recreate any of our image outputs.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7173 size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1127\" height=\"620\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/logging-sdxl-prompts-to-data-panel.gif\" alt=\"A GIF showing how to log textual prompts for images in generative AI using Comet ML\" class=\"wp-image-7173\"\/><figcaption class=\"wp-element-caption\">With Comet&#8217;s Data Panels, we can filter and reorder columns, as well as sort rows by ascending or descending. Filtering can be especially helpful when looking for all prompts containing a particular keyword.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Thanks for making it all the way to the end, and I hope you found this SDXL tutorial useful! 
As a quick recap, in this article we:<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\">Learned about the SDXL 1.0 base model and refiner model and compared outputs from each;<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Explored the <code>guidance_scale<\/code>, <code>num_inference_steps<\/code>, and <code>denoising_start<\/code> hyperparameters;<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Compared our image outputs across hyperparameter values;<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Logged our nested hyperparameters to Comet;<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Built an extensive dashboard in Comet to track, log, and organize our results.<\/span><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">For questions, comments, or feedback, feel free to connect with us on our <a href=\"https:\/\/cometml.slack.com\/ssb\/redirect#\/shared-invite\/email\">Community Slack Channel<\/a>. 
Happy coding!<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Resources<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/arxiv.org\/pdf\/2307.01952v1.pdf\"><span style=\"font-weight: 400;\">SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis<\/span><\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/huggingface.co\/docs\/diffusers\/api\/pipelines\/stable_diffusion\/stable_diffusion_xl\">Hugging Face&#8217;s Stable Diffusion Documentation<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/jalammar.github.io\/illustrated-stable-diffusion\/\"><span style=\"font-weight: 400;\">The Illustrated Stable Diffusion<\/span><\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/arxiv.org\/pdf\/2108.01073.pdf\"><span style=\"font-weight: 400;\">SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations<\/span><\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/arxiv.org\/pdf\/2205.11487.pdf\"><span style=\"font-weight: 400;\">Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding<\/span><\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/arxiv.org\/pdf\/2112.10752.pdf\"><span style=\"font-weight: 400;\">High Resolution Image Synthesis with Latent Diffusion Models&nbsp;<\/span><\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">FAQ<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Where can I find the full code for this tutorial?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Find the <\/span><a href=\"https:\/\/colab.research.google.com\/drive\/17HTh_A-NWCVpPdxw8KJVLpgko8FZ6OQh\"><span style=\"font-weight: 400;\">full code in this Colab<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Where can I find the images used in this tutorial?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Download the dataset on <\/span><a href=\"https:\/\/www.kaggle.com\/datasets\/abbymorgan\/animals-toy-dataset\"><span style=\"font-weight: 
400;\">Kaggle here<\/span><\/a><span style=\"font-weight: 400;\">. Download the segmentation masks and JSON metadata from <\/span><a href=\"https:\/\/www.comet.com\/examples\/artifacts\/SAM_SDXL_outputs\"><span style=\"font-weight: 400;\">Comet\u2019s public example Artifacts here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can I use SDXL 1.0 with Clipdrop?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Yes, SDXL 1.0 is live on <\/span><a href=\"https:\/\/clipdrop.co\/stable-diffusion\"><span style=\"font-weight: 400;\">Clipdrop here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Is SDXL 1.0 open-source and open-access?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Yes, SDXL 1.0 is open-source and open-access.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Where can I access the source code and model weights used in SDXL 1.0?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">The SDXL 1.0 source code and model weights are available on <\/span><a href=\"https:\/\/github.com\/Stability-AI\/generative-models\"><span style=\"font-weight: 400;\">Stability AI\u2019s Github page here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can I access SDXL 1.0 via an API?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Yes, you can access SDXL 1.0 through the <\/span><a href=\"https:\/\/platform.stability.ai\/?_gl=1*th2p72*_ga*MjAyNjIwMzA1MC4xNjkwNDAzNTI1*_ga_W4CMY55YQZ*MTY5MTg0OTM5Mi42LjEuMTY5MTg0OTUwMC4wLjAuMA..\"><span style=\"font-weight: 400;\">API here<\/span><\/a><span style=\"font-weight: 400;\">. 
<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What image sizes should I use with SDXL 1.0?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">According to <\/span><a href=\"https:\/\/stable-diffusion-art.com\/sdxl-model\/#What_image_sizes_should_I_use_with_SDXL_models\"><span style=\"font-weight: 400;\">Stable Diffusion Art<\/span><\/a><span style=\"font-weight: 400;\">, the recommended image sizes for the following aspect ratios are:<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\"><strong>21:9<\/strong> \u2013 1536 x 640<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\"><strong>16:9<\/strong> \u2013 1344 x 768<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\"><strong>3:2<\/strong> \u2013 1216 x 832<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\"><strong>5:4<\/strong> \u2013 1152 x 896<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\"><strong>1:1<\/strong> \u2013 1024 x 1024<\/span><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Can I use SDXL 1.0 on AWS SageMaker and AWS Bedrock?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Yes, you can use SDXL 1.0 on <\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/studio-jumpstart.html\"><span style=\"font-weight: 400;\">AWS SageMaker<\/span><\/a><span style=\"font-weight: 400;\"> here and on <\/span><a href=\"https:\/\/aws.amazon.com\/bedrock\/\"><span style=\"font-weight: 400;\">AWS Bedrock<\/span><\/a><span style=\"font-weight: 400;\"> here.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Is there an SDXL 1.0 Discord community?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Yes, the <\/span><a href=\"https:\/\/discord.com\/channels\/1002292111942635562\/\"><span style=\"font-weight: 400;\">Stable Foundation Discord<\/span><\/a><span style=\"font-weight: 400;\"> is open for live testing of SDXL 
models.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Is SDXL 1.0 available in DreamStudio?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Yes, <\/span><a href=\"https:\/\/dreamstudio.ai\/\"><span style=\"font-weight: 400;\">DreamStudio<\/span><\/a><span style=\"font-weight: 400;\"> has SDXL 1.0 available for image generation.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Is SDXL 1.0 available with ControlNet?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Yes, try out the <\/span><a href=\"https:\/\/huggingface.co\/diffusers\/controlnet-canny-sdxl-1.0\"><span style=\"font-weight: 400;\">Hugging Face implementation here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What license does SDXL 1.0 have?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">SDXL 1.0 is released under the <\/span><a href=\"https:\/\/github.com\/Stability-AI\/generative-models\/blob\/main\/model_licenses\/LICENSE-SDXL1.0\"><span style=\"font-weight: 400;\">CreativeML OpenRAIL++-M License<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Where can I find the complete Comet project?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Explore the <\/span><a href=\"https:\/\/www.comet.com\/examples\/demo-inpainting-sdxl-refiner\/view\/xjFTjQ4C3jIzmIwrhVC10vurE\/panels\"><span style=\"font-weight: 400;\">public Comet project here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Where can I create a Comet account?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">Sign up for a <\/span><a href=\"\/signup\/?utm_source=SDXL_blog&amp;utm_medium=referral&amp;utm_content=Medium\"><span style=\"font-weight: 400;\">free Comet account here <\/span><\/a><span style=\"font-weight: 400;\">and start building 
your own projects!<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Where can I ask for help with Comet?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span style=\"font-weight: 400;\">For questions, comments, or feedback, please join our <\/span><a href=\"https:\/\/cometml.slack.com\/join\/shared_invite\/enQtMzM0OTMwNTQ0Mjc5LWE4NzcxMzdiMmFjYzEzM2E5OTczOTk1MDZmZDg2MGJmODUwYWI0YWQ0YWMyMjlmMjQ5YmVmNzEyYjNlNzFhNjQ#\/shared-invite\/email\"><span style=\"font-weight: 400;\">Community Slack<\/span><\/a><span style=\"font-weight: 400;\"> to chat with fellow practitioners and Comet employees!<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we\u2019ll compare the results of SDXL 1.0 with its predecessor, Stable Diffusion 2.0. We\u2019ll also take a look at the role of the refiner model in the new SDXL ensemble-of-experts pipeline and compare outputs using dilated and un-dilated segmentation masks. Finally, we\u2019ll use Comet to organize all of our data and metrics. 
[&hellip;]<\/p>\n","protected":false},"author":22,"featured_media":7217,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[8,6,7],"tags":[40,14,29,30,15,60,51,42,36,47,61,16,53,62,63,45,46],"coauthors":[133],"class_list":["post-7147","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-comet-community-hub","category-machine-learning","category-tutorials","tag-comet","tag-comet-ml","tag-computer-vision","tag-deep-learning","tag-deep-learning-experiment-management","tag-diffusion-models","tag-huggingface","tag-image-inpainting","tag-image-panels","tag-image-segmentation","tag-latent-diffusion","tag-ml-experiment-management","tag-mlops","tag-sdxl-1-0","tag-stability-ai","tag-stable-diffusion","tag-text-to-image"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Image Inpainting for SDXL 1.0 Base Model + Refiner<\/title>\n<meta name=\"description\" content=\"Learn how to use SDXL 1.0 with the base + refiner model for image inpainting and compare the results to Stable Diffusion 2.0.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Image Inpainting for SDXL 1.0 Base Model + Refiner\" \/>\n<meta property=\"og:description\" content=\"Learn how to use SDXL 1.0 with the base + refiner model for image inpainting and compare the results to Stable 
Diffusion 2.0.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-18T19:57:56+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-29T15:20:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-11.00.44-PM.png\" \/>\n\t<meta property=\"og:image:width\" content=\"600\" \/>\n\t<meta property=\"og:image:height\" content=\"602\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Abby Morgan\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@anmorgan2414\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Abby Morgan\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"21 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Image Inpainting for SDXL 1.0 Base Model + Refiner","description":"Learn how to use SDXL 1.0 with the base + refiner model for image inpainting and compare the results to Stable Diffusion 2.0.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/","og_locale":"en_US","og_type":"article","og_title":"Image Inpainting for SDXL 1.0 Base Model + Refiner","og_description":"Learn how to use SDXL 1.0 with the base + refiner model for image inpainting and compare the results to Stable Diffusion 2.0.","og_url":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-08-18T19:57:56+00:00","article_modified_time":"2025-04-29T15:20:13+00:00","og_image":[{"width":600,"height":602,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-11.00.44-PM.png","type":"image\/png"}],"author":"Abby Morgan","twitter_card":"summary_large_image","twitter_creator":"@anmorgan2414","twitter_site":"@Cometml","twitter_misc":{"Written by":"Abby Morgan","Est. 
reading time":"21 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/"},"author":{"name":"Abby Morgan","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/826ee39a2e30cf9d8d73155de09bb7b2"},"headline":"Image Inpainting for SDXL 1.0 Base Model + Refiner","datePublished":"2023-08-18T19:57:56+00:00","dateModified":"2025-04-29T15:20:13+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/"},"wordCount":3465,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-11.00.44-PM.png","keywords":["Comet","Comet ML","Computer Vision","Deep Learning","Deep Learning Experiment Management","Diffusion Models","HuggingFace","Image inpainting","Image Panels","image segmentation","Latent Diffusion","ML Experiment Management","MLOps","SDXL 1.0","Stability AI","Stable Diffusion","Text-to-image"],"articleSection":["Comet Community Hub","Machine Learning","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/","url":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/","name":"Image Inpainting for SDXL 1.0 Base Model + 
Refiner","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-11.00.44-PM.png","datePublished":"2023-08-18T19:57:56+00:00","dateModified":"2025-04-29T15:20:13+00:00","description":"Learn how to use SDXL 1.0 with the base + refiner model for image inpainting and compare the results to Stable Diffusion 2.0.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-11.00.44-PM.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-11.00.44-PM.png","width":600,"height":602,"caption":"Realistic picture of an American astronaut in outerspace with planets and stars and meteorites and meteors in the background and a light halo around the astronaut. In the astronaut's helmet is the reflection of a fiery orange comet. 
\"SDXL\" is outlined in white in the background."},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/image-inpainting-for-sdxl-1-0-base-refiner\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Image Inpainting for SDXL 1.0 Base Model + Refiner"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/826ee39a2e30cf9d8d73155de09bb7b2","name":"Abby 
Morgan","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/dbbf1ae921ee179c768f508340415946","url":"https:\/\/secure.gravatar.com\/avatar\/28d4934d14261b4afe12e226f0eaa57c4fb0c2761ad4586eb9a5bec3b8160bc9?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/28d4934d14261b4afe12e226f0eaa57c4fb0c2761ad4586eb9a5bec3b8160bc9?s=96&d=mm&r=g","caption":"Abby Morgan"},"description":"AI\/ML Growth Engineer @ Comet","sameAs":["https:\/\/www.comet.com\/","https:\/\/www.linkedin.com\/in\/anmorgan24\/","https:\/\/x.com\/anmorgan2414"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/abigailmcomet-com\/"}]}},"jetpack_featured_media_url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-18-at-11.00.44-PM.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/22"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=7147"}],"version-history":[{"count":3,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7147\/revisions"}],"predecessor-version":[{"id":15844,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7147\/revisions\/15844"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/7217"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=7147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=7147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/
wp\/v2\/tags?post=7147"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=7147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}