{"id":4566,"date":"2022-11-14T10:44:33","date_gmt":"2022-11-14T18:44:33","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=4566"},"modified":"2025-04-24T17:16:26","modified_gmt":"2025-04-24T17:16:26","slug":"using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/","title":{"rendered":"Using CLIP and Gradio to assess similarity between text prompts and ranges of colors"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/700\/1*u5MtJ-KzvF2kXxat-Q9vZg.png\" alt=\"\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"ir is it iu iv\">\n<h2 id=\"5bb0\" class=\"kz la iy bm lb lc ld le lf lg lh li lj lk ll lm ln lo lp lq lr ls lt lu lv lw ga\" data-selectable-paragraph=\"\"><a href=\"https:\/\/colab.research.google.com\/gist\/mcullan\/4a93f8f32578aa0bc2d568cdd52195e4\/clip_rgb_interpolate.ipynb\">Link to Colab notebook<\/a><\/h2>\n<\/div>\n\n\n\n<div class=\"ir is it iu iv\">\n<div class=\"lx ly gt gv lz ma\">\n<div class=\"mb o fr\">\n<div class=\"mi l\">\n<div class=\"mj l mk ml mm mi mn kx ma\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2 id=\"03ce\" class=\"kz la iy bm lb lc ld le lf lg lh li lj lk ll lm ln lo lp lq lr ls lt lu lv lw ga\" data-selectable-paragraph=\"\"><a href=\"https:\/\/huggingface.co\/spaces\/miccull\/clip-rgb-interpolation\">Hugging Face Space<\/a><\/h2>\n<div class=\"lx ly gt gv lz ma\">\n<div class=\"mb o fr\">\n<div class=\"mi l\">\n<div class=\"mo l mk ml mm mi mn kx ma\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h1 id=\"b455\" class=\"mp la iy bm lb mq mr ms lf mt mu mv lj ke mw kf ln kh mx ki lr kk my kl lv mz ga\" data-selectable-paragraph=\"\">Intro<\/h1>\n<p id=\"b025\" class=\"pw-post-body-paragraph na nb iy bm b nc nd jz ne nf ng kc nh lk ni nj nk lo nl nm nn ls no np nq nr ir ga\" data-selectable-paragraph=\"\">OpenAI\u2019s CLIP model and related techniques have taken the field of machine learning by storm since the group released their&nbsp;<a class=\"au ns\" href=\"https:\/\/openai.com\/blog\/clip\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">first blog post&nbsp;<\/a>about the model in January 2021. I highly recommend that original post as an introduction to the big ideas of CLIP, if you haven\u2019t read it already\u2014but CLIP\u2019s architecture and training process are outside the scope of this tutorial.<\/p>\n<p id=\"9079\" class=\"pw-post-body-paragraph na nb iy bm b nc nt jz ne nf nu kc nh lk nv nj nk lo nw nm nn ls nx np nq nr ir ga\" data-selectable-paragraph=\"\">We\u2019re going to focus on how to use the&nbsp;<code class=\"fp ny nz oa ob b\">clip<\/code>&nbsp;Python library to make similarity comparisons between text and images with pre-trained CLIP models. This basic functionality is at the heart of popular and highly sophisticated CLIP-based techniques. Notably, the model has been very popular in the AI art \/ generative images sphere, thanks to techniques originally made popular by artist and researcher&nbsp;<a class=\"au ns\" href=\"https:\/\/twitter.com\/advadnoun\" target=\"_blank\" rel=\"noopener ugc nofollow\">Ryan Murdock<\/a>. 
While the official CLIP repository provides a helpful Colab tutorial for "Interacting with CLIP," this tutorial will start moving us toward using CLIP as a visual reasoning engine for generative work.

We're going to use CLIP to compare individual solid colors to text prompts, and then see how our similarity score changes as we *interpolate* from one color to another.

In the cover image for this blog post, for instance, I interpolated from a yellow (RGB = (1, 1, 0)) to a pink (RGB = (1, 0, 1)), and at each step, had CLIP report the similarity between that color and the phrase "A glass of lemonade."

Higher scores denote greater similarity, so the bar chart tells us that the bright yellow is the best match for the text prompt. However, as we fade from yellow to pink, some of the pink tones also start to match "lemonade" quite well. The more bluish pinks toward the right side of the plot become a little less similar to our text prompt.

We could imagine applying this idea toward optimizing a color: we provide a seed color, maybe a gray or a randomly generated RGB code, and we let CLIP guide us in adjusting that color to match a prompt as well as possible. We won't do that in this tutorial, though. You'll have to stay tuned for part two of this series!

> Prompt engineering plus Comet plus Gradio? [What comes out is amazing AI-generated art!](https://www.comet.com/site/clipdraw-gallery-ai-art-powered-by-comet-and-gradio/) Take a closer look at our public logging project to see some of the amazing creations that have come out of this fun experiment.

## Installing CLIP

Let's go over installing CLIP first. The code shown below is intended for an IPython notebook, i.e. Colab.
To run it in a bash shell, just remove the `!` at the beginning of each line.

```
!pip install --quiet ftfy regex tqdm
!pip install --quiet git+https://github.com/openai/CLIP.git
```

## Basics of working with CLIP

First, we need to load a pre-trained CLIP model. The `clip` library makes a number of models available, each corresponding to a slightly different architecture. Generally, there are two categories we're dealing with: models based on `ResNet` architectures and models based on Vision Transformer (`ViT`) architectures.

```python
import clip
import torch

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Should be one of ['RN50', 'RN101', 'RN50x4', 'RN50x16', 'ViT-B/32', 'ViT-B/16']
model_name = 'ViT-B/16'
model, preprocess = clip.load(model_name, device=DEVICE)

# Set to "evaluation mode"
model.eval()
```

## Encode some text

To encode text using a pre-trained CLIP model, there are a few things we need to do. The first is to [*tokenize* the text](https://heartbeat.comet.ml/hands-on-with-hugging-faces-new-tokenizers-library-baff35d7b465) as follows:

```python
text = 'some text to encode'
tokenized_text = clip.tokenize(text).to(DEVICE)
```

Once the text is tokenized, it can be encoded using the pre-trained model's text transformer.

```python
encoded_text = model.encode_text(tokenized_text)
```
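As a quick sanity check, here is a short sketch (assuming the `model`, `DEVICE`, and imports defined above) that prints the shapes involved: CLIP pads or truncates prompts to a fixed context length of 77 tokens, and the ViT-B/16 model produces 512-dimensional embeddings.

```python
# Sanity-check the shapes (sketch; embedding width varies by model)
tokens = clip.tokenize('some text to encode').to(DEVICE)
with torch.no_grad():
    embedding = model.encode_text(tokens)

print(tokens.shape)     # torch.Size([1, 77])  -- one prompt, fixed context length
print(embedding.shape)  # torch.Size([1, 512]) -- embedding width for ViT-B/16
```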
## Images in PyTorch: Dimension Order

PyTorch adheres to a convention that may be unfamiliar if you're used to working with images in PIL, NumPy, OpenCV, or TensorFlow. Say we have an RGB color image that is 640 pixels wide by 480 pixels tall. While PIL, NumPy, etc. would treat this as an array of shape `(480, 640, 3)`, in PyTorch it would be `(3, 480, 640)`.

This means that if you load images using PIL, for instance, you must rearrange the dimensions into the proper order. This can be accomplished using the `.permute()` method for torch tensors. For example:

```python
from PIL import Image
import numpy as np
import torch

img = Image.open('/path/to/image.png')
arr = np.array(img)

# Rearrange the dimensions from (H, W, C) to (C, H, W)
x = torch.tensor(arr).permute((2, 0, 1))
```

But that's not the whole story! If we had a *stack* of 10 images of this size, we would store them in a tensor of shape `(10, 3, 480, 640)`. In other words, PyTorch formats images as (N, Channel, Height, Width), or `NCHW`. Moreover, many models are designed to work with batches of images, so you may need to convert a tensor of shape `(C, H, W)` to `(1, C, H, W)`. This can be accomplished using the `.unsqueeze()` method as follows:

```python
# Old x.shape = (C, H, W); new x.shape will be (1, C, H, W)
# because .unsqueeze(0) inserts a new dimension at position 0
x = x.unsqueeze(0)

# Similarly, .squeeze(0) removes that size-1 batch dimension,
# taking us back from (1, C, H, W) to (C, H, W)
x = x.squeeze(0)
```
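To make the round trip concrete, here is a small self-contained check (a sketch with a dummy array standing in for a real photo) that walks a 480x640 RGB image from HWC to CHW to NCHW:

```python
import numpy as np
import torch

# Dummy 480x640 RGB image in HWC layout, standing in for np.array(Image.open(...))
arr = np.zeros((480, 640, 3), dtype=np.float32)

chw = torch.tensor(arr).permute((2, 0, 1))  # (3, 480, 640)
nchw = chw.unsqueeze(0)                     # (1, 3, 480, 640)

print(arr.shape, chw.shape, nchw.shape)
```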
## Encode an image

With CLIP, our goal is to make multi-modal comparisons. More specifically, we want to measure similarity across images and text. CLIP learns its image and text encoder models together, and we can access the image encoder via the `.encode_image` method of a trained CLIP model. Note that the image first needs to be prepared with the preprocessing function returned by `clip.load`, and it needs a batch dimension:

```python
from PIL import Image

img = Image.open('/path/to/image.png')

# Use the preprocessing function returned by clip.load,
# then add a batch dimension so the shape is (1, C, H, W)
x = preprocess(img).unsqueeze(0).to(DEVICE)

encoded_image = model.encode_image(x)
```

## Encode a single color as an image

Of course, for our example in this tutorial, we're not working with existing images. We're generating our own!

Let's write a function that turns a color in floating-point RGB space into a properly formatted image tensor. In other words, we may have something like `color = (1, 0, 0)` for red or `color = (0.5, 0.5, 0.5)` for a 50% gray. Really, this comes down to instantiating a tensor object and reshaping it to `NCHW` format. This function returns a 1×1-pixel image with RGB channels. More precisely, it outputs a stack containing a single image.

```python
def create_rgb_tensor(color):
    """color is e.g. [1, 0, 0]"""
    # Use a floating-point tensor so it can be resized and encoded later
    return torch.tensor(color, device=DEVICE, dtype=torch.float).reshape((1, 3, 1, 1))

red_tensor = create_rgb_tensor((1, 0, 0))
```

But we can't pass one of these into our visual encoder just yet. Recall that the CLIP model has a predefined resolution, which can be found with `model.visual.input_resolution`.

For us, this input resolution is a height and width of `(224, 224)`. So let's resize our RGB image.
There are any number of ways we could do this using torch and torch-compatible libraries, but I'll use the `Resize` class from `torchvision.transforms`.

```python
from torchvision.transforms import Resize

# Create the resizer at the model's expected input resolution
resolution = model.visual.input_resolution
resizer = Resize(size=(resolution, resolution))

resized_tensor = resizer(red_tensor)

encoded_image = model.encode_image(resized_tensor)
```

## Similarity

We now have encoded text and encoded images. How do we measure the similarity between them? We could employ any number of possible similarity metrics, but we'll take cosine similarity as a starting point.

```python
similarity = torch.cosine_similarity(encoded_text, encoded_image)
```
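Here is how those pieces fit together end to end. This is a sketch that assumes the `model`, `DEVICE`, `resizer`, and `create_rgb_tensor` objects defined above; the absolute value of the score is not very meaningful on its own, but it lets us compare different candidate colors against the same prompt.

```python
# End-to-end sketch: how similar is pure red to the prompt "a solid red square"?
with torch.no_grad():
    encoded_text = model.encode_text(clip.tokenize('a solid red square').to(DEVICE))
    red = create_rgb_tensor((1.0, 0.0, 0.0))
    encoded_image = model.encode_image(resizer(red))
    similarity = torch.cosine_similarity(encoded_text, encoded_image)

print(similarity.item())  # a single cosine similarity score
```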
## Create and interpolate between colors

We're almost there! We can encode text. We can generate images from colors. We can resize those colors to the proper size for CLIP. And we can encode those images using CLIP.

Now, let's write a function to *interpolate* between two colors. We'll stick with *linear interpolation*, which simply "draws a line" between two endpoints. Note that the function below is called `lerp`, which stands for (L)inear int(ERP)olation. This abbreviation is a strong convention, in my experience, and you're likely to see it wherever a linear interpolation function is defined in code.

```python
def lerp(x, y, steps=11):
    """Linear interpolation between two tensors."""
    weights = np.linspace(0, 1, steps)
    weights = torch.tensor(weights, device=DEVICE, dtype=torch.float)
    weights = weights.reshape([-1, 1, 1, 1])

    interpolated = x * (1 - weights) + y * weights
    return interpolated

blue_tensor = create_rgb_tensor((0, 0, 1))
color_range = lerp(red_tensor, blue_tensor, 11)

# Resize the whole stack of colors and encode it before comparing to the text
color_range = resizer(color_range)
encoded_colors = model.encode_image(color_range)

similarities = torch.cosine_similarity(encoded_text, encoded_colors)
```
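A quick way to convince yourself that the broadcasting in `lerp` does what we want (a sketch using the helpers defined above): the weights have shape `(steps, 1, 1, 1)`, so interpolating two `(1, 3, 1, 1)` colors yields one color per step.

```python
# Toy check of lerp's broadcasting
a = create_rgb_tensor((1.0, 0.0, 0.0))  # red
b = create_rgb_tensor((0.0, 0.0, 1.0))  # blue
ramp = lerp(a, b, steps=5)

print(ramp.shape)        # torch.Size([5, 3, 1, 1]) -- one 1x1 color per step
print(ramp[:, 0, 0, 0])  # red channel fades from 1.0 down to 0.0
```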
## Plotting colors in Pandas

![Example output: bar chart of CLIP similarity scores, each bar colored by the interpolated color it represents](https://miro.medium.com/max/602/1*CwhisJWoy_WmDnmg10HYSg.png)

There are a few more functions we need for turning these colors into a nice bar plot in Pandas, but I won't describe them all in detail here. Here's a code snippet detailing this process:

```python
import pandas as pd
import matplotlib.pyplot as plt


def rgb2hex(rgb):
    """Utility function for converting a floating-point RGB tensor to a hexadecimal color code."""
    rgb = (rgb * 255).astype(int)
    r, g, b = rgb
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


def get_interpolated_scores(x, y, encoded_text, steps=11):
    interpolated = lerp(x, y, steps)
    interpolated_encodings = model.encode_image(resizer(interpolated))

    scores = torch.cosine_similarity(interpolated_encodings, encoded_text)
    scores = scores.detach().cpu().numpy()

    rgb = interpolated.detach().cpu().numpy().reshape(-1, 3)
    interpolated_hex = [rgb2hex(color) for color in rgb]

    data = pd.DataFrame({
        'similarity': scores,
        'color': interpolated_hex
    }).reset_index().rename(columns={'index': 'step'})

    return data


def similarity_plot(data, text_prompt):
    title = f'CLIP Cosine Similarity Prompt="{text_prompt}"'
    fig, ax = plt.subplots()
    plot = data['similarity'].plot(kind='bar',
                                   ax=ax,
                                   stacked=True,
                                   title=title,
                                   color=data['color'],
                                   width=1.0,
                                   xlim=(0, 2),
                                   grid=False)

    plot.get_xaxis().set_visible(False)
    return fig
```
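Putting it together, here is a sketch that reproduces the cover-image experiment described in the intro (yellow to pink against "A glass of lemonade"), assuming the model objects and helper functions defined above:

```python
prompt = 'A glass of lemonade'
encoded_text = model.encode_text(clip.tokenize(prompt).to(DEVICE))

yellow = create_rgb_tensor((1.0, 1.0, 0.0))
pink = create_rgb_tensor((1.0, 0.0, 1.0))

data = get_interpolated_scores(yellow, pink, encoded_text, steps=11)
fig = similarity_plot(data, prompt)
```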
## Deploying Our Model with Gradio and Hugging Face Spaces

We can get this process working in a Colab notebook, but let's talk about deploying this model as an interactive app. I'll be using [Gradio](https://gradio.app/) as a framework and [Hugging Face](https://huggingface.co/) Spaces to deploy.

## Gradio code

I won't go into great detail about writing a Gradio app, but here are the basics:

1. **Define a function for the app to run.** This is happening below with `gradio_fn`. Note that this is just a Python function and doesn't actually require anything Gradio-specific yet. This is simply what Gradio will do with the inputs we provide.
2. **Define inputs for the Gradio interface.** Gradio handles inputs using classes from the `gradio.inputs` module. In this example, we're using `Textbox` inputs to write RGB values and a `Slider` to select the number of steps. Gradio's simplicity is a big part of its appeal: the apps are not only simple to set up, but also have very uniform interfaces. There are other things that Gradio does an amazing job of handling for us, such as spinning up temporary, publicly shareable apps from a single line of code in a Colab notebook, *with no signup necessary!* If we wanted to create something more custom, though (for instance, if we wanted to use a color picker to select our RGB values), we might look at a framework like [Streamlit](https://streamlit.io/). Streamlit is also supported by Hugging Face Spaces and is overall excellent, but I went with Gradio this time, especially because it's easy to test on Colab.
3. **Create and launch the Gradio interface.** We have a function to run, and we have input components for our interface, so now we just need to create a `gradio.Interface` object, which takes as arguments our function, inputs, and an output type. Then we simply call the `.launch()` method! As I mentioned before, what's really amazing is that this method will work from a Colab cell, spitting out a public link and opening an inline IFrame with our app. But it will also work in our `.py` file on Hugging Face Spaces.
```python
import gradio as gr


def encode_text(text_prompt):
    """Tokenize and encode a prompt (the same two steps shown earlier)."""
    return model.encode_text(clip.tokenize(text_prompt).to(DEVICE))


def gradio_fn(rgb_start, rgb_end, text_prompt, steps=11, grad_disabled=True):
    # grad_disabled is not wired to a Gradio input; only the first four arguments are
    rgb_start = [float(x.strip()) for x in rgb_start.split(',')]
    rgb_end = [float(x.strip()) for x in rgb_end.split(',')]

    start = create_rgb_tensor(rgb_start)
    end = create_rgb_tensor(rgb_end)
    encoded_text = encode_text(text_prompt)
    data = get_interpolated_scores(start, end, encoded_text, steps)
    return similarity_plot(data, text_prompt)


gradio_inputs = [gr.inputs.Textbox(lines=1, default="1, 0, 0", label="Start RGB"),
                 gr.inputs.Textbox(lines=1, default="0, 1, 0", label="End RGB"),
                 gr.inputs.Textbox(lines=1, label="Text Prompt", default='A solid red square'),
                 gr.inputs.Slider(minimum=1, maximum=30, step=1, default=11, label="Interpolation Steps")]

iface = gr.Interface(fn=gradio_fn, inputs=gradio_inputs, outputs="plot")
iface.launch()
```
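One small aside, not shown in the original snippet: when launching from a Colab cell, you can explicitly request the temporary public link mentioned above by passing `share=True`.

```python
# Optional: request a temporary, publicly shareable URL when launching from Colab
iface.launch(share=True)
```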
## Next steps

What else can we do now that we can compare images and text using CLIP? The list is *long*, believe me, but here are some broad ideas:

1. Zero-shot classification: In the vein of this project, we could imagine selecting from a predefined set of colors, for instance selecting a [Pantone color](https://www.pantone-colours.com/) by providing a text prompt. This tutorial actually took us much of the way toward zero-shot classification: after computing the similarity score for every color in our set of interpolated colors, we would just need to report the "best match" (see the short sketch after this list).
2. Directly modifying an image, text, or some latent representation to maximize (or minimize) similarity. This is the basic idea driving many of the popular CLIP-enabled generative projects, such as [CLIP+VQGAN](https://ljvmiranda921.github.io/notebook/2021/08/11/vqgan-list/) (here's a great read on [VQGAN](https://ljvmiranda921.github.io/notebook/2021/08/08/clip-vqgan/), by the way), CLIP Guided Diffusion, and CLIP+StyleGAN3. In fact, we can generalize here: we could take this process "in reverse" and use CLIP to generate captions by modifying a latent text representation to better match an image.
3. Some projects take the CLIP+Generator idea to the next level. Look at researcher Mehdi Cherti's ["Feed forward VQGAN-CLIP model"](https://replicate.ai/mehdidc/feed_forward_vqgan_clip), which uses CLIP and VQGAN to build a dataset of VQGAN's responses to CLIP prompts, then performs supervised learning on that prompt/image dataset to learn to predict the latent space of VQGAN directly, bypassing the compute-heavy iterative process used by the standard CLIP+Generator approach.
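For the zero-shot idea in point 1, a minimal sketch (assuming the `model`, `DEVICE`, `resizer`, and `create_rgb_tensor` pieces from this tutorial, and a small hypothetical candidate palette) is just an argmax over the similarity scores:

```python
# Pick the best-matching color from a small candidate set (sketch; palette is hypothetical)
candidates = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 0.0, 1.0)]
prompt = 'A glass of lemonade'

with torch.no_grad():
    encoded_text = model.encode_text(clip.tokenize(prompt).to(DEVICE))
    colors = torch.cat([create_rgb_tensor(c) for c in candidates])  # (N, 3, 1, 1)
    encoded_colors = model.encode_image(resizer(colors))
    scores = torch.cosine_similarity(encoded_text, encoded_colors)  # (N,)

best = scores.argmax().item()
print('best match:', candidates[best])
```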
## Next in this series

Next time, we'll look at using CLIP to guide the training of a PyTorch model to directly generate a color that best matches a prompt. We'll also start comparing aspects of the different pre-trained CLIP models in terms of performance and efficiency, and we'll look at how to track the parameters and outputs of runs with [Comet](https://www.comet.com/).
-->","yoast_head_json":{"title":"Using CLIP and Gradio to assess similarity between text prompts and ranges of colors - Comet","description":"OpenAI\u2019s CLIP model and related techniques have taken the field of machine learning by storm since the group released their\u00a0first blog post\u00a0about the model in January 2021.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/","og_locale":"en_US","og_type":"article","og_title":"Using CLIP and Gradio to assess similarity between text prompts and ranges of colors","og_description":"OpenAI\u2019s CLIP model and related techniques have taken the field of machine learning by storm since the group released their\u00a0first blog post\u00a0about the model in January 2021.","og_url":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2022-11-14T18:44:33+00:00","article_modified_time":"2025-04-24T17:16:26+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/max\/700\/1*u5MtJ-KzvF2kXxat-Q9vZg.png","type":"","width":"","height":""}],"author":"Michael Cullan","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Michael Cullan","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/"},"author":{"name":"Team Comet Digital","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf"},"headline":"Using CLIP and Gradio to assess similarity between text prompts and ranges of colors","datePublished":"2022-11-14T18:44:33+00:00","dateModified":"2025-04-24T17:16:26+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/"},"wordCount":1845,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/max\/700\/1*u5MtJ-KzvF2kXxat-Q9vZg.png","articleSection":["Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/","url":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/","name":"Using CLIP and Gradio to assess similarity between text prompts and ranges of colors - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/max\/700\/1*u5MtJ-KzvF2kXxat-Q9vZg.png","datePublished":"2022-11-14T18:44:33+00:00","dateModified":"2025-04-24T17:16:26+00:00","description":"OpenAI\u2019s CLIP model and related techniques have taken the field of machine learning by storm since the group released their\u00a0first blog post\u00a0about the model in January 2021.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/#primaryimage","url":"https:\/\/miro.medium.com\/max\/700\/1*u5MtJ-KzvF2kXxat-Q9vZg.png","contentUrl":"https:\/\/miro.medium.com\/max\/700\/1*u5MtJ-KzvF2kXxat-Q9vZg.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/using-clip-and-gradio-to-assess-similarity-between-text-prompts-and-ranges-of-colors\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Using CLIP and Gradio to assess similarity between text prompts and ranges of colors"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/6266601170c60a7a82b3e0043fbe8ddf","name":"Team Comet 
Digital","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/4f0c0a8cc7c0e87c636ff6a420a6647c","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-12-at-8.58.50-AM-96x96.png","caption":"Team Comet Digital"},"sameAs":["https:\/\/www.comet.ml\/"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/teamcometdigital\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4566","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=4566"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4566\/revisions"}],"predecessor-version":[{"id":15645,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4566\/revisions\/15645"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=4566"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=4566"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=4566"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=4566"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}