{"id":7136,"date":"2023-08-14T05:17:46","date_gmt":"2023-08-14T13:17:46","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=7136"},"modified":"2025-04-24T17:14:44","modified_gmt":"2025-04-24T17:14:44","slug":"sketches-in-computer-vision","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/","title":{"rendered":"Sketches in Computer Vision"},"content":{"rendered":"\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\">\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*TlnfkLCD8egXCwGm-OYTvQ.jpeg\" alt=\"\" width=\"700\" height=\"467\"><\/figure><div class=\"mf mg mh\"><picture><\/picture><\/div>\n<\/div><figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Photo by <a class=\"af mz\" href=\"https:\/\/unsplash.com\/@goashape?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener ugc nofollow\">Goashape<\/a> on <a class=\"af mz\" href=\"https:\/\/unsplash.com\/s\/photos\/paper-drawing?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/em><\/figcaption><\/figure>\n<p id=\"0f02\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Visual understanding of the real world is the primary goal of any computer vision system. Achieving it often requires guidance from data modalities beyond photos and videos, ones that can better depict the relevant visual cues for deeper understanding and 
interpretation.<\/p>\n<p id=\"b31a\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">In this regard, sketches offer perhaps the easiest way to visually represent natural entities: all one has to do is look at a photo (or recollect from memory) and draw a few strokes mimicking the relevant visual cues, <em class=\"nv\">not<\/em> requiring any information about color, texture, or depth.<\/p>\n<p id=\"6f37\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Despite its simplicity, a sketch can be highly illustrative, capture minute, fine-grained details, and even convey concepts that are hard to express in words at all (see the figure below). Moreover, the rapid proliferation of touch-screen devices has made simple sketches far easier to collect and to use in practical settings, enabling applications such as sketch-based image retrieval for e-commerce rather than leaving sketches a mere \u201cartistic luxury.\u201d<\/p>\n<p id=\"8ef4\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">These factors have earned sketches considerable attention from the machine vision community and driven their adoption in several visual understanding tasks. As a result, sketch-vision research has broken out of its \u201cniche\u201d origins to become a high-impact area of interest for the general vision community, winning the <a class=\"af mz\" href=\"http:\/\/www.bmva.org\/bmvc\/2015\/awards.html\" target=\"_blank\" rel=\"noopener ugc nofollow\">Best Science 
Paper Award at BMVC \u201915<\/a> and the <a class=\"af mz\" href=\"https:\/\/blog.siggraph.org\/2022\/07\/siggraph-2022-technical-papers-awards-best-papers-and-honorable-mentions.html\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Best Technical Paper Award at SIGGRAPH \u201822<\/a>.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*eEcsM8QSP_ts46OKngJsfQ.png\" alt=\"\" width=\"700\" height=\"306\"><\/figure><div class=\"mf mg nw\"><picture><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>The ability of sketches to convey a diverse set of concepts, emotions, descriptions, etc., some of which might be even harder to express in words. Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2001.02600.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"8f4d\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">This article provides a holistic overview of sketch-based computer vision: the unique characteristics that make sketches worthy of study, fundamental sketch representation learning approaches, and the application of sketches across various computer vision tasks, which predominantly lie in the natural image domain. 
We\u2019ll discuss several interesting works at the intersection of sketches, vision and graphics, as well as outline promising future directions of research. Let\u2019s dive in!<\/p>\n<h1 id=\"4eee\" class=\"nx ny fo be nz oa ob go oc od oe gr of og oh oi oj ok ol om on oo op oq or os bj\" data-selectable-paragraph=\"\">Learning Sketch Representation<\/h1>\n<h2 id=\"7873\" class=\"ot ny fo be nz ou ov ow oc ox oy oz of ni pa pb pc nm pd pe pf nq pg ph pi pj bj\" data-selectable-paragraph=\"\">Motivation: What\u2018s unique about sketches?<\/h2>\n<p id=\"2610\" class=\"pw-post-body-paragraph na nb fo be b gm pk nd ne gp pl ng nh ni pm nk nl nm pn no np nq po ns nt nu fh bj\" data-selectable-paragraph=\"\">Sketches have a very unique set of characteristics that set them apart as a modality from conventional photos.<\/p>\n<ol class=\"\">\n<li id=\"42cc\" class=\"na nb fo be b gm nc nd ne gp nf ng nh pp nj nk nl pq nn no np pr nr ns nt nu ps pt pu bj\" data-selectable-paragraph=\"\"><strong class=\"be pv\">Dual-modality: <\/strong>Perhaps the most important feature of sketches is that they can be expressed either as a static 2D image or as a set of sequential strokes, represented by relative planar coordinates and a pen-state that determines the start\/end of a stroke. 
In computer graphics, 2D sketches are commonly referred to as \u201c<em class=\"nv\">rasterized sketches,<\/em>\u201d while stroke-wise sketches are stored as \u201c<em class=\"nv\">vectorized sequences.<\/em>\u201d This dual-modality existence is unique to sketches, supporting both image-level and coordinate-level time-series representation learning.<\/li>\n<li id=\"d7a3\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu ps pt pu bj\" data-selectable-paragraph=\"\"><strong class=\"be pv\">Hierarchical structure: <\/strong>As mentioned above, vector sketches comprise coordinate-level, stroke-wise information, which is essentially a time-series depiction of the drawn strokes. From everyday doodling experience (pretty sure everyone has doodled at least once!), it is intuitive that one starts with the boundary strokes and gradually moves to finer interior strokes, drawing each object to a varying extent of detail. This <em class=\"nv\">inherently <\/em>creates a <em class=\"nv\">coarse-to-fine hierarchy of strokes<\/em>, which can be leveraged for a superior understanding of fine-grained sketches and of the sketching process itself. For instance, <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2007.15103.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">this paper<\/a> [BMVC \u201920] fused sketch-specific hierarchies with those in images using a cross-modal attention scheme for fine-grained SBIR.<\/li>\n<li id=\"d87b\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu ps pt pu bj\" data-selectable-paragraph=\"\"><strong class=\"be pv\">Highly abstract, lacking in background: <\/strong>Sketches are highly abstract in nature, i.e. the stroke density is higher for objects in focus than for the background. 
This once again follows from the intuitive sketching process: one draws the more important foreground objects in detail and keeps the background relatively simple. Further, objects that are <em class=\"nv\">uniquely identified by their shape<\/em> can be sketched in a <em class=\"nv\">very abstract manner<\/em> (e.g. a pyramid expressed as a triangle). This helps in applications such as object sketching and shape retrieval, where simple doodles depicting the object\/structure can be synthesized or used for the respective tasks.<\/li>\n<li id=\"f8a5\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu ps pt pu bj\" data-selectable-paragraph=\"\"><strong class=\"be pv\">Low computational storage: <\/strong>Raster sketches are typically binarized (i.e. a plain background with foreground strokes), while vector sketches are stored as temporal coordinates; both require very little memory for storage.<\/li>\n<\/ol>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Iik7494gd-oFZVpE1cvn_A.png\" alt=\"\" width=\"700\" height=\"163\"><\/figure><div class=\"mf mg qb\"><picture><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>\u201cIllustrations of the major domain-unique challenges of free-hand sketch. Each column is a photo-sketch pair. Sketch is highly abstract. 
A pyramid can be depicted as a triangle in sketch, and a few strokes depict a fancy handbag. Sketch is highly diverse. Different people draw distinctive sketches when given the identical reference, due to subjective salience (head vs. body), and drawing style.\u201d Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2001.02600.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<h2 id=\"2832\" class=\"ot ny fo be nz ou ov ow oc ox oy oz of ni pa pb pc nm pd pe pf nq pg ph pi pj bj\" data-selectable-paragraph=\"\">Classical sketch representation learning<\/h2>\n<p id=\"0b11\" class=\"pw-post-body-paragraph na nb fo be b gm pk nd ne gp pl ng nh ni pm nk nl nm pn no np nq po ns nt nu fh bj\" data-selectable-paragraph=\"\">The two most prominent conventional sketch representation learning tasks are <strong class=\"be pv\">(i) sketch recognition<\/strong> and <strong class=\"be pv\">(ii) sketch generation<\/strong>. It is noteworthy that while recognition is primarily done on raster (or offline) sketches, sketch generation models work on vector (or online) sketches, i.e., they learn coordinate-wise time-series representations.<\/p>\n<p id=\"8c2b\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Sketch recognition is the most fundamental task of computer vision \u2014 predicting the class label of a given sketch. One of the first works leveraging deep learning was <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1501.07873.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Sketch-a-Net<\/a> [BMVC \u201815], which presented a multi-scale CNN architecture designed <em class=\"nv\">specifically <\/em>for sketches, particularly to handle their varying levels of abstraction. 
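The multi-scale intuition can be illustrated with a toy NumPy example (hypothetical helper names; this is a loose sketch of the idea, not the actual Sketch-a-Net network): the same binary raster sketch is represented at several resolutions at once, so that coarse global structure and fine strokes both contribute features.

```python
import numpy as np

def block_pool(img, factor):
    """Downsample a square binary raster by average-pooling non-overlapping blocks."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def multi_scale_features(raster, factors=(1, 2, 4)):
    """Concatenate flattened views of the sketch at several scales,
    loosely mimicking a multi-scale front end for abstraction-robust features."""
    return np.concatenate([block_pool(raster, f).ravel() for f in factors])

# a toy 8x8 "sketch": a single diagonal stroke on an empty background
sketch = np.eye(8, dtype=np.float32)
feats = multi_scale_features(sketch)
print(feats.shape)  # (84,) = 8*8 + 4*4 + 2*2
```

A real model would feed each scale through convolutional branches and fuse learned features rather than raw pixels, but the principle is the same: coarser views smooth over stroke-level variation, finer views keep detail.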
Their architecture is shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*IkX8ndVLKjV9Ska0mFrivQ.png\" alt=\"\" width=\"700\" height=\"317\"><\/figure><div class=\"mf mg qc\"><picture><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1501.07873.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"01ae\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Among several other works, <a class=\"af mz\" href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/8803426\" target=\"_blank\" rel=\"noopener ugc nofollow\">Deep-SSZSL<\/a> [ICIP \u201819] tackled scene sketch recognition under a <a class=\"af mz\" href=\"https:\/\/en.wikipedia.org\/wiki\/Zero-shot_learning\" target=\"_blank\" rel=\"noopener ugc nofollow\">zero-shot learning<\/a> setup. More recently, Xu et al. proposed <a class=\"af mz\" href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/9397867\" target=\"_blank\" rel=\"noopener ugc nofollow\">MGT<\/a>, which models sketches as sparsely connected graphs and leverages a graph transformer network for geometric representation learning, capturing both spatial semantics and stroke-level temporal information.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*NSTAmJ9Zqerm36RV-rUlLg.png\" alt=\"\" width=\"700\" height=\"294\"><\/figure><div class=\"mf mg qd\"><picture><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1912.11258.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"60a8\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">For an extensive review of the sketch recognition literature, see this <a class=\"af mz\" href=\"https:\/\/doi.org\/10.1016\/j.imavis.2019.06.010\" target=\"_blank\" rel=\"noopener ugc nofollow\">survey paper<\/a>.<\/p>\n<p id=\"5c9f\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Coming to sketch generation, the first work 
in this domain was the seminal <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1704.03477.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Sketch-RNN<\/a> [ICLR \u201818], which proposed a sequence-to-sequence <a class=\"af mz\" href=\"https:\/\/towardsdatascience.com\/understanding-variational-autoencoders-vaes-f70510919f73\" target=\"_blank\" rel=\"noopener\">Variational Autoencoder<\/a> (VAE). Its bi-directional RNN encoder takes in a sketch sequence and its reverse as the two directional inputs, while its decoder is an <a class=\"af mz\" href=\"https:\/\/en.wikipedia.org\/wiki\/Autoregressive_model\" target=\"_blank\" rel=\"noopener ugc nofollow\">autoregressive<\/a> RNN that reconstructs the sketch sequence from the latent vector at the VAE bottleneck by sampling coordinates from a bivariate <a class=\"af mz\" href=\"https:\/\/en.wikipedia.org\/wiki\/Mixture_model\" target=\"_blank\" rel=\"noopener ugc nofollow\">Gaussian mixture model<\/a>. The architecture of Sketch-RNN is shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*lRFiP6o7-UsyDtdeR-kNLg.png\" alt=\"\" width=\"700\" height=\"263\"><\/figure><div class=\"mf mg qe\"><picture><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1704.03477.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"7fe8\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">A very recent work, <a class=\"af mz\" href=\"https:\/\/openreview.net\/pdf?id=c-4HSDAWua5\" target=\"_blank\" rel=\"noopener ugc nofollow\">SketchODE<\/a> [ICLR \u201922], was the first to apply <a class=\"af mz\" href=\"https:\/\/proceedings.neurips.cc\/paper\/2018\/file\/69386f6bb1dfed68692a24c8686939b9-Paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">neural ordinary differential equations<\/a> to representation learning of vector sketches in continuous time.<\/p>\n<p id=\"1817\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Recently, there have been a few attempts at introducing <a class=\"af mz\" href=\"https:\/\/ai.facebook.com\/blog\/self-supervised-learning-the-dark-matter-of-intelligence\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">self-supervised learning (SSL)<\/a> to sketch representation learning, so as to reduce annotation requirements. <a class=\"af mz\" href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/9119480\" target=\"_blank\" rel=\"noopener ugc nofollow\">This paper<\/a> proposed a hybrid of existing SSL pretext tasks, such as rotation and deformation prediction, for learning from unlabelled sketches. 
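Pretext tasks like these can also exploit the dual modality discussed earlier. As a concrete illustration of the vector-to-raster direction, here is a minimal NumPy routine (a hypothetical helper with simplified pen handling; real pipelines use proper anti-aliased line rendering) that draws a vector sketch, given as (dx, dy, pen_up) offset triplets, onto a binary raster grid:

```python
import numpy as np

def rasterize(strokes, size=16):
    """Render a vector sketch onto a binary raster grid.
    Each row of `strokes` is (dx, dy, pen_up): integer offsets from the
    previous point, plus a flag that lifts the pen after this segment."""
    grid = np.zeros((size, size), dtype=np.uint8)
    x, y, pen_down = 0, 0, True
    for dx, dy, pen_up in strokes:
        if pen_down:
            # mark points sampled densely along the segment
            steps = max(abs(dx), abs(dy), 1)
            for t in range(steps + 1):
                px = round(x + dx * t / steps)
                py = round(y + dy * t / steps)
                if 0 <= px < size and 0 <= py < size:
                    grid[py, px] = 1
        x, y = x + dx, y + dy
        pen_down = not pen_up  # a raised pen skips drawing on the next offset
    return grid

# an "L" shape: down 5 units, then right 5 units
img = rasterize([(0, 5, 0), (5, 0, 1)])
print(int(img.sum()))  # 11 marked cells
```

Training a model to translate between this raster output and the original coordinate sequence (and back) forces it to learn both the spatial and the sequential structure of the sketch without any labels.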
More recently, Bhunia et al.\u2019s <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/papers\/Bhunia_Vectorization_and_Rasterization_Self-Supervised_Learning_for_Sketch_and_Handwriting_CVPR_2021_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Vector2Raster<\/a> [CVPR \u201821] proposed cross-modal translation between the vector (temporal) and raster (spatial) forms of sketches as a pretext task, so that models can exploit both the spatial and sequential attributes of sketches.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*BorljyY13EqbxgpnMDzlSA.png\" alt=\"\" width=\"700\" height=\"234\"><\/figure><div class=\"mf mg qf\"><picture><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/papers\/Bhunia_Vectorization_and_Rasterization_Self-Supervised_Learning_for_Sketch_and_Handwriting_CVPR_2021_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<h1 id=\"c45c\" class=\"nx ny fo be nz oa ob go oc od oe gr of og oh oi oj ok ol om on oo op oq or os bj\" data-selectable-paragraph=\"\">Leveraging sketches for different vision tasks<\/h1>\n<p id=\"6aea\" class=\"pw-post-body-paragraph na nb fo be b gm pk nd ne gp pl ng nh ni pm nk nl nm pn no np nq po ns nt nu fh bj\" data-selectable-paragraph=\"\">In 
this section, we explore a variety of computer vision tasks involving sketches. Since the domain gap between sparse sketches and other forms of multimedia (RGB photos\/videos\/3D shapes) is considerable, each of these tasks typically involves cross-modal representation learning at its core.<\/p>\n<h2 id=\"328d\" class=\"ot ny fo be nz ou ov ow oc ox oy oz of ni pa pb pc nm pd pe pf nq pg ph pi pj bj\" data-selectable-paragraph=\"\">Sketch-based Image Retrieval (SBIR)<\/h2>\n<p id=\"8349\" class=\"pw-post-body-paragraph na nb fo be b gm pk nd ne gp pl ng nh ni pm nk nl nm pn no np nq po ns nt nu fh bj\" data-selectable-paragraph=\"\">This is the most popular and well-explored sketch-vision research area, where sketches serve as a query medium to retrieve photos from a gallery. The primary motivation for SBIR is that it is much easier to express shape, pose and style features by drawing a sketch than by writing a query text. Moreover, words often fail to adequately describe the <em class=\"nv\">exact<\/em> search query, especially at instance level, where the entire gallery belongs to the same category and one needs to retrieve one particular object.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*PDPkl6knpHy10hiqD1M9tA.png\" alt=\"\" width=\"700\" height=\"188\"><\/figure><div class=\"mf mg qg\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*PDPkl6knpHy10hiqD1M9tA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*PDPkl6knpHy10hiqD1M9tA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*PDPkl6knpHy10hiqD1M9tA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*PDPkl6knpHy10hiqD1M9tA.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*PDPkl6knpHy10hiqD1M9tA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*PDPkl6knpHy10hiqD1M9tA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*PDPkl6knpHy10hiqD1M9tA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*PDPkl6knpHy10hiqD1M9tA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*PDPkl6knpHy10hiqD1M9tA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*PDPkl6knpHy10hiqD1M9tA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*PDPkl6knpHy10hiqD1M9tA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*PDPkl6knpHy10hiqD1M9tA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*PDPkl6knpHy10hiqD1M9tA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*PDPkl6knpHy10hiqD1M9tA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" 
data-selectable-paragraph=\"\"><em>A high-level illustration of a sketch-based image retrieval objective. Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2001.02600.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"ff78\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">One of the earliest works on deep SBIR was <a class=\"af mz\" href=\"https:\/\/qugank.github.io\/papers\/ICIP16.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">this paper<\/a> [ICIP \u201816], where the authors introduced <a class=\"af mz\" href=\"https:\/\/towardsdatascience.com\/siamese-networks-introduction-and-implementation-2140e3443dee\" target=\"_blank\" rel=\"noopener\">Siamese networks<\/a> for learning correspondences between a sketch and the edge map generated from a photo. The reason behind using edge maps is that they somewhat resemble sketches in terms of depicting the outline (and thereby the shape) of an object, whereas RGB photos are very different from sketches. However, more recent works directly learn a sketch-photo joint embedding space, such that all photos resembling a sketch are ideally clustered around that sketch embedding.<\/p>\n<p id=\"01da\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">With the ability of sketches to provide visually fine-grained details, the focus on the retrieval task gradually shifted from mere categorical to instance-level \u2014 where all objects belong to the same class but are unique entities having subtle differences among them. 
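Concretely, the joint-embedding idea is usually trained with a triplet objective: the sketch (anchor), its paired photo (positive) and any other photo (negative) are all embedded, and the loss pushes the positive closer to the anchor than the negative by a margin. A minimal sketch of that objective in plain Python (the embeddings below are toy 2-D vectors; real systems obtain them from CNN encoders):

```python
def l2_distance(a, b):
    # Squared Euclidean distance between two embedding vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge on the distance gap: the loss is zero once the positive
    # is closer to the anchor than the negative by at least `margin`.
    return max(0.0, l2_distance(anchor, positive)
                    - l2_distance(anchor, negative) + margin)

# Toy embeddings: the paired photo sits near the sketch, the
# distractor photo far away, so the loss is already zero.
sketch = [0.1, 0.9]
paired_photo = [0.12, 0.88]   # positive
other_photo = [0.9, 0.1]      # negative
print(triplet_loss(sketch, paired_photo, other_photo))  # 0.0
```

Minimizing this over many triplets is what clusters all photos resembling a sketch around that sketch's embedding.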
This is a practical scenario for commercial applications, where one intends to retrieve a <em class=\"nv\">specific instance<\/em> of a given category (say, a specific styled shoe out of a gallery of shoe photos).<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*hPZSMkKJ7ntGIEfumuvcqA.png\" alt=\"\" width=\"700\" height=\"256\"><\/figure><div class=\"mf mg qh\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*hPZSMkKJ7ntGIEfumuvcqA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*hPZSMkKJ7ntGIEfumuvcqA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*hPZSMkKJ7ntGIEfumuvcqA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*hPZSMkKJ7ntGIEfumuvcqA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*hPZSMkKJ7ntGIEfumuvcqA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*hPZSMkKJ7ntGIEfumuvcqA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*hPZSMkKJ7ntGIEfumuvcqA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*hPZSMkKJ7ntGIEfumuvcqA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*hPZSMkKJ7ntGIEfumuvcqA.png 720w, 
https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*hPZSMkKJ7ntGIEfumuvcqA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*hPZSMkKJ7ntGIEfumuvcqA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*hPZSMkKJ7ntGIEfumuvcqA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*hPZSMkKJ7ntGIEfumuvcqA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*hPZSMkKJ7ntGIEfumuvcqA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2001.02600.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"f8ee\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content_ICCV_2017\/papers\/Song_Deep_Spatial-Semantic_Attention_ICCV_2017_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">This paper<\/a> by Song et al. leveraged a spatial attention mechanism that enables the CNN model to focus on fine-grained regions rather than the entire image, along with a skip connection block from the input itself to ensure the feature representations are context-aware and do not get misaligned due to imprecise attention masks. 
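The attention-plus-skip idea can be sketched in a few lines: predict a score per spatial cell, softmax-normalize the scores, reweight the feature map, and add the original features back so an imprecise mask dampens regions rather than erasing them. A toy version over a flattened feature map (the scores here are hypothetical; in the paper they come from a learned sub-network):

```python
import math

def softmax(scores):
    # Normalize attention scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_with_skip(features, scores):
    # Reweight each spatial feature by its attention weight, then
    # add the original features back (the skip connection), so
    # low-weight regions are preserved as context, not discarded.
    weights = softmax(scores)
    attended = [f * w for f, w in zip(features, weights)]
    return [f + a for f, a in zip(features, attended)]

feats = [1.0, 2.0, 3.0, 4.0]     # one value per spatial cell
scores = [0.0, 0.0, 0.0, 10.0]   # attention strongly favors the last cell
out = attend_with_skip(feats, scores)
# The last cell is roughly doubled; the others stay near their inputs.
```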
Their model is shown in the figure below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*o-PR9L9AQXMSDlI66zoZFw.png\" alt=\"\" width=\"700\" height=\"436\"><\/figure><div class=\"mf mg qi\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*o-PR9L9AQXMSDlI66zoZFw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*o-PR9L9AQXMSDlI66zoZFw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*o-PR9L9AQXMSDlI66zoZFw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*o-PR9L9AQXMSDlI66zoZFw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*o-PR9L9AQXMSDlI66zoZFw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*o-PR9L9AQXMSDlI66zoZFw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*o-PR9L9AQXMSDlI66zoZFw.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*o-PR9L9AQXMSDlI66zoZFw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*o-PR9L9AQXMSDlI66zoZFw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*o-PR9L9AQXMSDlI66zoZFw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*o-PR9L9AQXMSDlI66zoZFw.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*o-PR9L9AQXMSDlI66zoZFw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*o-PR9L9AQXMSDlI66zoZFw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*o-PR9L9AQXMSDlI66zoZFw.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content_ICCV_2017\/papers\/Song_Deep_Spatial-Semantic_Attention_ICCV_2017_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"14b5\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"http:\/\/www.bmva.org\/bmvc\/2017\/papers\/paper046\/paper046.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">This paper<\/a> proposed a hybrid cross-domain discriminative-generative learning framework that performs sketch generation from the paired photo as well as the anchor sketch, together with category-level classification loss and metric learning objective for fine-grained SBIR. The authors hypothesize that adding a reconstruction module aids the learning of <em class=\"nv\">semantically consistent cross-domain visual representations<\/em>, which otherwise become domain-specific in naive FG-SBIR models. 
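At a high level, such a hybrid objective reduces to a weighted sum of discriminative terms (metric learning plus classification) and a generative reconstruction term that regularizes the embedding. The weights and per-batch loss values below are placeholders for illustration, not values from the paper:

```python
def hybrid_fgsbir_loss(triplet, classification, reconstruction,
                       w_tri=1.0, w_cls=1.0, w_rec=0.5):
    # Triplet + classification shape the retrieval embedding; the
    # reconstruction term pulls it toward semantically consistent
    # cross-domain features rather than domain-specific ones.
    return w_tri * triplet + w_cls * classification + w_rec * reconstruction

# Placeholder per-batch loss values, for illustration only:
total = hybrid_fgsbir_loss(triplet=0.4, classification=0.7, reconstruction=0.2)
```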
Their architecture is shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*-8yZ4fJERQhx6wmZy-GGkQ.png\" alt=\"\" width=\"700\" height=\"371\"><\/figure><div class=\"mf mg qj\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*-8yZ4fJERQhx6wmZy-GGkQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"http:\/\/www.bmva.org\/bmvc\/2017\/papers\/paper046\/paper046.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"a992\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Although sketches have shown great promise as a query medium for content retrieval, they bring some inherent challenges to the table, which we shall discuss next:<\/p>\n<p id=\"b7bf\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\"><strong class=\"be pv\">Style diversity among users drawing sketches:<\/strong> In practice, sketches drawn by different users vary considerably in style, owing to differences in subjective interpretation and sketching expertise. Since all such sketches ideally represent the same photo, it is essential to design a framework that is invariant to style variations among sketches. 
Sain et al.\u2019s <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/papers\/Sain_StyleMeUp_Towards_Style-Agnostic_Sketch-Based_Image_Retrieval_CVPR_2021_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">StyleMeUp<\/a> [CVPR \u201821] sought to tackle this challenge by devising a framework that disentangles a sketch into a style part, which is unique to the sketches, and a content part that semantically resembles the photo it is paired with. The authors leverage meta-learning to make their model dynamically adaptable to newer user styles, thus making it style-agnostic.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Ilr0BDhYxx5sG4IVYgU59g.png\" alt=\"\" width=\"700\" height=\"493\"><\/figure><div class=\"mf mg qk\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*Ilr0BDhYxx5sG4IVYgU59g.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*Ilr0BDhYxx5sG4IVYgU59g.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*Ilr0BDhYxx5sG4IVYgU59g.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*Ilr0BDhYxx5sG4IVYgU59g.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*Ilr0BDhYxx5sG4IVYgU59g.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*Ilr0BDhYxx5sG4IVYgU59g.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*Ilr0BDhYxx5sG4IVYgU59g.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 
700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*Ilr0BDhYxx5sG4IVYgU59g.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*Ilr0BDhYxx5sG4IVYgU59g.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*Ilr0BDhYxx5sG4IVYgU59g.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*Ilr0BDhYxx5sG4IVYgU59g.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*Ilr0BDhYxx5sG4IVYgU59g.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*Ilr0BDhYxx5sG4IVYgU59g.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*Ilr0BDhYxx5sG4IVYgU59g.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Figure depicting style variations among sketches drawn for the same object. 
Source: <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/papers\/Sain_StyleMeUp_Towards_Style-Agnostic_Sketch-Based_Image_Retrieval_CVPR_2021_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"e0a0\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\"><strong class=\"be pv\">Noisy strokes in sketches:<\/strong> It is a common feeling among people, especially those who do not have an artistic background, that they cannot sketch properly. This often leads to users drawing irrelevant and often noisy strokes, which they mistakenly feel \u201cwould yield a better sketch.\u201d It is intuitive that this problem is more prominent in fine-grained sketches where the system itself demands minute detailings, and thus users are prone to go astray with their strokes, thereby hurting retrieval performance.<\/p>\n<p id=\"7cd9\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">A very recent work, <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/CVPR2022\/papers\/Bhunia_Sketching_Without_Worrying_Noise-Tolerant_Sketch-Based_Image_Retrieval_CVPR_2022_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">NoiseTolerant-SBIR<\/a> [CVPR \u201822] proposed a way to alleviate this issue by a <a class=\"af mz\" href=\"https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning\" target=\"_blank\" rel=\"noopener ugc nofollow\">reinforcement learning<\/a> (RL)-based stroke subset selector that detects noisy strokes by means of quantifying the importance of each stroke constituting the sketch with respect to the retrieval performance. 
Retrieval results are shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*LdSgcmf7GQF643_L3niSeg.png\" alt=\"\" width=\"700\" height=\"133\"><\/figure><div class=\"mf mg ql\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*LdSgcmf7GQF643_L3niSeg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*LdSgcmf7GQF643_L3niSeg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*LdSgcmf7GQF643_L3niSeg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*LdSgcmf7GQF643_L3niSeg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*LdSgcmf7GQF643_L3niSeg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*LdSgcmf7GQF643_L3niSeg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*LdSgcmf7GQF643_L3niSeg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*LdSgcmf7GQF643_L3niSeg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*LdSgcmf7GQF643_L3niSeg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*LdSgcmf7GQF643_L3niSeg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*LdSgcmf7GQF643_L3niSeg.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*LdSgcmf7GQF643_L3niSeg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*LdSgcmf7GQF643_L3niSeg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*LdSgcmf7GQF643_L3niSeg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/CVPR2022\/papers\/Bhunia_Sketching_Without_Worrying_Noise-Tolerant_Sketch-Based_Image_Retrieval_CVPR_2022_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"c8fd\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Another recent work, <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content_CVPR_2020\/papers\/Bhunia_Sketch_Less_for_More_On-the-Fly_Fine-Grained_Sketch-Based_Image_Retrieval_CVPR_2020_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">OnTheFly-FGSBIR<\/a> [CVPR \u201820] brought forward a new paradigm \u2014 to retrieve photos on-the-fly as soon as the user begins sketching. The authors proposed an RL-based cross-modal retrieval framework that can handle partial sketches for the retrieval task, along with a novel RL reward scheme that takes care of noisy strokes drawn by the user at any given instant. 
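"On-the-fly" here means re-ranking the gallery after every new stroke, so the true match's rank should improve as the sketch grows; the paper's RL reward is built on exactly this rank trajectory. A toy simulation with 2-D embeddings (all vectors below are invented for illustration):

```python
def rank_of_target(query, gallery, target_idx):
    # Rank gallery photos by squared distance to the partial-sketch
    # embedding; return the 1-based rank of the true target photo.
    dists = [sum((q - g) ** 2 for q, g in zip(query, photo))
             for photo in gallery]
    order = sorted(range(len(gallery)), key=lambda i: dists[i])
    return order.index(target_idx) + 1

# Gallery of photo embeddings; index 0 is the true match.
gallery = [(1.0, 1.0), (0.0, 0.0), (1.0, 0.0)]

# Partial-sketch embeddings after 1, 2, 3 strokes: each new stroke
# moves the query embedding closer to the target photo.
partial_queries = [(0.1, 0.1), (0.8, 0.3), (0.95, 0.9)]
ranks = [rank_of_target(q, gallery, 0) for q in partial_queries]
print(ranks)  # [3, 2, 1]
```

A reward that pays for early, steadily improving ranks is what teaches the model to retrieve well from incomplete, possibly noisy sketches.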
Their architecture is shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*R4BGB69CXaPiPyw5XlrAFQ.png\" alt=\"\" width=\"700\" height=\"263\"><\/figure><div class=\"mf mg qm\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*R4BGB69CXaPiPyw5XlrAFQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*R4BGB69CXaPiPyw5XlrAFQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*R4BGB69CXaPiPyw5XlrAFQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*R4BGB69CXaPiPyw5XlrAFQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*R4BGB69CXaPiPyw5XlrAFQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*R4BGB69CXaPiPyw5XlrAFQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*R4BGB69CXaPiPyw5XlrAFQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*R4BGB69CXaPiPyw5XlrAFQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*R4BGB69CXaPiPyw5XlrAFQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*R4BGB69CXaPiPyw5XlrAFQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*R4BGB69CXaPiPyw5XlrAFQ.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*R4BGB69CXaPiPyw5XlrAFQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*R4BGB69CXaPiPyw5XlrAFQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*R4BGB69CXaPiPyw5XlrAFQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2002.10310.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"2c63\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Deep learning models in almost every domain face the bottleneck of data scarcity, and SBIR is no exception. 
Further, collecting fine-grained sketches paired with photos demands skilled sketchers, which is costly and time-consuming.<\/p>\n<p id=\"2d1f\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Bhunia <em class=\"nv\">et al.<\/em>\u2019s <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/papers\/Bhunia_More_Photos_Are_All_You_Need_Semi-Supervised_Learning_for_Fine-Grained_CVPR_2021_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">SemiSup-FGSBIR<\/a> [CVPR \u201821] was among the first works to tackle this bottleneck, proposing a pipeline that generates sketches from unlabelled photos and then leverages them for SBIR training, with a conjugate training paradigm in which the learning of one model benefits the other. More recently, <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2105.08237v2.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">this work<\/a> presented a fully unsupervised SBIR framework that leverages <a class=\"af mz\" href=\"https:\/\/towardsdatascience.com\/optimal-transport-a-hidden-gem-that-empowers-todays-machine-learning-2609bbf67e59\" target=\"_blank\" rel=\"noopener\">optimal transport<\/a> for sketch-photo domain alignment, along with unsupervised representation learning for individual modalities.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png\" alt=\"\" width=\"700\" height=\"271\"><\/figure><div class=\"mf mg qn\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 640w, 
https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*Ot4Dvpy3fm0ZOqG4I_zcQg.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, 
(-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Sources: <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/papers\/Bhunia_More_Photos_Are_All_You_Need_Semi-Supervised_Learning_for_Fine-Grained_CVPR_2021_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">(a)<\/a>, <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2105.08237v2.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">(b)<\/a><\/em><\/figcaption>\n<\/figure>\n<h2 id=\"400a\" class=\"ot ny fo be nz ou ov ow oc ox oy oz of ni pa pb pc nm pd pe pf nq pg ph pi pj bj\" data-selectable-paragraph=\"\">Sketch-Photo Generation<\/h2>\n<p id=\"267a\" class=\"pw-post-body-paragraph na nb fo be b gm pk nd ne gp pl ng nh ni pm nk nl nm pn no np nq po ns nt nu fh bj\" data-selectable-paragraph=\"\">Sketch and photo mutual generation has been an active line of sketch research, the underlying task being to learn effective cross-modal translation between two very different modalities \u2014 sketches and photos. 
Generating sketches from photos would immediately remind one of <a class=\"af mz\" href=\"https:\/\/en.wikipedia.org\/wiki\/Edge_detection\" target=\"_blank\" rel=\"noopener ugc nofollow\">edge detector<\/a> filters and thus might seem straightforward, but it is important to note that <em class=\"nv\">free-hand sketches are very different from edge\/contour maps<\/em>, primarily due to the two previously mentioned characteristics of sketches \u2014 their abstract nature and lack of background.<\/p>\n<p id=\"1f28\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/papers\/Song_Learning_to_Sketch_CVPR_2018_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">This paper<\/a> [CVPR \u201818] improved upon the previously discussed <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1704.03477.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Sketch-RNN<\/a> to propose a photo-to-sketch generative model, in which a CNN encoder encodes the input photo and the Sketch-RNN decoder decodes it into a sequential sketch. For this, they devised four encoder-decoder sub-models \u2014 two for supervised cross-modal translation (i.e. sketch-to-photo and vice-versa) and two for unsupervised intra-modal reconstruction. 
The authors argue that mere photo-sketch pairing provides noisy and weak supervision due to the large domain gap, which is improved by introducing within-domain reconstruction.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*utM1JjJNqru0b6Xv1LfSMQ.png\" alt=\"\" width=\"700\" height=\"516\"><\/figure><div class=\"mf mg qo\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*utM1JjJNqru0b6Xv1LfSMQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*utM1JjJNqru0b6Xv1LfSMQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*utM1JjJNqru0b6Xv1LfSMQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*utM1JjJNqru0b6Xv1LfSMQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*utM1JjJNqru0b6Xv1LfSMQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*utM1JjJNqru0b6Xv1LfSMQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*utM1JjJNqru0b6Xv1LfSMQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*utM1JjJNqru0b6Xv1LfSMQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*utM1JjJNqru0b6Xv1LfSMQ.png 720w, 
https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*utM1JjJNqru0b6Xv1LfSMQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*utM1JjJNqru0b6Xv1LfSMQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*utM1JjJNqru0b6Xv1LfSMQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*utM1JjJNqru0b6Xv1LfSMQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*utM1JjJNqru0b6Xv1LfSMQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/papers\/Song_Learning_to_Sketch_CVPR_2018_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"6616\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Coming to the reverse process, synthesizing photos from free-hand sketches is a much more challenging task, primarily due to the abstract nature of sketches which do not have much of a background context, something essential to realistic photos. 
Furthermore, the fact that a sketch can be colorized in several non-trivial color combinations to yield an image makes the synthesis process highly ambiguous and often prone to unrealistic results.<\/p>\n<p id=\"e0f9\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Chen et al.\u2019s <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/papers\/Chen_SketchyGAN_Towards_Diverse_CVPR_2018_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">SketchyGAN<\/a> [CVPR \u201818] leveraged edge maps along with sketch-photo pairs for training a conditional <a class=\"af mz\" href=\"https:\/\/heartbeat.comet.ml\/a-guide-to-generative-adversarial-networks-gans-2d89e03d4806\" target=\"_blank\" rel=\"noopener ugc nofollow\">GAN<\/a> model. Edge maps were used to provide the context supervision for the translation task that sketches lack. However, since the ultimate objective was to generate photos from free-hand sketches only, the authors suitably processed the edge maps to make them resemble sketches. 
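To make the edge-map preprocessing idea concrete, here is a deliberately minimal, hedged sketch of how a pseudo-sketch surrogate can be derived from a photo via gradient-magnitude edge detection. The function name and threshold are hypothetical illustrations; SketchyGAN's actual pipeline uses far more elaborate edge extraction and sketch-stylization steps.

```python
def edge_map(img, thresh=1.0):
    """Toy gradient-magnitude edge extractor (illustrative only).

    img: 2D list of grayscale intensities. Returns a binary map
    marking pixels whose central-difference gradient magnitude
    exceeds `thresh` -- a crude stand-in for the edge maps that
    photo-to-sketch pipelines derive from unlabelled photos.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            if (gx * gx + gy * gy) ** 0.5 > thresh:
                edges[y][x] = 1
    return edges

# A 4x4 image with a vertical intensity step between columns 1 and 2:
img = [[0, 0, 10, 10]] * 4
print(edge_map(img))  # each row comes out as [0, 1, 1, 0]
```

Note how the edge map fires only where intensity changes, with no notion of the abstraction or selective emphasis a human sketcher applies; that gap is exactly why the processed edge maps still need to be made to resemble sketches.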
Some outputs obtained by SketchyGAN are shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*aq6pNxwE_-n3hUrJpt8QvQ.png\" alt=\"\" width=\"700\" height=\"590\"><\/figure><div class=\"mf mg qp\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*aq6pNxwE_-n3hUrJpt8QvQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/papers\/Chen_SketchyGAN_Towards_Diverse_CVPR_2018_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"d6c8\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">However, despite SketchyGAN obtaining promising results, one limitation the authors pointed out was that their generated images were <em class=\"nv\">not visually realistic enough<\/em> to match real-world photos.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<blockquote class=\"qy\"><p id=\"cc98\" class=\"qz ra fo be rb rc rd re rf rg rh nu dv\" data-selectable-paragraph=\"\">Prompt engineering plus Comet plus Gradio? What comes out is amazing AI-generated art! 
<a class=\"af mz\" href=\"https:\/\/www.comet.com\/site\/blog\/clipdraw-gallery-ai-art-powered-by-comet-and-gradio\/?utm_source=heartbeat&amp;utm_medium=referral&amp;utm_campaign=AMS_US_EN_AWA_heartbeat_CTA\" target=\"_blank\" rel=\"noopener ugc nofollow\">Take a closer look at our public logging project<\/a> to see some of the amazing creations that have come out of this fun experiment.<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<p id=\"befe\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Among other approaches for sketch-to-photo synthesis, <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1909.08313v3.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsupervised-S2P<\/a> [ECCV \u201820] involves a two-stage image-to-image translation: a sketch is first converted to a greyscale image, which is then colorized. The colorization stage is boosted by self-supervised denoising and an attention module, which helps the model generate images that resemble the original sketches and are also photo-realistic.<\/p>\n<p id=\"b2e9\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Another work, <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2012.09290.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Self-Supervised-S2I<\/a> [AAAI \u201821] proposes an autoencoder that decouples the content features of sketches from the style features of images, then synthesizes photos that bear content resemblance to the sketches while conforming to the style of the original RGB images. To alleviate expensive sketch-photo pairing, the authors generate synthetic sketches from RGB-image datasets in an unsupervised manner. 
Further, they apply adversarial loss terms to ensure high-quality image generation at higher resolutions. Their model has been shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*KyytV-NNWJPem7J6HNG-BQ.png\" alt=\"\" width=\"700\" height=\"255\"><\/figure><div class=\"mf mg ri\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*KyytV-NNWJPem7J6HNG-BQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*KyytV-NNWJPem7J6HNG-BQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*KyytV-NNWJPem7J6HNG-BQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*KyytV-NNWJPem7J6HNG-BQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*KyytV-NNWJPem7J6HNG-BQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*KyytV-NNWJPem7J6HNG-BQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*KyytV-NNWJPem7J6HNG-BQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*KyytV-NNWJPem7J6HNG-BQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*KyytV-NNWJPem7J6HNG-BQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*KyytV-NNWJPem7J6HNG-BQ.png 750w, 
https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*KyytV-NNWJPem7J6HNG-BQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*KyytV-NNWJPem7J6HNG-BQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*KyytV-NNWJPem7J6HNG-BQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*KyytV-NNWJPem7J6HNG-BQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2012.09290.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"e808\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">A recent paper <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/ICCV2021\/papers\/Wang_Sketch_Your_Own_GAN_ICCV_2021_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Sketch Your Own GAN<\/a> [ICCV \u201821] came up with a very interesting paradigm \u2014 to customize the outputs of a pre-trained GAN based on a few sketch inputs. Essentially, they used a cross-domain adversarial loss to match the model outputs with the user sketches, along with regularization losses to preserve the background cues of the original image. 
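The trade-off at the heart of this customization scheme can be illustrated with a deliberately tiny toy: minimize a sketch-matching term plus a regularizer that anchors the weights to their pre-trained values. Everything below (the scalar "weight", the loss shapes, the coefficients) is a hypothetical simplification, not the paper's actual cross-domain adversarial objective.

```python
def adapt(w0, sketch_loss, lam=0.1, lr=0.05, steps=200, eps=1e-5):
    """Toy regularized few-shot adaptation (illustrative only).

    Minimizes sketch_loss(w) + lam * (w - w0)**2 by gradient descent
    with a numeric gradient: the sketch term pulls the weight toward
    the user's sketch style, while the regularizer keeps it close to
    the pre-trained value w0, preserving the original model's cues.
    """
    w = w0
    total = lambda w: sketch_loss(w) + lam * (w - w0) ** 2

    for _ in range(steps):
        # Central-difference estimate of d(total)/dw.
        grad = (total(w + eps) - total(w - eps)) / (2 * eps)
        w -= lr * grad
    return w

# Pre-trained "weight" 0.0; the user sketches prefer weight 1.0.
# With lam=0.25 the optimum is a compromise between the two.
w = adapt(0.0, lambda w: (w - 1.0) ** 2, lam=0.25)
print(round(w, 3))  # settles near 0.8, between 0.0 and 1.0
```

The larger `lam` is, the closer the adapted model stays to its pre-trained behavior; setting it to zero would let the sketches pull the weights arbitrarily far, at the cost of the original image cues.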
During few-shot adaptation with input sketches, the weights of the pre-trained GAN are updated so that the model generates images bearing the style of the sketches passed through it. The framework is shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*ww2X32LhKcCe8SkuMlJa_Q.png\" alt=\"\" width=\"700\" height=\"438\"><\/figure><div class=\"mf mg rj\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*ww2X32LhKcCe8SkuMlJa_Q.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*ww2X32LhKcCe8SkuMlJa_Q.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*ww2X32LhKcCe8SkuMlJa_Q.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*ww2X32LhKcCe8SkuMlJa_Q.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*ww2X32LhKcCe8SkuMlJa_Q.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*ww2X32LhKcCe8SkuMlJa_Q.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*ww2X32LhKcCe8SkuMlJa_Q.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*ww2X32LhKcCe8SkuMlJa_Q.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*ww2X32LhKcCe8SkuMlJa_Q.png 720w, 
https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*ww2X32LhKcCe8SkuMlJa_Q.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*ww2X32LhKcCe8SkuMlJa_Q.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*ww2X32LhKcCe8SkuMlJa_Q.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*ww2X32LhKcCe8SkuMlJa_Q.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*ww2X32LhKcCe8SkuMlJa_Q.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/ICCV2021\/papers\/Wang_Sketch_Your_Own_GAN_ICCV_2021_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"fa19\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Some of their outputs have been shown in the figure below. As can be seen, sketches render a distinctive style according to which the GAN outputs are modulated. Another noteworthy observation is that only the image style changes, while the background context and other visual cues (texture, colour etc) are preserved. 
This can be attributed to the background-lacking nature of sketches, enabling sole focus on the style adaptation of the model on the object(s) in context.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*lYJ-Wx6cokodiBXP67c40g.png\" alt=\"\" width=\"700\" height=\"435\"><\/figure><div class=\"mf mg rk\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*lYJ-Wx6cokodiBXP67c40g.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*lYJ-Wx6cokodiBXP67c40g.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*lYJ-Wx6cokodiBXP67c40g.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*lYJ-Wx6cokodiBXP67c40g.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*lYJ-Wx6cokodiBXP67c40g.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*lYJ-Wx6cokodiBXP67c40g.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*lYJ-Wx6cokodiBXP67c40g.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*lYJ-Wx6cokodiBXP67c40g.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*lYJ-Wx6cokodiBXP67c40g.png 720w, 
https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*lYJ-Wx6cokodiBXP67c40g.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*lYJ-Wx6cokodiBXP67c40g.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*lYJ-Wx6cokodiBXP67c40g.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*lYJ-Wx6cokodiBXP67c40g.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*lYJ-Wx6cokodiBXP67c40g.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/openaccess.thecvf.com\/content\/ICCV2021\/papers\/Wang_Sketch_Your_Own_GAN_ICCV_2021_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<h2 id=\"7744\" class=\"ot ny fo be nz ou ov ow oc ox oy oz of ni pa pb pc nm pd pe pf nq pg ph pi pj bj\" data-selectable-paragraph=\"\">Sketch-guided Image Editing<\/h2>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*0boAcyZlMWE9vigTi_7sog.png\" alt=\"\" width=\"700\" height=\"250\"><\/figure><div class=\"mf mg rl\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*0boAcyZlMWE9vigTi_7sog.png 640w, 
https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*0boAcyZlMWE9vigTi_7sog.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*0boAcyZlMWE9vigTi_7sog.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*0boAcyZlMWE9vigTi_7sog.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*0boAcyZlMWE9vigTi_7sog.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*0boAcyZlMWE9vigTi_7sog.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*0boAcyZlMWE9vigTi_7sog.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*0boAcyZlMWE9vigTi_7sog.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*0boAcyZlMWE9vigTi_7sog.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*0boAcyZlMWE9vigTi_7sog.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*0boAcyZlMWE9vigTi_7sog.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*0boAcyZlMWE9vigTi_7sog.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*0boAcyZlMWE9vigTi_7sog.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*0boAcyZlMWE9vigTi_7sog.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, 
(-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>High-level process of interactive sketch-based image editing. Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2111.15078.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"b727\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Sketch-photo joint learning has also led to sketch-based image manipulation, an interactive task where the user draws contour curves (i.e. sketches) over an image to mark the specific changes they intend to incorporate in the photo, and the model adapts itself to reflect the drawn edits in its output.<\/p>\n<p id=\"5c50\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Traditionally speaking, such frameworks would take the drawn contour as well as a mask from the user to specifically indicate the modification region, after which the masked regions would be deemed as \u201ccavities\u201d in an image which needed to be filled in based on the input sketch conditioning. 
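The core compositing step of this traditional masked-editing setup can be sketched in a few lines: keep the original pixels outside the mask and substitute generator output inside it. All names here are hypothetical illustrations, and real systems blend with soft masks and learned generators rather than this hard binary swap.

```python
def composite(original, generated, mask):
    """Blend generated pixels into the original inside the mask.

    Illustrative sketch of cavity filling: masked regions are
    replaced by (stand-in) generator output, while everything
    outside the mask is left untouched. Inputs are 2D lists of
    equal shape; mask entries are 0 (keep) or 1 (replace).
    """
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]

orig = [[1, 1], [1, 1]]   # original photo pixels
gen = [[9, 9], [9, 9]]    # stand-in generator output
mask = [[0, 1], [0, 1]]   # user marks only the right column
print(composite(orig, gen, mask))  # [[1, 9], [1, 9]]
```

This hard cut is also what makes the approach lossy: the generator never sees the pixels it replaces, which is precisely the context loss the next generation of mask-free methods set out to avoid.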
For this cavity filling, they would combine principles of <a class=\"af mz\" href=\"https:\/\/medium.com\/@kaantopcu\/neural-style-transfer-creating-art-with-deep-learning-9a7a5911dece\" rel=\"noopener\">style transfer<\/a>, <a class=\"af mz\" href=\"https:\/\/wandb.ai\/site\/articles\/introduction-to-image-inpainting-with-deep-learning\" target=\"_blank\" rel=\"noopener ugc nofollow\">image inpainting<\/a> and <a class=\"af mz\" href=\"https:\/\/arxiv.org\/abs\/2101.08629\" target=\"_blank\" rel=\"noopener ugc nofollow\">image-to-image translation<\/a>, among others. An illustration of this process is shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*qrL_xwq3JO66tCetoE9h2w.png\" alt=\"\" width=\"700\" height=\"197\"><\/figure><div class=\"mf mg rm\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*qrL_xwq3JO66tCetoE9h2w.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*qrL_xwq3JO66tCetoE9h2w.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*qrL_xwq3JO66tCetoE9h2w.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*qrL_xwq3JO66tCetoE9h2w.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*qrL_xwq3JO66tCetoE9h2w.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*qrL_xwq3JO66tCetoE9h2w.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*qrL_xwq3JO66tCetoE9h2w.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and 
(max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*qrL_xwq3JO66tCetoE9h2w.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*qrL_xwq3JO66tCetoE9h2w.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*qrL_xwq3JO66tCetoE9h2w.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*qrL_xwq3JO66tCetoE9h2w.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*qrL_xwq3JO66tCetoE9h2w.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*qrL_xwq3JO66tCetoE9h2w.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*qrL_xwq3JO66tCetoE9h2w.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Inpainting-based image editing pipeline. Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2111.15078.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"175c\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">However, a major drawback of this framework is that drawing a sketch as well as a mask region is tedious and often redundant too. 
Moreover, an inpainting pipeline requires pixels to be dropped out, which discards context information and can lead to noisy outputs. The paper <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2111.15078.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">SketchEdit<\/a> [CVPR \u201822] sought to address this by proposing a model that can work solely on sketch inputs, removing the need to draw masks as well. The authors used a mask estimator network to predict the modifiable region from the full image based on the drawn sketch, followed by generating new pixels within that region using a generator and blending them with the original image via a style encoder. Their architecture is shown in the figure below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*cUMLaa2z1vba7ckjgFbhBw.png\" alt=\"\" width=\"700\" height=\"213\"><\/figure><div class=\"mf mg rn\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*cUMLaa2z1vba7ckjgFbhBw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*cUMLaa2z1vba7ckjgFbhBw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*cUMLaa2z1vba7ckjgFbhBw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*cUMLaa2z1vba7ckjgFbhBw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*cUMLaa2z1vba7ckjgFbhBw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*cUMLaa2z1vba7ckjgFbhBw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*cUMLaa2z1vba7ckjgFbhBw.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, 
(-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*cUMLaa2z1vba7ckjgFbhBw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*cUMLaa2z1vba7ckjgFbhBw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*cUMLaa2z1vba7ckjgFbhBw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*cUMLaa2z1vba7ckjgFbhBw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*cUMLaa2z1vba7ckjgFbhBw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*cUMLaa2z1vba7ckjgFbhBw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*cUMLaa2z1vba7ckjgFbhBw.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2111.15078.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<h2 id=\"27b8\" class=\"ot ny fo be nz ou ov ow oc ox oy oz of ni pa pb pc nm pd pe pf nq pg ph pi pj bj\" data-selectable-paragraph=\"\">Sketches for 3D Vision<\/h2>\n<p id=\"bfbf\" class=\"pw-post-body-paragraph na nb fo be b gm pk nd ne gp pl ng nh ni pm nk nl nm pn no np nq po ns 
nt nu fh bj\" data-selectable-paragraph=\"\">Free-hand sketches have been particularly useful for 3D shape modelling and retrieval, thanks to their cheap availability and their simple yet effective way of representing shape.<\/p>\n<p id=\"b99b\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Shen et al.\u2019s <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1908.07198.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">DeepSketchHair<\/a> [TVCG \u201820] presented a GAN-based framework for strand-level 3D hairstyle modelling from 2D hair strokes. The pipeline starts with a sketched hair contour and rough strokes depicting the hair growth direction, from which a mask is obtained and translated into a dense 2D orientation field, which is then converted into a 3D vector field. To enable multi-view hair modelling, a voxel-to-voxel conversion network updates the 3D vector field based on user edits under a novel view. 
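To make the orientation-field step concrete, here is a deliberately simplified, hypothetical stand-in for it (not the paper's networks): each cell of a dense 2D field simply inherits the unit tangent of the nearest stroke point.

```python
import math

def stroke_tangents(stroke):
    """Unit tangent at each point of a polyline stroke (>= 2 points assumed)."""
    tangents = []
    for i in range(len(stroke) - 1):
        dx = stroke[i + 1][0] - stroke[i][0]
        dy = stroke[i + 1][1] - stroke[i][1]
        n = math.hypot(dx, dy) or 1.0
        tangents.append((dx / n, dy / n))
    tangents.append(tangents[-1])  # reuse the last tangent for the final point
    return tangents

def orientation_field(strokes, width, height):
    """Dense field: each grid cell takes the tangent of the nearest stroke point."""
    samples = []
    for stroke in strokes:
        for pt, t in zip(stroke, stroke_tangents(stroke)):
            samples.append((pt, t))
    field = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            _, t = min(samples, key=lambda s: (s[0][0] - x) ** 2 + (s[0][1] - y) ** 2)
            field[y][x] = t
    return field
```

The real system replaces this nearest-neighbor rule with learned networks and adds the lift from the 2D field to a 3D vector field.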
Their overall architecture is shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*fjLbxbTnjb3c6oceNgZ5EA.png\" alt=\"\" width=\"700\" height=\"304\"><\/figure><div class=\"mf mg ro\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*fjLbxbTnjb3c6oceNgZ5EA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*fjLbxbTnjb3c6oceNgZ5EA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*fjLbxbTnjb3c6oceNgZ5EA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*fjLbxbTnjb3c6oceNgZ5EA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*fjLbxbTnjb3c6oceNgZ5EA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*fjLbxbTnjb3c6oceNgZ5EA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*fjLbxbTnjb3c6oceNgZ5EA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*fjLbxbTnjb3c6oceNgZ5EA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*fjLbxbTnjb3c6oceNgZ5EA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*fjLbxbTnjb3c6oceNgZ5EA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*fjLbxbTnjb3c6oceNgZ5EA.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*fjLbxbTnjb3c6oceNgZ5EA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*fjLbxbTnjb3c6oceNgZ5EA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*fjLbxbTnjb3c6oceNgZ5EA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1908.07198.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"b98b\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Among more recent works, <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2011.06133.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">this paper<\/a> [3DV \u201820] discussed various challenges for sketch-based modelling along with a comprehensive evaluation of solutions to tackle the same, while <a class=\"af mz\" href=\"http:\/\/www-ens.iro.umontreal.ca\/~brodtkir\/projects\/sketch2pose\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Sketch2Pose<\/a> [SIGGRAPH \u201822] presented a framework to estimate 3D character pose from a single bitmap sketch, by predicting the key points of the image relevant for pose estimation.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" 
role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*u4A7dXom2aXOltrM9lFFnQ.png\" alt=\"\" width=\"700\" height=\"201\"><\/figure><div class=\"mf mg rp\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*u4A7dXom2aXOltrM9lFFnQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*u4A7dXom2aXOltrM9lFFnQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*u4A7dXom2aXOltrM9lFFnQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*u4A7dXom2aXOltrM9lFFnQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*u4A7dXom2aXOltrM9lFFnQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*u4A7dXom2aXOltrM9lFFnQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*u4A7dXom2aXOltrM9lFFnQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*u4A7dXom2aXOltrM9lFFnQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*u4A7dXom2aXOltrM9lFFnQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*u4A7dXom2aXOltrM9lFFnQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*u4A7dXom2aXOltrM9lFFnQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*u4A7dXom2aXOltrM9lFFnQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*u4A7dXom2aXOltrM9lFFnQ.png 1100w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*u4A7dXom2aXOltrM9lFFnQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"http:\/\/www-ens.iro.umontreal.ca\/~brodtkir\/projects\/sketch2pose\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Sketch2Pose<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"ddd4\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Another popular sketch-based 3D vision task is sketch-based 3D shape retrieval (SBSR), which can be intuitively considered equivalent to the SBIR task (discussed above), except that the domain gap between 2D sketch and 3D shape is significantly greater than that between 2D sketch and 2D photo. 
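At its core, once a sketch encoder and a shape encoder map both modalities into a shared metric space, retrieval reduces to ranking the gallery by similarity to the query embedding. A minimal illustration with made-up toy embeddings (the encoders are where the real work lies):

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(sketch_emb, shape_embs):
    """Rank gallery shapes by similarity to the sketch in the shared space."""
    ranked = sorted(shape_embs.items(),
                    key=lambda kv: cosine(sketch_emb, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked]
```

For example, `retrieve((1.0, 0.0), {"chair": (0.9, 0.1), "lamp": (0.0, 1.0)})` ranks `"chair"` first; the names and vectors here are purely illustrative.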
Likewise, it has seen the use of end-to-end metric learning-based Siamese networks for learning cross-domain correspondences, such as the framework proposed in <a class=\"af mz\" href=\"https:\/\/www.cv-foundation.org\/openaccess\/content_cvpr_2015\/papers\/Wang_Sketch-Based_3D_Shape_2015_CVPR_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">this CVPR \u201915 paper<\/a>.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*_Wg6x9FRmdOrD0liiMi0Ug.png\" alt=\"\" width=\"700\" height=\"237\"><\/figure><div class=\"mf mg rq\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source 
srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*_Wg6x9FRmdOrD0liiMi0Ug.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Siamese network architecture used for sketch-based 3D shape retrieval. 
Source: <a class=\"af mz\" href=\"https:\/\/www.cv-foundation.org\/openaccess\/content_cvpr_2015\/papers\/Wang_Sketch-Based_3D_Shape_2015_CVPR_paper.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"a209\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Recently, Qi et al. proposed <a class=\"af mz\" href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/9573376\" target=\"_blank\" rel=\"noopener ugc nofollow\">fine-grained SBSR<\/a> [TIP \u201821] as an extension of categorical SBSR to the instance level, where fine-grained sketches are provided as queries to retrieve a particular 3D shape from a gallery of shapes of the same class. Their work leveraged a novel cross-modal view attention mechanism to compute the best combination of planar projections of a 3D shape, given a query sketch.<\/p>\n<h2 id=\"6742\" class=\"ot ny fo be nz ou ov ow oc ox oy oz of ni pa pb pc nm pd pe pf nq pg ph pi pj bj\" data-selectable-paragraph=\"\">Sketch-modulated Image Classifier<\/h2>\n<p id=\"eb7f\" class=\"pw-post-body-paragraph na nb fo be b gm pk nd ne gp pl ng nh ni pm nk nl nm pn no np nq po ns nt nu fh bj\" data-selectable-paragraph=\"\">A few works have tried incorporating sketches to modulate the behaviour of classification models, such as converting a sketch classifier into a photo classifier, or extending the ambit of a photo classifier to include fine-grained classes within a single coarse class. Sketches are useful in these areas thanks to the cheap availability of free-hand sketches compared to exemplar class photos, as well as their fine-grained nature, which can help <em class=\"nv\">partition the latent representation of a coarse class into finer ones<\/em>. 
Typically, the mechanism involves perturbing the model weights by passing a few sketch samples through the model and re-training it to adapt to the new conditions, i.e. in a few-shot manner.<\/p>\n<p id=\"db60\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">The paper <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1804.11182.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Sketch-a-classifier<\/a> [CVPR \u201818] was the first work in this area, proposing a model regression network to map from free-hand sketch space to the photo classifier embedding space. Their framework removed the need for labelled sketch-photo pairs and showed that such a cross-modal mapping can be done in a class-agnostic manner, i.e. new classes could be synthesized by users based on sketch inputs. Further, the authors demonstrated a coarse-to-fine setup, where a photo classifier trained on coarse labels could be extended to accommodate fine-grained labels, with the sketches serving as a guiding signal to determine the fine-grained category. 
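The core idea of regressing photo-classifier weights directly from a sketch representation can be caricatured with a single linear map standing in for the deep model regression network. The matrix below is a made-up toy; in practice this mapping is learned:

```python
def regress_classifier(sketch_feat, regression_matrix):
    """Map a sketch feature vector to photo-classifier weights (one linear layer here)."""
    return [sum(w * f for w, f in zip(row, sketch_feat)) for row in regression_matrix]

def classify(photo_feat, weights):
    """Score a photo feature with the synthesized classifier (dot product)."""
    return sum(w * f for w, f in zip(weights, photo_feat))
```

With an identity `regression_matrix`, `regress_classifier([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])` just returns the sketch feature itself as the classifier weights, which is enough to see the plumbing.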
Their pipeline is shown in the figure below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Vn99kzh_B6sXATa0uB9KWA.png\" alt=\"\" width=\"700\" height=\"401\"><\/figure><div class=\"mf mg rr\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*Vn99kzh_B6sXATa0uB9KWA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*Vn99kzh_B6sXATa0uB9KWA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*Vn99kzh_B6sXATa0uB9KWA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*Vn99kzh_B6sXATa0uB9KWA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*Vn99kzh_B6sXATa0uB9KWA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*Vn99kzh_B6sXATa0uB9KWA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*Vn99kzh_B6sXATa0uB9KWA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*Vn99kzh_B6sXATa0uB9KWA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*Vn99kzh_B6sXATa0uB9KWA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*Vn99kzh_B6sXATa0uB9KWA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*Vn99kzh_B6sXATa0uB9KWA.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*Vn99kzh_B6sXATa0uB9KWA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*Vn99kzh_B6sXATa0uB9KWA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*Vn99kzh_B6sXATa0uB9KWA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1804.11182.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"659e\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">A more recent paper <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2203.14843.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">DIY-FSCIL<\/a> [CVPR \u201822] explored <a class=\"af mz\" href=\"https:\/\/github.com\/xialeiliu\/Awesome-Incremental-Learning\" target=\"_blank\" rel=\"noopener ugc nofollow\">class incremental learning<\/a> (CIL) to extend a previously trained <em class=\"nv\">N<\/em>-class photo classifier to an (<em class=\"nv\">N+k<\/em>)-class classifier, where representative sketch exemplars (typically 1) provided by the user serve as a support set for the new <em class=\"nv\">k<\/em> classes. 
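The N-to-(N+k) extension can be pictured, in a heavily simplified and hypothetical form, as appending one class prototype per new class computed from the user's sketch exemplars. This is only an illustration of the interface, not the paper's training mechanism:

```python
def extend_classifier(class_protos, new_exemplars):
    """Append a prototype (mean feature) per new class to an N-way classifier."""
    extended = dict(class_protos)
    for name, feats in new_exemplars.items():
        dim = len(feats[0])
        extended[name] = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return extended

def predict(feat, protos):
    """Nearest-prototype prediction over old and new classes combined."""
    return min(protos, key=lambda c: sum((a - b) ** 2 for a, b in zip(feat, protos[c])))
```

The class names and feature vectors here are invented for illustration; the point is that the old classes remain untouched while new ones are appended.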
This work leveraged cross-domain gradient consensus to align the sketch and photo domains in gradient space, thereby producing a domain-agnostic feature extractor. However, one of the prominent challenges in CIL is preserving the knowledge of previous classes while learning the new ones (the failure to do so is commonly termed \u201c<em class=\"nv\">catastrophic forgetting<\/em>\u201d). To address this, the authors infused into their framework knowledge distillation and graph attention network-based message passing between old and new classes.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*eQEIG2mXTlhSFOVHeJtizQ.png\" alt=\"\" width=\"700\" height=\"402\"><\/figure><div class=\"mf mg rs\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*eQEIG2mXTlhSFOVHeJtizQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*eQEIG2mXTlhSFOVHeJtizQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*eQEIG2mXTlhSFOVHeJtizQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*eQEIG2mXTlhSFOVHeJtizQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*eQEIG2mXTlhSFOVHeJtizQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*eQEIG2mXTlhSFOVHeJtizQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*eQEIG2mXTlhSFOVHeJtizQ.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 
80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*eQEIG2mXTlhSFOVHeJtizQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*eQEIG2mXTlhSFOVHeJtizQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*eQEIG2mXTlhSFOVHeJtizQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*eQEIG2mXTlhSFOVHeJtizQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*eQEIG2mXTlhSFOVHeJtizQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*eQEIG2mXTlhSFOVHeJtizQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*eQEIG2mXTlhSFOVHeJtizQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2203.14843.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<h2 id=\"2f9c\" class=\"ot ny fo be nz ou ov ow oc ox oy oz of ni pa pb pc nm pd pe pf nq pg ph pi pj bj\" data-selectable-paragraph=\"\">Other Works<\/h2>\n<p id=\"a792\" class=\"pw-post-body-paragraph na nb fo be b gm pk nd ne gp pl ng nh ni pm nk nl nm pn no np nq po ns nt nu fh bj\" data-selectable-paragraph=\"\">Having discussed several sub-domains of sketch-based computer vision, we now explore a few uniquely exciting works in recent times involving 
sketches:<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*FedP1tPZFVUKRdLX0CNMaA.png\" alt=\"\" width=\"700\" height=\"301\"><\/figure><div class=\"mf mg rt\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*FedP1tPZFVUKRdLX0CNMaA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*FedP1tPZFVUKRdLX0CNMaA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*FedP1tPZFVUKRdLX0CNMaA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*FedP1tPZFVUKRdLX0CNMaA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*FedP1tPZFVUKRdLX0CNMaA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*FedP1tPZFVUKRdLX0CNMaA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*FedP1tPZFVUKRdLX0CNMaA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*FedP1tPZFVUKRdLX0CNMaA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*FedP1tPZFVUKRdLX0CNMaA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*FedP1tPZFVUKRdLX0CNMaA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*FedP1tPZFVUKRdLX0CNMaA.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*FedP1tPZFVUKRdLX0CNMaA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*FedP1tPZFVUKRdLX0CNMaA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*FedP1tPZFVUKRdLX0CNMaA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3414685.3417840\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"da74\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Bhunia et al.\u2019s <a class=\"af mz\" href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3414685.3417840\" target=\"_blank\" rel=\"noopener ugc nofollow\">Pixelor<\/a> (SIGGRAPH \u201920; figure shown above) presented a competitive AI sketching agent that exhibited human-level performance at generating recognizable sketches of objects. The key to their framework lay in finding the optimal sequence of strokes that leads to an \u201cearly\u201d recognizable sketch. 
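The early-recognizability objective can be caricatured greedily: at each step, draw whichever remaining stroke most improves a recognizability score. The `score` function below is a hypothetical stand-in for a sketch classifier evaluated on the partial drawing; the actual method learns the ordering rather than greedily searching:

```python
def early_recognizable_order(strokes, score):
    """Greedily pick, at each step, the stroke that most improves `score`.

    `score` maps a set of stroke ids to a recognizability value; here it is a
    stand-in for a classifier's confidence on the partially drawn sketch.
    """
    remaining = set(strokes)
    drawn, order = set(), []
    while remaining:
        best = max(remaining, key=lambda s: score(drawn | {s}))
        remaining.discard(best)
        drawn.add(best)
        order.append(best)
    return order
```

With a toy additive score where a "body" stroke contributes more recognizability than an "eye" or a "whisker", the greedy ordering draws the body first.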
The authors employed <a class=\"af mz\" href=\"https:\/\/github.com\/dasayan05\/neuralsort-siggraph\" target=\"_blank\" rel=\"noopener ugc nofollow\">neural sorting<\/a> to order the strokes, along with a <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/1704.03477.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Sketch-RNN<\/a> network improved by fusing in the principles of the <a class=\"af mz\" href=\"https:\/\/arxiv.org\/abs\/1711.01558\" target=\"_blank\" rel=\"noopener ugc nofollow\">Wasserstein autoencoder<\/a>, which leverages an <a class=\"af mz\" href=\"https:\/\/en.wikipedia.org\/wiki\/Transportation_theory_(mathematics)\" target=\"_blank\" rel=\"noopener ugc nofollow\">optimal transport<\/a> loss to align multi-modal optimal stroke-sequence strategies.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*flIGYrFXs47peBCJX4UGtw.gif\" alt=\"\" width=\"700\" height=\"394\"><\/figure><div class=\"mf mg ru\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*flIGYrFXs47peBCJX4UGtw.gif 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*flIGYrFXs47peBCJX4UGtw.gif 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*flIGYrFXs47peBCJX4UGtw.gif 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*flIGYrFXs47peBCJX4UGtw.gif 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*flIGYrFXs47peBCJX4UGtw.gif 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*flIGYrFXs47peBCJX4UGtw.gif 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*flIGYrFXs47peBCJX4UGtw.gif 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, 
(-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*flIGYrFXs47peBCJX4UGtw.gif 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*flIGYrFXs47peBCJX4UGtw.gif 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*flIGYrFXs47peBCJX4UGtw.gif 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*flIGYrFXs47peBCJX4UGtw.gif 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*flIGYrFXs47peBCJX4UGtw.gif 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*flIGYrFXs47peBCJX4UGtw.gif 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*flIGYrFXs47peBCJX4UGtw.gif 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/clipasso.github.io\/clipasso\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">CLIPasso<\/a><\/em><\/figcaption>\n<\/figure>\n<p id=\"82f1\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Vinker <em class=\"nv\">et al.<\/em>\u2019s <a class=\"af mz\" href=\"https:\/\/clipasso.github.io\/clipasso\/\" 
target=\"_blank\" rel=\"noopener ugc nofollow\">CLIPasso<\/a> [SIGGRAPH \u201822] sought to leverage geometric and semantic simplifications for object sketching at various levels of abstraction. They considered sketches to be sets of <a class=\"af mz\" href=\"https:\/\/en.wikipedia.org\/wiki\/B%C3%A9zier_curve\" target=\"_blank\" rel=\"noopener ugc nofollow\">B\u00e9zier curves<\/a> and proposed an optimization mechanism on the curve parameters with respect to a <a class=\"af mz\" href=\"https:\/\/openai.com\/blog\/clip\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">CLIP<\/a>encoder-based perceptual loss. In order to enforce geometrical consistency between the generated sketch and the original object, the authors used L2 regularization between the intermediate layer outputs of the CLIP encoder from the image and sketch. Further, for a better initialization of the parametric curves constituting the sketch, a pre-trained saliency detection network was used to generate the saliency heatmap of the image, which is used as the distribution to sample the initial locations of the strokes. 
Their framework is shown below.<\/p>\n<figure class=\"mi mj mk ml mm mn mf mg paragraph-image\">\n<div class=\"mo mp eb mq bg mr\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ms mt c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*SSvxLZzDkJuQxAbI3JXpWA.png\" alt=\"\" width=\"700\" height=\"240\"><\/figure><div class=\"mf mg rv\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*SSvxLZzDkJuQxAbI3JXpWA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*SSvxLZzDkJuQxAbI3JXpWA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*SSvxLZzDkJuQxAbI3JXpWA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*SSvxLZzDkJuQxAbI3JXpWA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*SSvxLZzDkJuQxAbI3JXpWA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*SSvxLZzDkJuQxAbI3JXpWA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*SSvxLZzDkJuQxAbI3JXpWA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*SSvxLZzDkJuQxAbI3JXpWA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*SSvxLZzDkJuQxAbI3JXpWA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*SSvxLZzDkJuQxAbI3JXpWA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*SSvxLZzDkJuQxAbI3JXpWA.png 786w, 
https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*SSvxLZzDkJuQxAbI3JXpWA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*SSvxLZzDkJuQxAbI3JXpWA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*SSvxLZzDkJuQxAbI3JXpWA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mu mv mw mf mg mx my be b bf z dv\" data-selectable-paragraph=\"\"><em>Source: <a class=\"af mz\" href=\"https:\/\/arxiv.org\/pdf\/2202.05822.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">Paper<\/a><\/em><\/figcaption>\n<\/figure>\n<h1 id=\"89eb\" class=\"nx ny fo be nz oa ob go oc od oe gr of og oh oi oj ok ol om on oo op oq or os bj\" data-selectable-paragraph=\"\">Conclusion<\/h1>\n<p id=\"17c9\" class=\"pw-post-body-paragraph na nb fo be b gm pk nd ne gp pl ng nh ni pm nk nl nm pn no np nq po ns nt nu fh bj\" data-selectable-paragraph=\"\">As fast-growing as it has been, sketch research has brought up several interesting applications and solutions to existing problems in computer graphics and vision. 
From sketch-based image retrieval to leveraging sketches for few-shot model adaptation, the efficacy of sketches across cross-domain tasks makes them a promising avenue for further research.<\/p>\n<p id=\"b29c\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">This article first discussed the motivation behind using sketches for vision tasks, along with the unique set of characteristics and challenges they bring to the table, followed by a holistic dive into sketch representation learning and its current trends. We also explored several applications of sketches in computer vision, mostly cross-modal tasks such as SBIR and sketch-to-photo synthesis.<\/p>\n<p id=\"69ae\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\">Looking ahead, future directions of sketch-based vision worth exploring include extending object-level sketches to scene-level visual understanding, and applying sketches to 3D vision (especially AR\/VR). 
To conclude, the full potential of sketches in commercial, artistic, and multimedia applications is yet to be realized, something the vision community looks forward to.<\/p>\n<p id=\"248b\" class=\"pw-post-body-paragraph na nb fo be b gm nc nd ne gp nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu fh bj\" data-selectable-paragraph=\"\"><strong class=\"be pv\">Notes:<\/strong><\/p>\n<ol class=\"\">\n<li id=\"b603\" class=\"na nb fo be b gm nc nd ne gp nf ng nh pp nj nk nl pq nn no np pr nr ns nt nu ps pt pu bj\" data-selectable-paragraph=\"\">An up-to-date collection of sketch-based computer vision works can be found at: <a class=\"af mz\" href=\"https:\/\/github.com\/MarkMoHR\/Awesome-Sketch-Based-Applications\" target=\"_blank\" rel=\"noopener ugc nofollow\">https:\/\/github.com\/MarkMoHR\/Awesome-Sketch-Based-Applications<\/a><\/li>\n<li id=\"aea5\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu ps pt pu bj\" data-selectable-paragraph=\"\">Readers interested in an extensive and in-depth survey of sketch-based computer vision can check out this survey paper: <a class=\"af mz\" href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/9706366?\" target=\"_blank\" rel=\"noopener ugc nofollow\">Deep Learning for Free-Hand Sketch: A Survey, IEEE TPAMI, 2022<\/a><\/li>\n<li id=\"8522\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu ps pt pu bj\" data-selectable-paragraph=\"\">Readers looking for a practical dive into creative AI involving sketches can check this out: <a class=\"af mz\" href=\"https:\/\/create.playform.io\/explore\" target=\"_blank\" rel=\"noopener ugc nofollow\">https:\/\/create.playform.io\/explore<\/a><\/li>\n<li id=\"6d80\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu ps pt pu bj\" data-selectable-paragraph=\"\">For a live demo of <a class=\"af mz\" href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3414685.3417840\" 
target=\"_blank\" rel=\"noopener ugc nofollow\">Pixelor<\/a> (discussed above), interested readers may visit: <a class=\"af mz\" href=\"http:\/\/surrey.ac:9999\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">http:\/\/surrey.ac:9999\/<\/a><\/li>\n<li id=\"c77e\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu ps pt pu bj\" data-selectable-paragraph=\"\">Some of the research groups that publish regularly on sketch-oriented vision are:<\/li>\n<\/ol>\n<ul class=\"\">\n<li id=\"1123\" class=\"na nb fo be b gm nc nd ne gp nf ng nh pp nj nk nl pq nn no np pr nr ns nt nu rw pt pu bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"http:\/\/sketchx.ai\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">SketchX Laboratory, University of Surrey, Led by Prof. Yi-Zhe Song<\/a><\/li>\n<li id=\"af15\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu rw pt pu bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"https:\/\/www.surrey.ac.uk\/people\/john-collomosse\" target=\"_blank\" rel=\"noopener ugc nofollow\">Digital Creativity Lab, CVSSP, University of Surrey<\/a><\/li>\n<li id=\"8543\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu rw pt pu bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"http:\/\/geometry.cs.ucl.ac.uk\/index.php\" target=\"_blank\" rel=\"noopener ugc nofollow\">Smart Geometry Processing Group, University College London<\/a><\/li>\n<li id=\"6d36\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu rw pt pu bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"https:\/\/www.scm.cityu.edu.hk\/people\/fu-hongbo\" target=\"_blank\" rel=\"noopener ugc nofollow\">School of Creative Media, City University of Hong Kong<\/a><\/li>\n<li id=\"5d64\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu rw pt pu bj\" data-selectable-paragraph=\"\"><a 
class=\"af mz\" href=\"https:\/\/github.com\/PRIS-CV\" target=\"_blank\" rel=\"noopener ugc nofollow\">PRIS-CV Group, School of Artificial Intelligence, BUPT<\/a><\/li>\n<li id=\"b706\" class=\"na nb fo be b gm pw nd ne gp px ng nh pp py nk nl pq pz no np pr qa ns nt nu rw pt pu bj\" data-selectable-paragraph=\"\"><a class=\"af mz\" href=\"https:\/\/team.inria.fr\/graphdeco\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">GraphDeco, INRIA<\/a><\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Photo by Goashape on Unsplash Visual understanding of the real world is the primary goal of any computer vision system, which often requires the guidance of data modalities different from photos\/videos, which can better depict the visual cues, for better understanding and interpretation. In this regard, sketches provide what is perhaps the easiest mode of [&hellip;]<\/p>\n","protected":false},"author":77,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[174],"class_list":["post-7136","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Sketches in Computer Vision - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Sketches in Computer Vision\" \/>\n<meta 
property=\"og:description\" content=\"Photo by Goashape on Unsplash Visual understanding of the real world is the primary goal of any computer vision system, which often requires the guidance of data modalities different from photos\/videos, which can better depict the visual cues, for better understanding and interpretation. In this regard, sketches provide what is perhaps the easiest mode of [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-14T13:17:46+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:14:44+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*TlnfkLCD8egXCwGm-OYTvQ.jpeg\" \/>\n<meta name=\"author\" content=\"Soumitri Chattopadhyay\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Soumitri Chattopadhyay\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"25 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Sketches in Computer Vision - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/","og_locale":"en_US","og_type":"article","og_title":"Sketches in Computer Vision","og_description":"Photo by Goashape on Unsplash Visual understanding of the real world is the primary goal of any computer vision system, which often requires the guidance of data modalities different from photos\/videos, which can better depict the visual cues, for better understanding and interpretation. In this regard, sketches provide what is perhaps the easiest mode of [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-08-14T13:17:46+00:00","article_modified_time":"2025-04-24T17:14:44+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*TlnfkLCD8egXCwGm-OYTvQ.jpeg","type":"","width":"","height":""}],"author":"Soumitri Chattopadhyay","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Soumitri Chattopadhyay","Est. 
reading time":"25 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/"},"author":{"name":"Soumitri Chattopadhyay","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/d85d61a46fe6de3bc920de6cd414589e"},"headline":"Sketches in Computer Vision","datePublished":"2023-08-14T13:17:46+00:00","dateModified":"2025-04-24T17:14:44+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/"},"wordCount":4204,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*TlnfkLCD8egXCwGm-OYTvQ.jpeg","articleSection":["Machine Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/","url":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/","name":"Sketches in Computer Vision - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*TlnfkLCD8egXCwGm-OYTvQ.jpeg","datePublished":"2023-08-14T13:17:46+00:00","dateModified":"2025-04-24T17:14:44+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*TlnfkLCD8egXCwGm-OYTvQ.jpeg","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*TlnfkLCD8egXCwGm-OYTvQ.jpeg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/sketches-in-computer-vision\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Sketches in Computer Vision"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, 
Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/d85d61a46fe6de3bc920de6cd414589e","name":"Soumitri Chattopadhyay","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/4d04f8262ce81e6c5178faf5d6532436","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/1688025620454-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/1688025620454-96x96.jpg","caption":"Soumitri 
Chattopadhyay"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/soumitri-chattopadhyaygmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7136","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/77"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=7136"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7136\/revisions"}],"predecessor-version":[{"id":15579,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7136\/revisions\/15579"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=7136"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=7136"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=7136"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=7136"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}