{"id":7335,"date":"2023-08-29T13:13:53","date_gmt":"2023-08-29T21:13:53","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=7335"},"modified":"2025-04-24T17:14:32","modified_gmt":"2025-04-24T17:14:32","slug":"deep-learning-based-video-summarization-a-detailed-exploration","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/deep-learning-based-video-summarization-a-detailed-exploration\/","title":{"rendered":"Deep learning-based video summarization \u2014 A detailed exploration"},"content":{"rendered":"\n<div class=\"fh fi fj fk fl\">\n<div class=\"mg bg\">\n<figure class=\"mh mi mj mk ml mg bg paragraph-image\"><picture><img loading=\"lazy\" decoding=\"async\" class=\"bg mm mn c alignnone\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:2500\/1*l4ysrufjs3Jffox5DJIr8w.jpeg\" alt=\"person holding a movie clapperboard\" width=\"2400\" height=\"1690\"><\/picture><figcaption class=\"mo mp mq mr ms mt mu be b bf z dv\" data-selectable-paragraph=\"\">Photo by <a class=\"af mv\" href=\"https:\/\/unsplash.com\/@jakobowens1?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener ugc nofollow\">Jakob Owens<\/a> on <a class=\"af mv\" href=\"https:\/\/unsplash.com\/s\/photos\/video?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/figcaption><\/figure>\n<\/div>\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<p data-selectable-paragraph=\"\">\n<\/p><p id=\"d28e\" class=\"pw-post-body-paragraph mw mx fo be b gm my mz na gp nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fh bj\" data-selectable-paragraph=\"\">With the immense growth of videos on the internet, it\u2019s become really difficult to efficiently search amongst millions of them. When searching for an event query, users are often bewildered by the vast quantity of videos returned by search engines like Google. Exploring such results can be time-consuming and can also degrade the user experience. 
Hence, we'll be discussing ways to automate this process with deep learning **video summarization** techniques.

[Figure: The vast amount of videos found on the internet]

## The Definition of Video Summarization

> "Video summarization is the process of distilling a raw video into a more compact form without losing much information."
>
> — Definition taken from the research paper "[Video Summarization via Semantic Attended Networks](https://www.semanticscholar.org/paper/Video-Summarization-via-Semantic-Attended-Networks-Wei-Ni/6ec09bad57cc81a71ef7596f57e94ee13b380ae3)" by Shanghai Jiao Tong University

Video summarization helps users navigate a large collection of videos and retrieve the ones most relevant to their query.

In a general video summarization system, image features of the video frames are extracted, and the most representative frames are then selected by analyzing the visual variation among those features. (A minimal code sketch of this generic pipeline appears at the end of this section.)

This is done either by taking a holistic view of the entire video or by identifying local differences between adjacent frames. Most of these approaches rely on global features such as color, texture, and motion information. Clustering techniques are also used for summarization.

Video summarization can be categorized into two forms:

1. Static video summarization (keyframing), and
2. Dynamic video summarization (video skimming)

Static video summaries are composed of a set of keyframes extracted from the original video, while dynamic video summaries are composed of a set of shots and are produced by taking into account the similarity or domain-specific relationships among all of the video's shots.

One advantage of a video skim over a keyframe set is the ability to include audio and motion elements that potentially enhance both the expressiveness and the amount of information conveyed by the summary.
In addition, it's often more entertaining and interesting to watch a skim than a slide show of keyframes.

On the other hand, keyframe sets are not restricted by any timing or synchronization issues, and therefore they offer much more flexibility in terms of organization for browsing and navigation, in comparison to the strictly sequential display of video skims.

[Figure: The Process of Static Video Summary Composition]

[Figure: The Process of Dynamic Video Summarization]
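To make the generic pipeline described above more concrete, here is a minimal Python sketch of static summarization (keyframing). It is illustrative only and not taken from the cited papers: it assumes OpenCV (`cv2`) is installed, uses color histograms as a stand-in for learned frame features, and keeps a frame whenever it differs enough from the last kept frame. The sampling step, histogram size, and distance threshold are arbitrary choices, and the file name in the usage comment is hypothetical.

```python
import cv2

def frame_histogram(frame, bins=8):
    """Normalized 3-D color histogram used as a simple per-frame feature."""
    hist = cv2.calcHist([frame], [0, 1, 2], None,
                        [bins, bins, bins], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def extract_keyframes(video_path, sample_every=15, threshold=0.3):
    """Keep a sampled frame whenever it differs enough from the last kept one."""
    cap = cv2.VideoCapture(video_path)
    keyframes, last_feat, index = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_every == 0:
            feat = frame_histogram(frame)
            # Bhattacharyya distance between histograms: larger = more different.
            if last_feat is None or cv2.compareHist(
                    last_feat, feat, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                keyframes.append((index, frame))
                last_feat = feat
        index += 1
    cap.release()
    return keyframes

# Example usage (hypothetical file name):
# summary = extract_keyframes("holiday_trip.mp4")
# print(f"Selected {len(summary)} keyframes")
```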
## Video Summarization Techniques

### Feature-Based Video Summarization

A digital video contains many features, such as color, motion, and voice.
These techniques work well when a user wants to focus on a particular feature of the video. For example, if color is what matters, a color-based video summarization technique is a good choice.

Feature-based video summarization techniques are classified on the basis of motion, color, dynamic content, gestures, audio-visual cues, speech transcripts, objects, and so on.

If you want to know more about this technique, click [here](https://sci-hub.tw/https://link.springer.com/chapter/10.1007/978-3-642-33564-8_1).

[Figure: An original video sequence going through a feature-based summarization process]
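As one illustration of a feature-based approach, the sketch below uses motion as the feature of interest: each sampled frame is scored by its mean absolute pixel difference from the previous sampled frame, and the highest-motion frames are kept. This is a simplified stand-in for the techniques surveyed in the linked chapter; it assumes OpenCV and NumPy, and `sample_every`, `top_k`, and the example file name are arbitrary.

```python
import cv2
import numpy as np

def motion_keyframes(video_path, sample_every=5, top_k=10):
    """Rank sampled frames by inter-frame motion and keep the top_k."""
    cap = cv2.VideoCapture(video_path)
    prev_gray, scored, index = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is not None:
                # Mean absolute pixel difference as a crude motion score.
                motion = float(np.mean(cv2.absdiff(gray, prev_gray)))
                scored.append((motion, index, frame))
            prev_gray = gray
        index += 1
    cap.release()
    # Highest-motion frames first, then returned in temporal order.
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]
    return sorted(top, key=lambda s: s[1])

# keyframes = motion_keyframes("soccer_match.mp4")  # hypothetical file
```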
### Video Summarization Using Clustering

Clustering is the most frequently used technique when frames share similar characteristics or activities. It also helps to eliminate frames that show irregular trends. Other video summarization methods allow more efficient browsing of a video, but they tend to produce summaries that are either too long or confusing. Clustering-based video summarization is classified into similar-activity clustering, k-means, partitioned clustering, and spectral clustering.

If you want to know more about this technique, click [here](https://sci-hub.tw/https://link.springer.com/chapter/10.1007/978-3-642-33564-8_1).

[Figure: Video summarization using clustering]
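Below is a minimal sketch of clustering-based summarization using k-means, one of the variants listed above. Each sampled frame is described by a color histogram, the histograms are clustered, and the frame closest to each cluster center becomes a keyframe. It assumes OpenCV, NumPy, and scikit-learn; the number of clusters and the sampling step are arbitrary, and the example file name is hypothetical.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def cluster_keyframes(video_path, n_clusters=6, sample_every=10):
    """Cluster frame histograms with k-means; pick one frame per cluster."""
    cap = cv2.VideoCapture(video_path)
    frames, feats, index = [], [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_every == 0:
            hist = cv2.calcHist([frame], [0, 1, 2], None,
                                [8, 8, 8], [0, 256, 0, 256, 0, 256])
            feats.append(cv2.normalize(hist, hist).flatten())
            frames.append(frame)
        index += 1
    cap.release()

    feats = np.array(feats)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(feats)
    keyframes = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # Representative frame = cluster member closest to the cluster center.
        dists = np.linalg.norm(feats[members] - km.cluster_centers_[c], axis=1)
        keyframes.append(frames[members[np.argmin(dists)]])
    return keyframes

# summary = cluster_keyframes("lecture_recording.mp4")  # hypothetical file
```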
### Tag Localization and Key-Shot Identification Approach

Millions of videos are available on the web with rich metadata, such as titles, comments, and tags. Recent efforts have therefore focused on searching and exploiting the tag information of web videos.

Specifically, one scheme enriches the tag information of YouTube videos by exploiting their redundancy, such as overlapping or duplicated content. A graph is built over a set of videos, and tags from redundant videos are propagated to the target video through the graph structure.

A bag-of-instances model is used to perform tag localization (the mathematics behind it is beyond the scope of this article). Key-shot identification is then performed based on the assumption that videos usually appear multiple times in search results.

Therefore, this kind of identification can be accomplished by near-duplicate detection, i.e., detecting near-duplicates among keyframe pairs extracted from different web videos.
Since near-duplicate keyframes differ only slightly, a clustering-based method can be used to speed up the key-shot identification process.

If you want to know more about this technique, click [here](https://sci-hub.tw/https://ieeexplore.ieee.org/abstract/document/6135507).

[Figure: Video summarization using tag localization and key-shot identification]
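As a rough illustration of the near-duplicate detection step (not the exact method used in the cited paper), the sketch below computes a small average hash for each candidate keyframe and greedily groups keyframes whose hashes are within a small Hamming distance; keyframes that fall into large groups across videos then make good key-shot candidates. It assumes OpenCV and NumPy; the hash size, distance threshold, and the `candidate_keyframes` list in the usage comment are illustrative assumptions.

```python
import cv2
import numpy as np

def average_hash(frame, hash_size=8):
    """Tiny perceptual hash: downscale, then threshold against the mean."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (hash_size, hash_size))
    return (small > small.mean()).flatten()

def group_near_duplicates(keyframes, max_hamming=5):
    """Greedily group keyframes whose hashes are nearly identical."""
    hashes = [average_hash(f) for f in keyframes]
    groups = []  # each group is a list of indices into `keyframes`
    for i, h in enumerate(hashes):
        for group in groups:
            # Compare against the first member of the group (Hamming distance).
            if np.count_nonzero(h != hashes[group[0]]) <= max_hamming:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

# groups = group_near_duplicates(candidate_keyframes)  # hypothetical list of frames
```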
class=\"pw-post-body-paragraph mw mx fo be b gm qy mz na gp qz nc nd ne ra ng nh ni rb nk nl nm rc no np nq fh bj\" data-selectable-paragraph=\"\">A video can be viewed as a collection of weighted features instead of equally-important ones. The BoI model provides a mechanism to exploit both inter-frame and intra-frame properties by quantifying the importance of the individual features representing the whole video.<\/p>\n<p id=\"24ba\" class=\"pw-post-body-paragraph mw mx fo be b gm my mz na gp nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fh bj\" data-selectable-paragraph=\"\">The representative frames hence can be identified by aggregating the weighted features. It\u2019s very reasonable to assume that a video sequence in its raw feature space is a dense manifold. In order to remove redundant visual features, a video sequence needs to be projected to a low-dimensional sparse space. The locality-constrained linear coding method provides such a mechanism, which can take advantage of the manifold geometric structure to learn a nonlinear function in a high dimensional space\/manifold, and locally embed the points on the manifold in a lower-dimensional space, expressed as the coordinates with respect to a set of anchor points.<\/p>\n<p id=\"b89b\" class=\"pw-post-body-paragraph mw mx fo be b gm my mz na gp nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq fh bj\" data-selectable-paragraph=\"\">More about this topic is discussed in the given <a class=\"af mv\" href=\"https:\/\/sci-hub.tw\/https:\/\/ieeexplore.ieee.org\/abstract\/document\/6804698\" target=\"_blank\" rel=\"noopener ugc nofollow\">paper<\/a>.<\/p>\n<figure class=\"mh mi mj mk ml mg mr ms paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mm mn c alignnone\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*1JFxuViSCJ4Eq81_gfohdw.png\" alt=\"Bag-of-Importance Model graphic\" width=\"640\" height=\"502\"><\/figure><div class=\"mr ms rg\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*1JFxuViSCJ4Eq81_gfohdw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*1JFxuViSCJ4Eq81_gfohdw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*1JFxuViSCJ4Eq81_gfohdw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*1JFxuViSCJ4Eq81_gfohdw.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*1JFxuViSCJ4Eq81_gfohdw.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*1JFxuViSCJ4Eq81_gfohdw.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1280\/format:webp\/1*1JFxuViSCJ4Eq81_gfohdw.png 1280w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 640px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*1JFxuViSCJ4Eq81_gfohdw.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*1JFxuViSCJ4Eq81_gfohdw.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*1JFxuViSCJ4Eq81_gfohdw.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*1JFxuViSCJ4Eq81_gfohdw.png 786w, 
## Conclusion

These techniques are just the start of a new era of deep learning for video summarization. Many advances will be made in the near future to create and optimize summaries based on the audience, the delivery medium, and the intent of the summarization. Together, with efforts across the industry, we'll make video summarization highly scalable, reliable, and efficient.