{"id":4697,"date":"2022-11-16T10:53:34","date_gmt":"2022-11-16T18:53:34","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=4697"},"modified":"2025-04-24T17:16:23","modified_gmt":"2025-04-24T17:16:23","slug":"kangas-visualize-multimedia-data-at-scale","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/","title":{"rendered":"Kangas: Visualize Multimedia Data at Scale"},"content":{"rendered":"\n<p><span style=\"font-weight: 400;\">Thousands of data scientists use Comet <\/span><a href=\"https:\/\/www.comet.com\/site\/blog\/introducing-panels-custom-visualizations-for-machine-learning\/\"><span style=\"font-weight: 400;\">panels<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"https:\/\/www.comet.com\/site\/blog\/logging-histograms-gradients-and-activations-with-comet\/\"><span style=\"font-weight: 400;\">histograms<\/span><\/a><span style=\"font-weight: 400;\">, and <\/span><a href=\"https:\/\/www.comet.com\/site\/blog\/introducing-reports-ml-templates\/\"><span style=\"font-weight: 400;\">reports<\/span><\/a><span style=\"font-weight: 400;\"> to visualize data from experiments every day. While we\u2019re proud of those tools and excited to see teams using them, we\u2019ve consistently heard one piece of feedback, particularly from computer vision researchers:<\/span><\/p>\n\n\n\n<p><b>Visualization is still painful in exploratory data analysis.<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Over the last several months, the Comet research team has been working on addressing this problem, developing a tool for visualizing multimedia data that is performant, intuitive, and interoperable. 
Today, we are excited to open source this library, <\/span><a href=\"https:\/\/github.com\/comet-ml\/kangas\"><span style=\"font-weight: 400;\">Kangas<\/span><\/a><span style=\"font-weight: 400;\">, and release it for its initial beta.<\/span><\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Introducing Kangas V1: Open Source EDA for Computer Vision<\/span><\/h1>\n\n\n\n<p><span style=\"font-weight: 400;\">In these early days of Kangas (like \u201ckangaroos\u201d without the \u201croo\u201d), we\u2019ve set out to solve three specific problems in exploratory data analysis:<\/span><\/p>\n\n\n\n<p><b>1. Large datasets are painful to process.<\/b><span style=\"font-weight: 400;\"> While pandas is a fantastic tool, it stores its DataFrames in memory, crippling performance as your dataset grows. Supplementing with 3rd party tools, like <\/span><a href=\"https:\/\/docs.dask.org\/en\/latest\/dataframe.html\"><span style=\"font-weight: 400;\">Dask<\/span><\/a><span style=\"font-weight: 400;\">, works in a complex pipeline ahead of production, but slows you down in research.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">This is where we started with Kangas. We thought \u201cWhat if instead of storing a DataFrame-like object in memory, we stored it in an actual database?\u201d Which then transformed into \u201cWhat if DataFrames <\/span><i><span style=\"font-weight: 400;\">were<\/span><\/i><span style=\"font-weight: 400;\"> actual databases?\u201d&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The base class of Kangas is the DataGrid, which you define using a familiar Python syntax:<\/span><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">from kangas import DataGrid\ndg = DataGrid(name=\"Images\", columns=[\"Image\", \"Score\"])\ndg.append([image_1, score_1])\ndg.show()<\/pre>\n\n\n\n<p><i><span style=\"font-weight: 400;\">Note: There are actually several different ways of constructing a DataGrid. 
For more, <\/span><\/i><a href=\"https:\/\/github.com\/comet-ml\/kangas\/wiki\/Constructing-DataGrids\"><i><span style=\"font-weight: 400;\">see here<\/span><\/i><\/a><i><span style=\"font-weight: 400;\">.<\/span><\/i><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">A Kangas DataGrid is an actual SQLite database, giving it the ability to store vast amounts of data and perform complex queries quickly. It also allows DataGrids to be saved and distributed, even served remotely.<\/span><\/p>\n\n\n\n<p><b>2. Visualizing data takes hours<\/b><span style=\"font-weight: 400;\">. To explore a CV dataset, you need to see the images themselves, as well as the relevant metadata and transformations. You need to be able to compare images across views, chart aggregate statistics, and ideally, do it all inside a single UI. Your typical mishmash of libraries results in output best described as \u201cfunctional,\u201d not beautiful.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Visualizations in Kangas needed to be easy, fast, and slick. Instead of relying on a Python library, we built the Kangas UI as an actual web application. 
Server-side rendering (using React Server Components) allows Kangas to render visualizations quickly while performing a variety of queries, including filtering, sorting, grouping, and reordering columns.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/github.com\/caleb-kaiser\/kangas_examples\/blob\/master\/Nov-16-2022%2007-16-56.gif?raw=true\" alt=\"Kangas Demo\" class=\"wp-image-4699\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">On top of this, Kangas provides built-in metadata parsing for things like labels, scores, and bounding boxes:<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/github.com\/caleb-kaiser\/kangas_examples\/blob\/master\/Oct-25-2022%2016-43-56.gif?raw=true\" alt=\"Kangas Demo\" class=\"wp-image-4699\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><b>3. EDA solutions are rarely interoperable<\/b><span style=\"font-weight: 400;\">. One of the challenges of EDA is that data is often messy and unpredictable. Your colleague\u2019s \u201ceccentric\u201d preference in tooling often changes your data in the least intuitive way. In an ideal world, you wouldn\u2019t need to change your workflow to contend with this variability. It would all just work. To achieve this in Kangas, we had to do several things.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">First, we wanted to make sure that any type of data could be loaded into Kangas. To this end, Kangas is largely unopinionated about what you store inside a DataGrid. 
Kangas additionally provides several functions for ingesting data from different sources, including pandas DataFrames, CSV files, and existing DataGrids.<\/span><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import kangas as kg\n\n# Load an existing DataGrid\ndg = kg.read_datagrid(\"https:\/\/github.com\/caleb-kaiser\/kangas_examples\/raw\/master\/coco-500.datagrid\")\n\n# Build a DataGrid from a CSV\ndg = kg.read_csv(\"\/path\/to\/your.csv\")\n\n# Build a DataGrid from a pandas DataFrame\ndg = kg.read_dataframe(your_dataframe)\n\n# Construct a DataGrid manually\ndg = kg.DataGrid(name=\"Example 1\", columns=[\"Category\", \"Loss\", \"Fitness\", \"Timestamp\"])\n<\/pre>\n\n\n\n<p><span style=\"font-weight: 400;\">Second, we wanted to be sure that Kangas could run in any environment without major setup. Once you\u2019ve run <code>pip install kangas<\/code>, you can run it as a standalone app on your local machine, run it from within a notebook environment, or even deploy it on its own server (as we\u2019ve done at <\/span><a href=\"https:\/\/kangas.comet.com\/?datagrid=\/data\/coco-500.datagrid\"><span style=\"font-weight: 400;\">kangas.comet.com<\/span><\/a><span style=\"font-weight: 400;\">).<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Finally, the fact that Kangas is open source means it is by definition interoperable. If your particular needs are so specific and extreme that nothing on the Kangas roadmap will ever satisfy them, you can fork the repo and implement whatever you need. And if you do that, please let us know! We\u2019d love to take a look.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">What\u2019s on the roadmap for Kangas?<\/span><\/h3>\n\n\n\n<p><span style=\"font-weight: 400;\">It\u2019s still early days for Kangas. Right now, there are only a handful of beta users testing it, and large portions of the codebase are still under active development. 
With that in mind, what happens next is largely up to you. Kangas is and always will be a free and open source project, and what we choose to prioritize over the next months and years will come down to what members of the community want the most.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">If you have time to spare and a burning need for better exploratory data analysis, consider stopping by the <\/span><a href=\"https:\/\/github.com\/comet-ml\/kangas\"><span style=\"font-weight: 400;\">Kangas repo<\/span><\/a><span style=\"font-weight: 400;\"> and taking it for a spin. We\u2019re open to community contributions of all kinds, and if you star\/follow the repository, you\u2019ll be notified whenever there is a new major release.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Thousands of data scientists use Comet panels, histograms, and reports to visualize data from experiments every day. While we\u2019re proud of those tools and excited to see teams using them, we\u2019ve consistently heard one piece of feedback, particularly from computer vision researchers: Visualization is still painful in exploratory data analysis. 
Over the last several months, [&hellip;]<\/p>\n","protected":false},"author":25,"featured_media":4703,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[8,9],"tags":[],"coauthors":[142],"class_list":["post-4697","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-comet-community-hub","category-product"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Kangas: Visualize Multimedia Data at Scale - Comet<\/title>\n<meta name=\"description\" content=\"Kangas is an open source tool for performing exploratory data analysis on computer vision datasets with a UI that is performant, intuitive, and interoperable.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Kangas: Visualize Multimedia Data at Scale\" \/>\n<meta property=\"og:description\" content=\"Kangas is an open source tool for performing exploratory data analysis on computer vision datasets with a UI that is performant, intuitive, and interoperable.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2022-11-16T18:53:34+00:00\" \/>\n<meta 
property=\"article:modified_time\" content=\"2025-04-24T17:16:23+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/11\/kangas-datagrid.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"640\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Caleb Kaiser\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@KaiserFrose\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Caleb Kaiser\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Kangas: Visualize Multimedia Data at Scale - Comet","description":"Kangas is an open source tool for performing exploratory data analysis on computer vision datasets with a UI that is performant, intuitive, and interoperable.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/","og_locale":"en_US","og_type":"article","og_title":"Kangas: Visualize Multimedia Data at Scale","og_description":"Kangas is an open source tool for performing exploratory data analysis on computer vision datasets with a UI that is performant, intuitive, and 
interoperable.","og_url":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2022-11-16T18:53:34+00:00","article_modified_time":"2025-04-24T17:16:23+00:00","og_image":[{"width":1280,"height":640,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/11\/kangas-datagrid.png","type":"image\/png"}],"author":"Caleb Kaiser","twitter_card":"summary_large_image","twitter_creator":"@KaiserFrose","twitter_site":"@Cometml","twitter_misc":{"Written by":"Caleb Kaiser","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/"},"author":{"name":"Caleb Kaiser","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/baa7ccdd5a25dfa5618749d6c504d203"},"headline":"Kangas: Visualize Multimedia Data at Scale","datePublished":"2022-11-16T18:53:34+00:00","dateModified":"2025-04-24T17:16:23+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/"},"wordCount":789,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/11\/kangas-datagrid.png","articleSection":["Comet Community Hub","Product"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/","url":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/","name":"Kangas: Visualize Multimedia Data at Scale - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/11\/kangas-datagrid.png","datePublished":"2022-11-16T18:53:34+00:00","dateModified":"2025-04-24T17:16:23+00:00","description":"Kangas is an open source tool for performing exploratory data analysis on computer vision datasets with a UI that is performant, intuitive, and interoperable.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/11\/kangas-datagrid.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/11\/kangas-datagrid.png","width":1280,"height":640,"caption":"Kangas Logo"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/kangas-visualize-multimedia-data-at-scale\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Kangas: Visualize Multimedia Data at Scale"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/baa7ccdd5a25dfa5618749d6c504d203","name":"Caleb Kaiser","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/3a75e34ba4e2ba18dd960aae0d6d022a","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/01\/cropped-Caleb-Kaiser-96x96.jpeg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/01\/cropped-Caleb-Kaiser-96x96.jpeg","caption":"Caleb 
Kaiser"},"sameAs":["https:\/\/x.com\/KaiserFrose"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/calebcomet-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4697","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/25"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=4697"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4697\/revisions"}],"predecessor-version":[{"id":15643,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/4697\/revisions\/15643"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/4703"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=4697"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=4697"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=4697"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=4697"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}