{"id":6559,"date":"2023-06-29T11:23:23","date_gmt":"2023-06-29T19:23:23","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=6559"},"modified":"2025-04-24T17:15:17","modified_gmt":"2025-04-24T17:15:17","slug":"real-time-object-detection-on-raspberry-pi-using-opencv-dnn","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/","title":{"rendered":"Real-Time Object Detection on Raspberry Pi Using OpenCV DNN"},"content":{"rendered":"\n<div class=\"fh fi fj fk fl\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:5616\/1*LyjNlbtMaXBWGEFzh2wArQ.jpeg\" alt=\"\" width=\"2400\" height=\"3744\"><\/figure><div class=\"mf bg\">\n<figure class=\"mg mh mi mj mk mf bg paragraph-image\"><picture><\/picture><\/figure>\n<\/div>\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<p id=\"3477\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Running deep learning models is computationally expensive. 
And when it comes to image processing with <a class=\"af ni\" href=\"https:\/\/heartbeat.comet.ml\/the-5-computer-vision-techniques-that-will-change-how-you-see-the-world-1ee19334354b\" target=\"_blank\" rel=\"noopener ugc nofollow\">computer vision,<\/a> the first thing that comes to mind is high-end GPUs\u2014think the 1080ti and now the 2080ti.<\/p>\n<figure class=\"mg mh mi mj mk mf nj nk paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:480\/0*jRkVZ7PX-uznfHsM.jpg\" alt=\"\" width=\"480\" height=\"320\"><\/figure><div class=\"nj nk nl\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/0*jRkVZ7PX-uznfHsM.jpg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/0*jRkVZ7PX-uznfHsM.jpg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/0*jRkVZ7PX-uznfHsM.jpg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/0*jRkVZ7PX-uznfHsM.jpg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/0*jRkVZ7PX-uznfHsM.jpg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/0*jRkVZ7PX-uznfHsM.jpg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:960\/format:webp\/0*jRkVZ7PX-uznfHsM.jpg 960w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 480px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/0*jRkVZ7PX-uznfHsM.jpg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/0*jRkVZ7PX-uznfHsM.jpg 720w, 
https:\/\/miro.medium.com\/v2\/resize:fit:750\/0*jRkVZ7PX-uznfHsM.jpg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/0*jRkVZ7PX-uznfHsM.jpg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/0*jRkVZ7PX-uznfHsM.jpg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/0*jRkVZ7PX-uznfHsM.jpg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:960\/0*jRkVZ7PX-uznfHsM.jpg 960w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 480px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p>But it\u2019s hard to run computer vision models on edge devices like Raspberry Pi, and making a portable solution is difficult with deep learning libraries like <a href=\"https:\/\/medium.com\/u\/b1d410cb9700?source=post_page-----98827255fa60--------------------------------\">TensorFlow<\/a> or <a href=\"https:\/\/medium.com\/r?url=https%3A%2F%2Fheartbeat.fritz.ai%2Fintroduction-to-pytorch-for-deep-learning-5b437cea90ac\">PyTorch<\/a>.<\/p>\n<p>For this task, it\u2019s almost compulsory to add <a class=\"af ni\" href=\"https:\/\/opencv.org\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">OpenCV<\/a> to help pre-process data. And the good news is that OpenCV itself includes a deep neural network module, known as <a class=\"af ni\" href=\"https:\/\/docs.opencv.org\/3.4\/d2\/d58\/tutorial_table_of_content_dnn.html\" target=\"_blank\" rel=\"noopener ugc nofollow\">OpenCV DNN<\/a>. It runs much faster than other libraries, and conveniently, it only needs OpenCV in the environment. 
As a result, OpenCV DNN delivers good speed running on a CPU alone.<\/p>\n<p id=\"6b1f\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">The best use case of OpenCV DNN is performing real-time object detection on a Raspberry Pi. This process can run in any environment where OpenCV can be installed, and it avoids the hassle of installing deep learning libraries with GPU support. As such, this tutorial isn\u2019t centered on Raspberry Pi\u2014you can follow this process for any environment with OpenCV.<\/p>\n<h1 id=\"f50f\" class=\"np nq fo be nr ns nt go nu nv nw gr nx ny nz oa ob oc od oe of og oh oi oj ok bj\" data-selectable-paragraph=\"\">How Does Object Detection with OpenCV DNN Work?<\/h1>\n<p id=\"32f5\" class=\"pw-post-body-paragraph mn mo fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">Previously, I wrote this piece:<\/p>\n<blockquote class=\"oq or os\"><p id=\"6e8f\" class=\"mn mo ot be b gm mp mq mr gp ms mt mu ou mw mx my ov na nb nc ow ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Without TensorFlow: Web app with an Object Detection API in Heroku and OpenCV <a class=\"af ni\" href=\"https:\/\/medium.com\/@rdeep\/without-tensorflow-web-app-with-object-detection-api-in-heroku-and-opencv-aa1e54eceee1\" rel=\"noopener\">[LINK]<\/a><\/p><\/blockquote>\n<p id=\"bd18\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">While writing the above article, I realized there are lots of code examples available online, but I couldn\u2019t find any output analysis using OpenCV DNN for object detection. 
So I figured, why not explore the OpenCV DNN module?<\/p>\n<p id=\"ca4f\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">So in this tutorial, we\u2019ll be exploring how object detection works with <a class=\"af ni\" href=\"http:\/\/www.ebenezertechs.com\/mobilenet-ssd-using-opencv-3-4-1-deep-learning-module-python\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">OpenCV DNN and MobileNet-SSD <\/a>(in terms of <em class=\"ot\">inference<\/em>).<\/p>\n<p id=\"e716\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">We\u2019ll be using:<\/p>\n<ol class=\"\">\n<li id=\"3a90\" class=\"mn mo fo be b gm mp mq mr gp ms mt mu ou mw mx my ov na nb nc ow ne nf ng nh ox oy oz bj\" data-selectable-paragraph=\"\">Python 3<\/li>\n<li id=\"d070\" class=\"mn mo fo be b gm pa mq mr gp pb mt mu ou pc mx my ov pd nb nc ow pe nf ng nh ox oy oz bj\" data-selectable-paragraph=\"\">OpenCV [Latest version]<\/li>\n<li id=\"3e53\" class=\"mn mo fo be b gm pa mq mr gp pb mt mu ou pc mx my ov pd nb nc ow pe nf ng nh ox oy oz bj\" data-selectable-paragraph=\"\">MobileNet-SSD v2<\/li>\n<\/ol>\n<p id=\"3dd4\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">OpenCV DNN supports models trained from various frameworks like Caffe and TensorFlow. 
It also supports various network architectures based on <a class=\"af ni\" href=\"https:\/\/heartbeat.comet.ml\/gentle-guide-on-how-yolo-object-localization-works-with-keras-part-2-65fe59ac12d\" target=\"_blank\" rel=\"noopener ugc nofollow\">YOLO,<\/a> MobileNet-SSD, Inception-SSD, Faster-RCNN Inception, Faster-RCNN ResNet, and Mask-RCNN Inception.<\/p>\n<p id=\"6b33\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Because OpenCV supports multiple platforms (Android, Raspberry Pi) and languages (C++, Python, and Java), we can use this module for development on many different devices.<\/p>\n<figure class=\"mg mh mi mj mk mf nj nk paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:600\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png\" alt=\"\" width=\"600\" height=\"222\"><\/figure><div class=\"nj nk pf\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1200\/format:webp\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 1200w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and 
(max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 600px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1200\/1*Mu7_d3e1qPtW1e7EgsX7LQ.png 1200w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 600px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<h2 id=\"a352\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\">Why OpenCV DNN?<\/h2>\n<p id=\"02ca\" class=\"pw-post-body-paragraph mn mo fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">OpenCV DNN runs inference faster than the <a class=\"af ni\" href=\"https:\/\/github.com\/tensorflow\/models\/tree\/master\/research\/object_detection\" target=\"_blank\" rel=\"noopener ugc nofollow\">TensorFlow object detection API<\/a>, with lower computational requirements. 
We will see a performance comparison in a future blog post.<\/p>\n<h2 id=\"8e94\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Why MobileNet-SSD?<\/strong><\/h2>\n<p id=\"57ca\" class=\"pw-post-body-paragraph mn mo fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">MobileNet-SSD is lightweight and can easily be trained with the TensorFlow Object Detection API.<\/p>\n<p id=\"5b0e\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Check out the <a href=\"https:\/\/github.com\/opencv\/opencv\/wiki\/TensorFlow-Object-Detection-API\">official docs<\/a> for more.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h1 id=\"c0c8\" class=\"np nq fo be nr ns rj go nu nv rk gr nx ny rl oa ob oc rm oe of og rn oi oj ok bj\" data-selectable-paragraph=\"\">Getting Started<\/h1>\n<p id=\"f98e\" class=\"pw-post-body-paragraph mn mo fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">First things first, here\u2019s a <a href=\"https:\/\/github.com\/rdeepc\/ExploreOpencvDnn\">GitHub repo<\/a> I created that allows you to explore this module:<\/p>\n<h2 id=\"b05d\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\">Installation<\/h2>\n<p id=\"d63e\" class=\"pw-post-body-paragraph mn mo fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">Installing OpenCV for Python:<\/p>\n<p id=\"2acc\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\"><code class=\"cw rp rq rr rs b\">pip3 install 
opencv-python<\/code><\/p>\n<p id=\"8f01\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Download the pre-trained model from the above link.<\/p>\n<figure class=\"mg mh mi mj mk mf nj nk paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:518\/1*f0rVnnanLORTprTp-HctqA.png\" alt=\"\" width=\"518\" height=\"322\"><\/figure><div class=\"nj nk rt\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*f0rVnnanLORTprTp-HctqA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*f0rVnnanLORTprTp-HctqA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*f0rVnnanLORTprTp-HctqA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*f0rVnnanLORTprTp-HctqA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*f0rVnnanLORTprTp-HctqA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*f0rVnnanLORTprTp-HctqA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1036\/format:webp\/1*f0rVnnanLORTprTp-HctqA.png 1036w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 518px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*f0rVnnanLORTprTp-HctqA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*f0rVnnanLORTprTp-HctqA.png 720w, 
https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*f0rVnnanLORTprTp-HctqA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*f0rVnnanLORTprTp-HctqA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*f0rVnnanLORTprTp-HctqA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*f0rVnnanLORTprTp-HctqA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1036\/1*f0rVnnanLORTprTp-HctqA.png 1036w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 518px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"ae48\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">We\u2019ll be using MobileNet-SSD v2 for our object detection model, as it\u2019s more popular\u2014let\u2019s download its weights and config.<\/p>\n<p id=\"038f\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">From the weights folder (after unzipping), we use the <code class=\"cw rp rq rr rs b\">frozen_inference_graph.pb<\/code> file.<\/p>\n<h2 id=\"9a8e\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\">Project Structure<\/h2>\n<figure class=\"mg mh mi mj mk mf nj nk paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:408\/1*3KJIqenh9cLtqQ0sphrTGA.png\" alt=\"\" width=\"408\" 
height=\"131\"><\/figure><div class=\"nj nk ru\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*3KJIqenh9cLtqQ0sphrTGA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*3KJIqenh9cLtqQ0sphrTGA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*3KJIqenh9cLtqQ0sphrTGA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*3KJIqenh9cLtqQ0sphrTGA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*3KJIqenh9cLtqQ0sphrTGA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*3KJIqenh9cLtqQ0sphrTGA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:816\/format:webp\/1*3KJIqenh9cLtqQ0sphrTGA.png 816w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 408px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*3KJIqenh9cLtqQ0sphrTGA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*3KJIqenh9cLtqQ0sphrTGA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*3KJIqenh9cLtqQ0sphrTGA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*3KJIqenh9cLtqQ0sphrTGA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*3KJIqenh9cLtqQ0sphrTGA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*3KJIqenh9cLtqQ0sphrTGA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:816\/1*3KJIqenh9cLtqQ0sphrTGA.png 816w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 
3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 408px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"f795\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">(I\u2019m using virtualenv for this tutorial, so you\u2019ll see a venv folder, but using one isn\u2019t mandatory.)<\/p>\n<h2 id=\"b8fe\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Let\u2019s code<\/strong><\/h2>\n<p id=\"0b8c\" class=\"pw-post-body-paragraph mn mo fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">All of the code can be found in <code class=\"cw rp rq rr rs b\">main.py<\/code>.<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"33e3\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">classNames is a dictionary that maps the model\u2019s 90 trained class IDs (plus the background class) to their names <\/span><\/pre>\n<h2 id=\"5958\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Loading the model (the files we downloaded)<\/strong><\/h2>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"85ba\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">model = cv2.dnn.readNetFromTensorflow('models\/frozen_inference_graph.pb', 'models\/ssd_mobilenet_v2_coco_2018_03_29.pbtxt')<\/span><\/pre>\n<blockquote class=\"oq or os\"><p id=\"d239\" class=\"mn mo ot be b gm mp mq mr gp ms mt mu ou mw mx my ov na nb nc ow ne nf ng nh fh 
bj\" data-selectable-paragraph=\"\"><strong class=\"be no\">We need an image to detect objects (these can be captured as frames from live video)<\/strong><\/p><\/blockquote>\n<p id=\"e95a\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">For the purposes of this tutorial, let\u2019s use <a class=\"af ni\" href=\"https:\/\/pxhere.com\/en\/photo\/1031251\" target=\"_blank\" rel=\"noopener ugc nofollow\">this image<\/a>:<\/p>\n<figure class=\"mg mh mi mj mk mf nj nk paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:183\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg\" alt=\"\" width=\"183\" height=\"276\"><\/figure><div class=\"nj nk sc\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:366\/format:webp\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 366w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, 
(-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 183px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:366\/1*IY2k5NvAhTqR_GHTgO0JsQ.jpeg 366w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 183px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"73bf\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Download the image into the code directory; then read the image with OpenCV and show it:<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"78df\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">image = cv2.imread(\"image.jpeg\")\ncv2.imshow('image',image)\ncv2.waitKey(0)\ncv2.destroyAllWindows()<\/span><\/pre>\n<h2 id=\"5369\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Feeding the image to the network<\/strong><\/h2>\n<p id=\"fb4d\" class=\"pw-post-body-paragraph mn mo 
fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">To feed the image into the network, we have to convert it to a <a class=\"af ni\" href=\"https:\/\/forums.fast.ai\/t\/what-is-blob\/15255\" target=\"_blank\" rel=\"noopener ugc nofollow\">blob<\/a>. A blob is a pre-processed image that serves as the network\u2019s input. <a class=\"af ni\" href=\"https:\/\/heartbeat.comet.ml\/data-preprocessing-and-visualization-implications-for-your-machine-learning-model-8dfbaaa51423\" target=\"_blank\" rel=\"noopener ugc nofollow\">Pre-processing techniques<\/a> applied at this stage include resizing to the model\u2019s input size, color channel swapping, cropping, and mean subtraction (normalizing color channel values by subtracting a mean value).<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"fd99\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">model.setInput(cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True))<\/span><\/pre>\n<p id=\"439a\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">We&#8217;re resizing the image to 300 x 300, the input size our pre-trained model expects, and swapping the color channels from BGR to RGB. OpenCV reads images in BGR order, while models are typically trained on RGB images. 
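<\/p>
<p>To show what the blob conversion does under the hood, here\u2019s a rough standalone sketch in plain NumPy. The <code>to_blob<\/code> helper and the nearest-neighbor resize are simplifications for illustration, not the OpenCV implementation:</p>

```python
import numpy as np

def to_blob(image, size=(300, 300), swap_rb=True):
    """Sketch of what cv2.dnn.blobFromImage does: resize, optional
    BGR->RGB swap, then reshape HWC -> NCHW (batch, channels, h, w)."""
    h, w = image.shape[:2]
    ys = np.arange(size[1]) * h // size[1]   # nearest-neighbor row indices
    xs = np.arange(size[0]) * w // size[0]   # nearest-neighbor column indices
    resized = image[ys][:, xs].astype(np.float32)
    if swap_rb:
        resized = resized[:, :, ::-1]        # BGR -> RGB
    # Add a batch dimension and move channels first
    return resized.transpose(2, 0, 1)[np.newaxis, ...]

blob = to_blob(np.zeros((480, 640, 3), dtype=np.uint8))
print(blob.shape)   # (1, 3, 300, 300)
```

<p>The (1, 3, 300, 300) layout is what the network consumes: one image, three channels, 300 x 300 pixels.</p>
<p>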
<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"ab ca qr qs qt qu\" role=\"separator\"><\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<h2 id=\"8489\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Feeding forward the model<\/strong><\/h2>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"581d\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">output = model.forward()<\/span><\/pre>\n<p id=\"b3f5\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">We have the output; now we need to interpret it. For our input image, the shape of the output matrix is (1, 1, 100, 7). Our main concern is output[0, 0, :, :]:<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"a1a0\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">output[0,0,:,:].shape is (100, 7)<\/span><\/pre>\n<h2 id=\"6a6f\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Thresholding Results<\/strong><\/h2>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"4989\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">for detection in output[0,0,:,:]:\n    confidence = detection[2]<\/span><\/pre>\n<p id=\"d9a4\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Each detection comes with a predicted confidence in the range 0 to 1, but most detections are false positives, so we\u2019ll keep only objects detected with higher confidence. For that, let\u2019s set a threshold of .5. 
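<\/p>
<p>To make the row layout and the filtering concrete, here\u2019s a standalone sketch with a mocked output array in the same (1, 1, N, 7) shape; all the values below are made up for illustration:</p>

```python
import numpy as np

# Each row: [batch_id, class_id, confidence, left, top, right, bottom]
output = np.array([[[
    [0.0,  1.0, 0.64, 0.10, 0.20, 0.40, 0.90],  # a "person"-like detection
    [0.0, 18.0, 0.84, 0.35, 0.50, 0.95, 0.95],  # a "dog"-like detection
    [0.0,  3.0, 0.10, 0.00, 0.00, 0.10, 0.10],  # low-confidence noise
]]], dtype=np.float32)

# Keep only rows whose confidence (index 2) clears the threshold
kept = [d for d in output[0, 0, :, :] if d[2] > .5]
print(len(kept))   # 2 -- the low-confidence row is filtered out
```

<p>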
We\u2019ll only consider detections above that threshold when drawing bounding boxes on the image.<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"1ca6\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">if confidence &gt; .5:<\/span><\/pre>\n<p id=\"40b4\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">For the predicted class, we get its ID. From the ID, we\u2019ll look up the class name in the <code class=\"cw rp rq rr rs b\">classNames<\/code> dictionary.<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"ced2\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">class_id = detection[1]<\/span><\/pre>\n<h2 id=\"94e1\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Converting id to class name<\/strong><\/h2>\n<p id=\"7362\" class=\"pw-post-body-paragraph mn mo fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">The function below takes a <code class=\"cw rp rq rr rs b\">class_id<\/code> and returns the matching class name from <code class=\"cw rp rq rr rs b\">classNames<\/code>.<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"f4c8\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">def id_class_name(class_id, classes):\n    for key, value in classes.items():\n        if class_id == key:\n            return value<\/span><\/pre>\n<p id=\"b269\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">In this loop, we\u2019re printing the class ID, confidence, and class name for debugging purposes before drawing the box.<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"7315\" class=\"pg 
nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">for detection in output[0, 0, :, :]:\n    confidence = detection[2]\n    if confidence &gt; .5:\n        class_id = detection[1]\n        print(class_id, confidence, id_class_name(class_id, classNames))<\/span><\/pre>\n<p id=\"7f35\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">This is the output we got from the above code:<\/p>\n<p id=\"0a97\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\"><code class=\"cw rp rq rr rs b\">1.0 0.6377985 person<br>\n18.0 0.84042233 dog<\/code><\/p>\n<h2 id=\"f2fa\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\"><strong class=\"al\">The Bounding Box<\/strong><\/h2>\n<p id=\"7bbd\" class=\"pw-post-body-paragraph mn mo fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">It&#8217;s time to draw the box on the image. To draw the bounding box for the predicted object, we need the four box values the detection row provides: <code class=\"cw rp rq rr rs b\">x<\/code>, <code class=\"cw rp rq rr rs b\">y<\/code>, <code class=\"cw rp rq rr rs b\">width<\/code>, and <code class=\"cw rp rq rr rs b\">height<\/code>.<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"bb33\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">box_x = detection[3]\nbox_y = detection[4]\nbox_width = detection[5]\nbox_height = detection[6]<\/span><\/pre>\n<p id=\"70c2\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">But we need to scale the values of the box according to our image height and width, since the model returns them normalized between 0 and 1. 
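<\/p>
<p>Concretely, the scaling step turns each normalized value into a pixel position. A tiny standalone example (the frame size and coordinates here are made up):</p>

```python
# Normalized corner coordinates, as the model returns them (range 0 to 1)
x_left, y_top, x_right, y_bottom = 0.25, 0.50, 0.75, 1.00
image_width, image_height = 640, 480   # example frame size

# Multiply by the frame dimensions to get pixel positions
pt1 = (int(x_left * image_width), int(y_top * image_height))      # top-left
pt2 = (int(x_right * image_width), int(y_bottom * image_height))  # bottom-right
print(pt1, pt2)   # (160, 240) (480, 480)
```

<p>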
But the image array is 3-dimensional, since it also includes color channels, and we only need the height and width. So we skip the channel count with \u201c_\u201d<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"d6ed\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">image_height, image_width, _ = image.shape<\/span><\/pre>\n<h2 id=\"0713\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\"><strong class=\"al\">After scaling<\/strong><\/h2>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"f85b\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">box_x = detection[3] * image_width\nbox_y = detection[4] * image_height\nbox_width = detection[5] * image_width\nbox_height = detection[6] * image_height<\/span><\/pre>\n<p id=\"4ccc\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Once we\u2019ve scaled the coordinates, we can draw the box on the image with those values.<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"4005\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">cv2.rectangle(image, (int(box_x), int(box_y)), (int(box_width), int(box_height)), (23, 230, 210), thickness=1)<\/span><\/pre>\n<p id=\"2b41\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">OpenCV\u2019s <code class=\"cw rp rq rr rs b\">rectangle<\/code> function takes the image and two corner points: the top-left corner and the bottom-right corner (despite their names, <code class=\"cw rp rq rr rs b\">box_width<\/code> and <code class=\"cw rp rq rr rs b\">box_height<\/code> here hold the scaled bottom-right coordinates). 
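To make the box geometry explicit, here is a small framework-free sketch of the scaling step. The helper name scale_box is hypothetical, not from this post; the point is that detection[5] and detection[6] are the normalized bottom-right corner, so both points are scaled the same way.

```python
# Sketch of the scaling step: an SSD detection row stores normalized
# [x_min, y_min, x_max, y_max] in positions 3..6, so both corner points
# are scaled by the image size. `scale_box` is a hypothetical helper name.

def scale_box(detection, image_width, image_height):
    """Turn one normalized detection row into pixel corner points."""
    top_left = (int(detection[3] * image_width),
                int(detection[4] * image_height))
    bottom_right = (int(detection[5] * image_width),   # x_max, not a width
                    int(detection[6] * image_height))  # y_max, not a height
    return top_left, bottom_right

# Usage with OpenCV (assuming `image` and `detection` exist):
# pt1, pt2 = scale_box(detection, image_width, image_height)
# cv2.rectangle(image, pt1, pt2, (23, 230, 210), thickness=1)
```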
It also takes color and thickness.<\/p>\n<figure class=\"mg mh mi mj mk mf nj nk paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:183\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg\" alt=\"\" width=\"183\" height=\"276\"><\/figure><div class=\"nj nk sc\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:366\/format:webp\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 366w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 183px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 828w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:366\/1*cwg8PP8v7SpMGPnDxL8jMg.jpeg 366w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 183px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<h2 id=\"7b94\" class=\"pg nq fo be nr ph pi pj nu pk pl pm nx mv pn po pp mz pq pr ps nd pt pu pv pw bj\" data-selectable-paragraph=\"\"><strong class=\"al\">Adding Text<\/strong><\/h2>\n<p id=\"8e23\" class=\"pw-post-body-paragraph mn mo fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">Now that we\u2019ve drawn the bounding boxes, let\u2019s add the class <code class=\"cw rp rq rr rs b\">text<\/code> in the box:<\/p>\n<pre class=\"mg mh mi mj mk rv rs rw rx ax ry bj\"><span id=\"8117\" class=\"pg nq fo rs b ia rz sa l iq sb\" data-selectable-paragraph=\"\">cv2.putText(image,class_name ,(int(box_x), int(box_y+.05*image_height)),cv2.FONT_HERSHEY_SIMPLEX,(.005*image_width),(0, 0, 255))\n<\/span><\/pre>\n<p id=\"e27e\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">OpenCV\u2019s <code class=\"cw rp rq rr rs b\">putText<\/code> function takes arguments for image, text, starting x, starting y, font type, font size, and text color. 
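As an aside, the two scaled values in that call can be factored into a small helper. label_placement is a hypothetical name, not from this post; it just makes the arithmetic behind the text origin and font scale easy to see.

```python
# Sketch of how the putText placement is derived: the text origin sits 5%
# of the image height below the box's top-left corner, and the font scale
# grows with image width. `label_placement` is a hypothetical helper name.

def label_placement(box_x, box_y, image_width, image_height):
    """Return (origin, font_scale) for drawing a label near a box corner."""
    origin = (int(box_x), int(box_y + 0.05 * image_height))
    font_scale = 0.005 * image_width
    return origin, font_scale

# Usage (assuming `image` and `class_name` exist):
# origin, scale = label_placement(box_x, box_y, image_width, image_height)
# cv2.putText(image, class_name, origin, cv2.FONT_HERSHEY_SIMPLEX,
#             scale, (0, 0, 255))
```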
In the above code snippet, we\u2019ve scaled the <code class=\"cw rp rq rr rs b\">starting y<\/code> and <code class=\"cw rp rq rr rs b\">font size<\/code> according to the image dimensions.<\/p>\n<figure class=\"mg mh mi mj mk mf nj nk paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ml mm c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:183\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg\" alt=\"\" width=\"183\" height=\"276\"><\/figure><div class=\"nj nk sc\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:366\/format:webp\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 366w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 183px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 750w, 
https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:366\/1*VLYFDUP-j-sZB9xphmJxcA.jpeg 366w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 183px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"e81f\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Voil\u00e0! Now we\u2019ve got our desired bounding boxes around the detected objects, and we\u2019ve added labels to each of them.<\/p>\n<h1 id=\"60ce\" class=\"np nq fo be nr ns nt go nu nv nw gr nx ny nz oa ob oc od oe of og oh oi oj ok bj\" data-selectable-paragraph=\"\">Conclusion<\/h1>\n<p id=\"6537\" class=\"pw-post-body-paragraph mn mo fo be b gm ol mq mr gp om mt mu mv on mx my mz oo nb nc nd op nf ng nh fh bj\" data-selectable-paragraph=\"\">My hope is that this tutorial has provided an understanding of how we can use the OpenCV DNN module for object detection. 
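For reference, the post-processing walked through above condenses into one loop. This is a framework-free sketch: the postprocess helper name and the dict-based class lookup are mine, not from the original code, and with OpenCV you would pass output[0, 0, :, :] from net.forward() and uncomment the drawing calls.

```python
# Condensed sketch of the tutorial's post-processing: threshold on
# confidence, look up the class name, scale the normalized box to pixels.
# `postprocess` is a hypothetical helper name, not from the original post.

def postprocess(detections, image_width, image_height, class_names,
                conf_threshold=0.5):
    """Return (name, confidence, top_left, bottom_right) per kept detection."""
    results = []
    for detection in detections:  # row: [_, class_id, confidence, x1, y1, x2, y2]
        confidence = detection[2]
        if confidence > conf_threshold:
            class_id = int(detection[1])               # ids arrive as floats
            name = class_names.get(class_id, "unknown")
            pt1 = (int(detection[3] * image_width),
                   int(detection[4] * image_height))
            pt2 = (int(detection[5] * image_width),
                   int(detection[6] * image_height))
            results.append((name, confidence, pt1, pt2))
            # cv2.rectangle(image, pt1, pt2, (23, 230, 210), thickness=1)
            # cv2.putText(image, name, (pt1[0], int(pt1[1] + .05 * image_height)),
            #             cv2.FONT_HERSHEY_SIMPLEX, .005 * image_width, (0, 0, 255))
    return results
```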
And with MobileNet-SSD inference, we can adapt this pipeline to a wide range of object detection use cases and applications.<\/p>\n<p id=\"af98\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Now that we have an understanding of the output matrix, we can use the output values according to our application\u2019s needs. It gets more fun when you run a custom-trained model\u2014maybe we can see this in a future blog post!<\/p>\n<p id=\"66b9\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\"><strong class=\"be no\">Discuss this post on <\/strong><a class=\"af ni\" href=\"https:\/\/news.ycombinator.com\/item?id=18649232\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be no\">Hacker News<\/strong><\/a><strong class=\"be no\"> and <\/strong><a class=\"af ni\" href=\"https:\/\/www.reddit.com\/r\/MachinesLearn\/comments\/9qpmso\/realtime_object_detection_on_raspberry_pi_using\/\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be no\">Reddit<\/strong><\/a><strong class=\"be no\">.<\/strong><\/p>\n<p id=\"4cb2\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">Find me:<\/p>\n<p id=\"d3e3\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">GitHub: <a class=\"af ni\" href=\"https:\/\/github.com\/rdeepc\" target=\"_blank\" rel=\"noopener ugc nofollow\">https:\/\/github.com\/rdeepc<\/a><\/p>\n<p id=\"6add\" class=\"pw-post-body-paragraph mn mo fo be b gm mp mq mr gp ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh fh bj\" data-selectable-paragraph=\"\">LinkedIn: <a class=\"af ni\" href=\"https:\/\/www.linkedin.com\/in\/saumyashovanroy\/\" target=\"_blank\" rel=\"noopener ugc 
nofollow\">https:\/\/www.linkedin.com\/in\/saumyashovanroy\/<\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Running deep learning models is computationally expensive. And when it comes to image processing with computer vision, the first thing that comes to mind is high-end GPUs\u2014think the 1080ti and now the 2080ti. But it\u2019s hard to run computer vision models on edge devices like Raspberry Pi, and making a portable solution is difficult with [&hellip;]<\/p>\n","protected":false},"author":44,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[152],"class_list":["post-6559","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Real-Time Object Detection on Raspberry Pi Using OpenCV DNN - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Real-Time Object Detection on Raspberry Pi Using OpenCV DNN\" \/>\n<meta property=\"og:description\" content=\"Running deep learning models is computationally expensive. And when it comes to image processing with computer vision, the first thing that comes to mind is high-end GPUs\u2014think the 1080ti and now the 2080ti. 
But it\u2019s hard to run computer vision models on edge devices like Raspberry Pi, and making a portable solution is difficult with [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-06-29T19:23:23+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:15:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:5616\/1*LyjNlbtMaXBWGEFzh2wArQ.jpeg\" \/>\n<meta name=\"author\" content=\"Saumya Shovan Roy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Saumya Shovan Roy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Real-Time Object Detection on Raspberry Pi Using OpenCV DNN - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/","og_locale":"en_US","og_type":"article","og_title":"Real-Time Object Detection on Raspberry Pi Using OpenCV DNN","og_description":"Running deep learning models is computationally expensive. And when it comes to image processing with computer vision, the first thing that comes to mind is high-end GPUs\u2014think the 1080ti and now the 2080ti. 
But it\u2019s hard to run computer vision models on edge devices like Raspberry Pi, and making a portable solution is difficult with [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-06-29T19:23:23+00:00","article_modified_time":"2025-04-24T17:15:17+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:5616\/1*LyjNlbtMaXBWGEFzh2wArQ.jpeg","type":"","width":"","height":""}],"author":"Saumya Shovan Roy","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Saumya Shovan Roy","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/"},"author":{"name":"Saumya Shovan Roy","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/d3d1005b4f1f4bceea8f5e63f2ce8933"},"headline":"Real-Time Object Detection on Raspberry Pi Using OpenCV DNN","datePublished":"2023-06-29T19:23:23+00:00","dateModified":"2025-04-24T17:15:17+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/"},"wordCount":1092,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:5616\/1*LyjNlbtMaXBWGEFzh2wArQ.jpeg","articleSection":["Machine 
Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/","url":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/","name":"Real-Time Object Detection on Raspberry Pi Using OpenCV DNN - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:5616\/1*LyjNlbtMaXBWGEFzh2wArQ.jpeg","datePublished":"2023-06-29T19:23:23+00:00","dateModified":"2025-04-24T17:15:17+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:5616\/1*LyjNlbtMaXBWGEFzh2wArQ.jpeg","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:5616\/1*LyjNlbtMaXBWGEFzh2wArQ.jpeg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/real-time-object-detection-on-raspberry-pi-using-opencv-dnn\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Real-Time Object Detection on Raspberry Pi Using OpenCV 
DNN"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/d3d1005b4f1f4bceea8f5e63f2ce8933","name":"Saumya Shovan Roy","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/c447c437b478ee2126ae1e85cb61360b","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/1539064602993-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/1539064602993-96x96.jpg","caption":"Saumya Shovan 
Roy"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/saumyashovanroy\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/6559","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/44"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=6559"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/6559\/revisions"}],"predecessor-version":[{"id":15608,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/6559\/revisions\/15608"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=6559"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=6559"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=6559"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=6559"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}