{"id":5935,"date":"2023-06-14T08:04:53","date_gmt":"2023-06-14T16:04:53","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=5935"},"modified":"2025-04-24T17:15:27","modified_gmt":"2025-04-24T17:15:27","slug":"detecting-objects-in-videos-and-camera-feeds-using-keras-opencv-and-imageai","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/detecting-objects-in-videos-and-camera-feeds-using-keras-opencv-and-imageai\/","title":{"rendered":"Detecting objects in videos and camera feeds using Keras, OpenCV, and ImageAI"},"content":{"rendered":"\n<link rel=\"\u201ccanonical\u201d\" href=\"\u201chttps:\/\/www.comet.com\/site\/blog\/detecting-objects-in-videos-and-camera-feeds-using-keras-opencv-and-imageai\u201d\">\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<p id=\"5520\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Object detection is a branch of <a class=\"af mq\" href=\"https:\/\/heartbeat.comet.ml\/the-5-computer-vision-techniques-that-will-change-how-you-see-the-world-1ee19334354b\" target=\"_blank\" rel=\"noopener ugc nofollow\">computer vision<\/a>, in which visually observable objects that are in images of videos can be detected, localized, and recognized by computers. An image is a single frame that captures a single-static instance of a naturally occurring event . 
On the other hand, a video contains many static images displayed in rapid succession each second, producing the effect of viewing a naturally occurring event.<\/p>\n<figure class=\"mu mv mw mx my mz mr ms paragraph-image\">\n<div class=\"na nb eb nc bg nd\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg ne nf c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*-Q7qmHQV8JWKp3uSZTXfJA.jpeg\" alt=\"\" width=\"961\" height=\"541\"><\/figure>\n<\/div>\n<\/figure>\n<p id=\"a9f7\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Technically, a single static image in a video is called a <strong class=\"be ng\">video frame<\/strong>. In most videos, the number of frames in one second ranges between 20 and 32; this value is called the <strong class=\"be ng\">frames per second<\/strong> (fps).<\/p>\n<p id=\"34a2\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Accurately detecting objects in images and videos has become highly successful in the second decade of the 21st century due to the rise of machine learning and deep learning algorithms. 
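Because a video is just frames played back at a fixed rate, the frame count, frame rate, and duration are tied together by one simple relation. A minimal plain-Python sketch (the helper name is ours, purely illustrative):

```python
def video_duration_seconds(frame_count: int, fps: float) -> float:
    """Duration of a clip holding `frame_count` frames played at `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps

# A 300-frame clip played at 30 fps lasts 10 seconds.
print(video_duration_seconds(300, 30))  # 10.0
```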
<a class=\"af mq\" href=\"https:\/\/towardsdatascience.com\/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e\" target=\"_blank\" rel=\"noopener\">Specialized algorithms<\/a> have been developed that can detect, locate, and recognize objects in images and videos, some of which include <mark class=\"ve vf ao\">RCNNs, SSD, RetinaNet, <\/mark><mark class=\"ve vf ao\"><a class=\"af mq\" href=\"https:\/\/heartbeat.fritz.ai\/gentle-guide-on-how-yolo-object-localization-works-with-keras-part-2-65fe59ac12d\" target=\"_blank\" rel=\"noopener ugc nofollow\">YOLO<\/a><\/mark><mark class=\"ve vf ao\">,<\/mark><mark class=\"ve vf ao\"> <\/mark><mark class=\"ve vf ao\">and others.<\/mark><\/p>\n<p id=\"42fa\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Using these algorithms to detect and recognize objects in videos requires an understanding of applied mathematics and solid technical knowledge of the algorithms as well as thousands of lines of code. This is a highly technical and time-consuming process, and for those who desire to implement object detection can find the process very inconvenient.<\/p>\n<p id=\"d3f2\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">In this article, we we\u2019ll be using a Python library called <a class=\"af mq\" href=\"https:\/\/github.com\/OlafenwaMoses\/ImageAI\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">ImageAI<\/a> that has made it possible for anyone with basic knowledge of Python to build applications and systems that can detect objects in videos using only a few lines of programming code. 
ImageAI supports <a class=\"af mq\" href=\"https:\/\/towardsdatascience.com\/yolo-v3-object-detection-53fb7d3bfe6b\" target=\"_blank\" rel=\"noopener\">YOLOv3<\/a>, which is the object detection algorithm we\u2019ll use in this article.<\/p>\n<p id=\"2692\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">To get started, you will install a number of Python libraries and ImageAI<strong class=\"be ng\">. <\/strong>If you have any of the dependencies mentioned below already installed on your computer, you can jump straight to the installation of ImageAI<strong class=\"be ng\">. <\/strong>Also ensure the Python version you have installed on your computer is <a class=\"af mq\" href=\"https:\/\/www.python.org\/downloads\/release\/python-376\/\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be ng\">Python 3.7.6<\/strong><\/a>.<\/p>\n<p id=\"c506\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\"><code class=\"cw nh ni nj nk b\">pip install<\/code> the following dependencies:<\/p>\n<p id=\"4030\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">i. TensorFlow<\/p>\n<pre class=\"mu mv mw mx my nl nk nm nn ax no bj\"><span id=\"1f1b\" class=\"np nq fo nk b ho nr ns l ie nt\" data-selectable-paragraph=\"\"><strong class=\"nk fp\">pip3 install tensorflow==2.4.0<\/strong><\/span><\/pre>\n<p id=\"da0c\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">ii. 
Other Dependencies<\/p>\n<pre class=\"mu mv mw mx my nl nk nm nn ax no bj\"><span id=\"2efb\" class=\"np nq fo nk b ho nr ns l ie nt\" data-selectable-paragraph=\"\"><strong class=\"nk fp\">pip install keras==2.4.3 numpy==1.19.3 pillow==7.0.0 scipy==1.4.1 h5py==2.10.0 matplotlib==3.3.2 opencv-python keras-resnet==0.2.0<\/strong><\/span><\/pre>\n<p id=\"62f6\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">iii. ImageAI<\/p>\n<p id=\"973b\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\"><code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">pip install imageai --upgrade<\/strong><\/code><\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<p id=\"5eb9\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Now that we\u2019ve installed the tools we need, we\u2019ll be using a trained YOLOv3 computer vision model to perform the detection and recognition tasks. We can download the model file via this <a class=\"af mq\" href=\"https:\/\/github.com\/OlafenwaMoses\/ImageAI\/releases\/download\/1.0\/yolo.h5\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be ng\">link<\/strong><\/a>. 
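If you'd rather fetch the weights from a script than from a browser, here is a small standard-library sketch. The helper names `model_filename` and `fetch_model` are ours, not part of ImageAI:

```python
import os
import urllib.request

# Release URL of the pretrained YOLOv3 weights used in this article.
MODEL_URL = "https://github.com/OlafenwaMoses/ImageAI/releases/download/1.0/yolo.h5"

def model_filename(url: str) -> str:
    """Derive the local file name (e.g. 'yolo.h5') from the release URL."""
    return os.path.basename(url)

def fetch_model(url: str = MODEL_URL) -> str:
    """Download the weights next to the script, skipping if already present."""
    dest = model_filename(url)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return os.path.abspath(dest)
```

Calling `fetch_model()` once before running the detection scripts below leaves `yolo.h5` in the working directory, which is where those scripts expect it.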
This model is trained to detect and recognize 80 different objects, named below:<\/p>\n<pre class=\"mu mv mw mx my nl nk nm nn ax no bj\"><span id=\"2ed7\" class=\"np nq fo nk b ho nr ns l ie nt\" data-selectable-paragraph=\"\">person, bicycle, car, motorcycle, airplane,\nbus, train, truck, boat, traffic light, fire hydrant, stop sign,\nparking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra,\ngiraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard,\nsports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket,\nbottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange,\nbroccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed,\ndining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave,\noven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair dryer,\ntoothbrush.<\/span><\/pre>\n<p id=\"c108\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">For the purpose of this article, we\u2019ve provided a sample video that you can download and use to write code to detect and recognize objects in the video. Download the video via this <a class=\"af mq\" href=\"https:\/\/github.com\/OlafenwaMoses\/IntelliP\/raw\/master\/traffic-mini.mp4\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be ng\">link<\/strong><\/a>.<\/p>\n<p id=\"8115\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Next, we create a Python file and give it a name, e.g. <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">FirstVideoDetection.py<\/strong><\/code>. 
Then we copy both the downloaded video and YOLOv3 model file into the folder where our <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">FirstVideoDetection.py<\/strong><\/code> file is. Once we\u2019ve done that, we write the exact code below into our <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">FirstVideoDetection.py<\/strong><\/code> file.<\/p>\n<pre>from imageai.Detection import VideoObjectDetection\nimport os\n\nexecution_path = os.getcwd()\n\ndetector = VideoObjectDetection()\ndetector.setModelTypeAsYOLOv3()\ndetector.setModelPath(os.path.join(execution_path, \"yolo.h5\"))\ndetector.loadModel()\n\nvideo_path = detector.detectObjectsFromVideo(input_file_path=os.path.join(execution_path, \"traffic-mini.mp4\"),\n                                             output_file_path=os.path.join(execution_path, \"traffic_mini_detected_1\"),\n                                             frames_per_second=29, log_progress=True)\nprint(video_path)<\/pre>\n<p id=\"10da\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Before we run our Python code, here\u2019s an in-depth explanation of the preceding code:<\/p>\n<p id=\"2b2f\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">1) In the fourth line, we created an instance of the <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">VideoObjectDetection<\/strong><\/code> class.<\/p>\n<p id=\"5f0e\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">2) In the fifth line, we set the model type to YOLOv3, which corresponds to the YOLO model we downloaded and copied to the folder.<\/p>\n<p id=\"08c3\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">3) 
In the sixth line, we set the model path to the file path of the model file we copied into the folder.<\/p>\n<p id=\"f248\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">4) In the seventh line, we loaded the model into the instance of the <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">VideoObjectDetection<\/strong><\/code> class that we created.<\/p>\n<p id=\"32fd\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">5) In the eighth line, we called the <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">detectObjectsFromVideo<\/strong><\/code> function and passed the following values to it:<\/p>\n<p id=\"4c4b\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">i. <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">input_file_path<\/strong><\/code><strong class=\"be ng\">: <\/strong>This refers to the file path of the video we copied into the folder.<\/p>\n<p id=\"ac81\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">ii. <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">output_file_path<\/strong><\/code><strong class=\"be ng\">:<\/strong> This refers to the file path to which the detected video will be saved.<\/p>\n<p id=\"4cec\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">iii. 
<code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">frames_per_second<\/strong><\/code><strong class=\"be ng\">:<\/strong> This refers to the number of image frames that we want the detected video to have within a second.<\/p>\n<p id=\"39c0\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">iv. <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">log_progress<\/strong><\/code><strong class=\"be ng\">:<\/strong> This is used to state that the detection instance should report the progress of the detection in the command line interface.<\/p>\n<p id=\"405d\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">6) The <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">detectObjectsFromVideo<\/strong><\/code> function will return the file path of the detected video. This file path will be printed in the ninth line of code once the detection task is done.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fh fi fj fk fl\">\n<div class=\"ab ca\">\n<div class=\"ch bg et eu ev ew\">\n<p id=\"bcd6\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Now that we understand the content of our code, we can now run it and watch the progress in the command line interface until it\u2019s done.<\/p>\n<p id=\"a2a7\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Note: If you have a computer system with an NVIDIA GPU and you installed the GPU version of TensorFlow, this detection process should be done in less than a minute. Otherwise, it may take a few minutes. 
The detection runs frame by frame, and the saved output video is updated automatically as each frame is processed.<\/p>\n<p id=\"517b\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Once the detection is done, we\u2019ll find the detected video in the folder that contains our Python file. When we open and play the video, we\u2019ll see the original footage overlaid with boxes that locate each object, the names of those objects, and the recognition probability as a percentage. See the results of this example in the YouTube video below:<\/p>\n<figure class=\"mu mv mw mx my mz\">\n<div class=\"oc ig l eb\">\n<div class=\"op oe l\"><iframe loading=\"lazy\" class=\"ek n fc dx bg\" title=\"Detecting Objects in Video with YOLOv3 Using ImageAI\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fh2sOwSo8UcI%3Ffeature%3Doembed&amp;url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dh2sOwSo8UcI&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2Fh2sOwSo8UcI%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube\" width=\"854\" height=\"480\" frameborder=\"0\" scrolling=\"no\" allowfullscreen=\"allowfullscreen\" data-mce-fragment=\"1\"><\/iframe><\/div>\n<\/div>\n<\/figure>\n<p id=\"1b35\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">So far, we\u2019ve applied the trained YOLOv3 model to detect objects in a video file. 
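ImageAI can also report what it finds on each frame through the <code>per_frame_function</code> parameter of <code>detectObjectsFromVideo</code> (see the ImageAI documentation). Below is a sketch of such a callback; the formatting helper is our own, and per our reading of the docs the callback receives the frame number, a list of per-object detections, and a dict of counts per object name:

```python
def summarize_frame(frame_number, output_array, output_count):
    """Per-frame callback: turn ImageAI's per-frame counts into one log line."""
    counts = ", ".join(f"{name}: {n}" for name, n in sorted(output_count.items()))
    line = f"frame {frame_number}: {counts}"
    print(line)
    return line

# Hypothetical wiring, assuming `detector` is configured as in the script above:
# detector.detectObjectsFromVideo(..., per_frame_function=summarize_frame)
```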
We can use this code to detect and recognize objects in any other video file, apart from the one provided in this article.<\/p>\n<p id=\"d9ff\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Next, we\u2019ll look at how to detect and recognize objects in camera feeds. The camera can be the default one pre-installed on our computer system, a camera connected by cable, or an IP camera. To do this, we create another Python file in the folder where the YOLOv3 model is and give it a name, e.g. <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">FirstCameraDetection.py<\/strong><\/code>.<\/p>\n<p id=\"7d09\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">Once we\u2019ve created the Python file, we copy the code below into it.<\/p>\n<pre>from imageai.Detection import VideoObjectDetection\nimport os\nimport cv2\n\nexecution_path = os.getcwd()\n\ncamera = cv2.VideoCapture(0)\n\ndetector = VideoObjectDetection()\ndetector.setModelTypeAsYOLOv3()\ndetector.setModelPath(os.path.join(execution_path, \"yolo.h5\"))\ndetector.loadModel()\n\nvideo_path = detector.detectObjectsFromVideo(camera_input=camera,\n                                             output_file_path=os.path.join(execution_path, \"camera_detected_1\"),\n                                             frames_per_second=29, log_progress=True)\nprint(video_path)<\/pre>\n<p id=\"2c73\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">The code above is very similar to our previous code. 
The differences are as stated below:<\/p>\n<p id=\"d5ca\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">1) We created an instance of OpenCV\u2019s <code class=\"cw nh ni nj nk b\">VideoCapture<\/code> class and loaded the computer\u2019s camera into it.<\/p>\n<p id=\"c5ad\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">2) Then in the <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">detectObjectsFromVideo<\/strong><\/code> function, we passed the camera instance we created to the <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">camera_input<\/strong><\/code> parameter, instead of the <code class=\"cw nh ni nj nk b\"><strong class=\"be ng\">input_file_path<\/strong><\/code> we used in the previous detection code.<\/p>\n<p id=\"2815\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">These are the differences in detecting and recognizing objects in video files versus camera feeds. Now, we can run the code and watch the progress in the command line interface.<\/p>\n<p id=\"002a\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\"><strong class=\"be ng\">Please note <\/strong>that, in this case, we have to stop the detection manually by halting the Python process, because our system camera will keep recording as long as the code is running. Once the progress in the command line interface reaches <strong class=\"be ng\">150 frames<\/strong>, I suggest that we halt the Python code. 
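If halting by hand is inconvenient, one workaround is a callback object that raises once a frame budget is reached. This is our own sketch built on the <code>per_frame_function</code> parameter, not an official ImageAI stop mechanism, and the saved output may be incomplete when a run is interrupted this way:

```python
class StopAfter:
    """Per-frame callback that ends a camera-feed run after `limit` frames
    by raising KeyboardInterrupt (the same effect as halting by hand)."""

    def __init__(self, limit: int = 150):
        self.limit = limit
        self.seen = 0

    def __call__(self, frame_number, output_array, output_count):
        self.seen += 1
        if self.seen >= self.limit:
            raise KeyboardInterrupt(f"stopping after {self.seen} frames")

# Hypothetical wiring into the camera script above:
# detector.detectObjectsFromVideo(camera_input=camera, ...,
#                                 per_frame_function=StopAfter(150))
```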
Once we do this, we go to the folder of our Python file and find the detected video feed from our system\u2019s camera.<\/p>\n<p id=\"6852\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">If we\u2019re planning to detect and recognize objects from the feeds of an <strong class=\"be ng\">IP camera<\/strong>, all we need to do is obtain the address of the <strong class=\"be ng\">IP camera <\/strong>and load it with OpenCV, as seen in the example below:<\/p>\n<pre>camera = cv2.VideoCapture(\"http:\/\/192.168.43.1:8080\/video\")<\/pre>\n<h2 id=\"79d0\" class=\"np nq fo be oq or os ot ou ov ow ox oy md oz pa pb mh pc pd pe ml pf pg ph pi bj\" data-selectable-paragraph=\"\">Conclusion<\/h2>\n<p id=\"7173\" class=\"pw-post-body-paragraph lt lu fo be b lv pj lx ly lz pk mb mc md pl mf mg mh pm mj mk ml pn mn mo mp fh bj\" data-selectable-paragraph=\"\">In this post, we\u2019ve learned how to detect objects in video files and camera feeds with a few lines of code using ImageAI. Beyond image recognition and object detection in images and videos, ImageAI supports advanced video analysis with interval callbacks and functions to train image recognition models on custom datasets. Learn more by visiting the <a href=\"https:\/\/github.com\/OlafenwaMoses\/ImageAI\">link<\/a> to the ImageAI repository.<\/p>\n<p id=\"ff81\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\">If you enjoyed and gained from reading this article, give it a clap. Also feel free to share it with friends and colleagues. 
You can reach me if you have any questions or suggestions via my contacts below.<\/p>\n<p id=\"29bd\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\"><strong class=\"be ng\">Twitter: <\/strong><a class=\"af mq\" href=\"https:\/\/twitter.com\/OlafenwMoses\" target=\"_blank\" rel=\"noopener ugc nofollow\">https:\/\/twitter.com\/OlafenwaMoses<\/a><\/p>\n<p id=\"4d8d\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\"><strong class=\"be ng\">Facebook: <\/strong><a class=\"af mq\" href=\"https:\/\/www.facebook.com\/moses.olafenwa\" target=\"_blank\" rel=\"noopener ugc nofollow\">https:\/\/www.facebook.com\/moses.olafenwa<\/a><\/p>\n<p id=\"236a\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\"><strong class=\"be ng\">Email: <\/strong><a class=\"af mq\" href=\"mailto:guymodscientist@gmail.com\" target=\"_blank\" rel=\"noopener ugc nofollow\">guymodscientist@gmail.com<\/a><\/p>\n<p id=\"a663\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\"><strong class=\"be ng\">Website: <\/strong><a class=\"af mq\" href=\"https:\/\/moses.specpal.science\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">https:\/\/moses.aicommons.science<\/a><\/p>\n<p id=\"f309\" class=\"pw-post-body-paragraph lt lu fo be b lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp fh bj\" data-selectable-paragraph=\"\"><strong class=\"be ng\">Discuss this post on <\/strong><a class=\"af mq\" href=\"https:\/\/news.ycombinator.com\/item?id=17750405\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be ng\">Hacker News<\/strong><\/a><strong class=\"be ng\"> and <\/strong><a class=\"af mq\" 
href=\"https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/96ytq0\/a_simple_object_detection_tutorial_using_keras\/\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"be ng\">Reddit<\/strong><\/a><strong class=\"be ng\">.<\/strong><\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Object detection is a branch of computer vision, in which visually observable objects that are in images of videos can be detected, localized, and recognized by computers. An image is a single frame that captures a single-static instance of a naturally occurring event . On the other hand, a video contains many instances of static [&hellip;]<\/p>\n","protected":false},"author":34,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[147],"class_list":["post-5935","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Detecting objects in videos and camera feeds using Keras, OpenCV, and ImageAI - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/detecting-objects-in-videos-and-camera-feeds-using-keras-opencv-and-imageai\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Detecting objects in videos and camera feeds using Keras, OpenCV, and ImageAI\" \/>\n<meta property=\"og:description\" content=\"Object detection is a branch of computer vision, in which visually 
observable objects that are in images of videos can be detected, localized, and recognized by computers. An image is a single frame that captures a single-static instance of a naturally occurring event . On the other hand, a video contains many instances of static [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/detecting-objects-in-videos-and-camera-feeds-using-keras-opencv-and-imageai\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-06-14T16:04:53+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:15:27+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*-Q7qmHQV8JWKp3uSZTXfJA.jpeg\" \/>\n<meta name=\"author\" content=\"Moses Olafenwa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Moses Olafenwa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
Detecting objects in videos and camera feeds using Keras, OpenCV, and ImageAI
By Moses Olafenwa · Comet blog · Published June 14, 2023 · Updated April 24, 2025 · Est. reading time: 7 minutes