A deep learning approach to building an intelligent video surveillance system

Jie Xu¹

Received: 15 May 2020 / Revised: 15 September 2020 / Accepted: 22 September 2020 / © The Author(s) 2020

Abstract

Recent advances in object detection and face recognition have made it possible to develop practical video surveillance systems with embedded object detection and face recognition functionality that is accurate and fast enough for commercial use. In this paper, we compare some of the latest approaches to object detection and face recognition and explain why they may or may not be among the best choices for video surveillance applications, in terms of both accuracy and speed. We find that Faster R-CNN with Inception ResNet V2 achieves some of the best accuracies while maintaining real-time rates. Single Shot Detector (SSD) with MobileNet, on the other hand, is extremely fast and still accurate enough for most applications. For face recognition, FaceNet with Multi-task Cascaded Convolutional Networks (MTCNN) achieves higher accuracy than earlier approaches such as DeepFace and DeepID2+ while also being faster. We also propose an end-to-end video surveillance system that could serve as a starting point for more complex systems. Various experiments are conducted on the trained models, and the observations are explained in detail. We finish by discussing video object detection and video salient object detection approaches that could be used to improve the proposed system in future work.

Keywords Deep learning · Video surveillance · Machine learning · Object detection · Face recognition
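FaceNet, named above, maps each face crop (located beforehand by a detector such as MTCNN) to an embedding in which Euclidean distance reflects facial similarity, so recognition reduces to a nearest-neighbour search with a distance cut-off. The sketch below illustrates only that matching step and is not the author's implementation: the helper names `l2_normalize`, `squared_l2` and `identify`, the threshold of 1.1 on squared L2 distance, and the 3-D toy vectors (standing in for FaceNet's 128-D embeddings) are all hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (FaceNet embeddings lie on the unit hypersphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify(query: Sequence[float],
             gallery: Dict[str, Sequence[float]],
             threshold: float = 1.1) -> Optional[str]:
    """Return the closest enrolled identity, or None if no match is close enough.

    `threshold` is an assumed cut-off on the squared L2 distance between
    unit-length embeddings (which ranges from 0 to 4); in practice it would
    be tuned on a validation set.
    """
    q = l2_normalize(query)
    best_name, best_dist = None, float("inf")
    for name, emb in gallery.items():
        d = squared_l2(q, l2_normalize(emb))
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < threshold else None


# Hypothetical 3-D toy embeddings standing in for real 128-D FaceNet outputs.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], gallery))  # alice
print(identify([0.0, 0.0, 1.0], gallery))  # None (distance 2 to both entries)
```

Thresholding the distance, rather than always returning the nearest entry, lets a surveillance system report unknown faces instead of forcing every detection onto an enrolled identity.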

Jie Xu
¹ Department of Computer Science, The University of Manchester, Manchester, UK

Multimedia Tools and Applications

1 Introduction

In the past few decades, the number of surveillance cameras, also known as closed-circuit television (CCTV), has grown rapidly around the world. Take Great Britain as an example: in England and Wales, the number of surveillance cameras rose from 100 in 1990 to about 4.2 million in 2007, which means there were over 7 surveillance cameras for every 100 people [8]. With so many cameras deployed, monitoring the output of every one of them by humans would require enormous effort. Therefore, this paper presents a deep learning approach to automatically processing images captured by surveillance cameras, focusing on automated object detection and face recognition. The aim is to explore a feasible way of integrating both object detection and face recognition into commercial video surveillance systems by evaluating and experimenting with state-of-the-art algorithms. We have chosen deep-learning methods for both the object detection and face recognition tasks for a number of reasons. First, deep-learning methods are easier to deploy and scale better than conventional machine-learning methods, thanks to their ability to process data in its raw form [21]. W