First-person activity recognition from micro-action representations using convolutional neural networks and object flow histograms
Panagiotis Giannakeris1 · Panagiotis C. Petrantonakis1 · Konstantinos Avgerinakis1 · Stefanos Vrochidis1 · Ioannis Kompatsiaris1

Received: 20 March 2019 / Revised: 16 August 2020 / Accepted: 16 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract

A novel first-person human activity recognition framework is proposed in this work. Our methodology is inspired by the central role that moving objects play in egocentric activity videos. Using a deep convolutional neural network, we detect objects and build discriminative object flow histograms that represent fine-grained micro-actions over short temporal windows. Our framework is based on the assumption that large-scale activities are synthesized from fine-grained micro-actions. We gather all the micro-actions and cluster them with a Gaussian Mixture Model to build a micro-action vocabulary that is later used in a Fisher encoding scheme. Results show that our method reaches a 60% recognition rate on the benchmark ADL dataset. The capabilities of the proposed framework are further demonstrated through a thorough evaluation over a wide range of hyper-parameters and a comparison with other state-of-the-art works.

Keywords Activity recognition · Object detection · Egocentric vision · Ambient assisted living
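As a concrete illustration of the vocabulary-building and encoding stage outlined above, the following sketch fits a diagonal-covariance Gaussian Mixture Model on pooled micro-action descriptors and encodes a video's descriptors as a Fisher vector. The function names, descriptor shapes, and the use of NumPy and scikit-learn are illustrative assumptions and do not reproduce the exact implementation described in this paper.

# Minimal sketch of GMM vocabulary building and Fisher encoding.
# Assumption: each row of `descriptors` is one object-flow histogram
# (micro-action descriptor) of dimension D.
import numpy as np
from sklearn.mixture import GaussianMixture

def build_vocabulary(descriptors, n_components=64, seed=0):
    # Fit a diagonal-covariance GMM on the pooled micro-action descriptors.
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    gmm.fit(descriptors)  # descriptors: (N, D)
    return gmm

def fisher_vector(descriptors, gmm):
    # Encode one video's descriptors as gradients of the GMM log-likelihood
    # with respect to the component means and variances.
    X = np.atleast_2d(descriptors)                 # (N, D)
    N, _ = X.shape
    gamma = gmm.predict_proba(X)                   # (N, K) soft assignments
    w, mu, sigma = gmm.weights_, gmm.means_, np.sqrt(gmm.covariances_)

    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]   # (N, K, D)
    d_mu = np.einsum("nk,nkd->kd", gamma, diff) / (N * np.sqrt(w)[:, None])
    d_sigma = np.einsum("nk,nkd->kd", gamma, diff ** 2 - 1.0) / (N * np.sqrt(2.0 * w)[:, None])

    fv = np.concatenate([d_mu.ravel(), d_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))         # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)       # L2 normalization

In a typical pipeline of this kind, the resulting per-video Fisher vectors would then be fed to a standard classifier for activity recognition.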
Panagiotis Giannakeris
[email protected]

Panagiotis C. Petrantonakis
[email protected]

Konstantinos Avgerinakis
[email protected]

Stefanos Vrochidis
[email protected]

Ioannis Kompatsiaris
[email protected]

1 ITI-CERTH, Thermi, Greece
1 Introduction

The continuous rise of the video format as a medium for communication has brought a digital video revolution to the modern connected world. It is safe to say that it has now surpassed the popularity of image and text formats, judging by the countless online multimedia platforms that support it and the number of video clips that fill web pages daily. The use cases are endless: from do-it-yourself tutorials to marketing and live event broadcasting, popular public video repositories amass enormous amounts of video content. It is not only the attractive combination of auditory and visual content that makes the medium popular, but also the technology of modern wearables, which allows seemingly every person to carry a tiny video camera at all times, and the convenient ways in which videos end up posted online for immediate consumption on social media. In most of the videos uploaded online, humans are the center of attention and the thematic content in one way or another revolves around the activities they perform. Multimedia processing and computer vision researchers have shown great interest in exploiting these huge databases. The proposed solutions can address the needs of several real-life applications, such as video surveillance and security applications.