Deep learning based multimodal complex human activity recognition using wearable devices



Ling Chen 1 & Xiaoze Liu 1 & Liangying Peng 1 & Menghan Wu 1

Accepted: 5 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Wearable device based human activity recognition, an important field of ubiquitous and mobile computing, is drawing increasing attention. Compared with simple human activity (SHA) recognition, complex human activity (CHA) recognition faces more challenges, e.g., various modalities of input and long sequential information. In this paper, we propose a deep learning model named DEBONAIR (Deep lEarning Based multimodal cOmplex humaN Activity Recognition) to address these problems. DEBONAIR is an end-to-end model that extracts features systematically. We design specific sub-network architectures for different sensor data and merge the outputs of all sub-networks to extract fusion features. Then, an LSTM network is utilized to learn the sequential information of CHAs. We evaluate the model on two multimodal CHA datasets. The experimental results show that DEBONAIR performs significantly better than the state-of-the-art CHA recognition models.

Keywords Complex human activity recognition · Multimodality · Deep learning
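The data flow summarized in the abstract (one sub-network per sensor modality, fusion of the sub-network outputs, then an LSTM over the time steps) can be illustrated with a minimal pure-Python sketch. It is not the DEBONAIR implementation: the modality names, layer sizes, the single dense layer standing in for each sub-network, concatenation as the fusion step, and the random untrained weights are all assumptions made for illustration only.

```python
import math
import random


def dense(x, w, b):
    """Affine layer: w holds one weight row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init(rng, n_out, n_in):
    """Small random weights and zero biases (untrained, illustration only)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


class Branch:
    """Hypothetical stand-in for one modality-specific sub-network."""

    def __init__(self, rng, n_in, n_out):
        self.w, self.b = init(rng, n_out, n_in)

    def __call__(self, x):
        return [max(0.0, a) for a in dense(x, self.w, self.b)]  # ReLU


class LSTMCell:
    """Standard LSTM cell with input, forget, output, and candidate gates."""

    def __init__(self, rng, n_in, n_hidden):
        self.n_hidden = n_hidden
        # One (weights, biases) pair per gate, applied to [x; h].
        self.gates = {g: init(rng, n_hidden, n_in + n_hidden) for g in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in dense(z, *self.gates["i"])]
        f = [sigmoid(a) for a in dense(z, *self.gates["f"])]
        o = [sigmoid(a) for a in dense(z, *self.gates["o"])]
        g = [math.tanh(a) for a in dense(z, *self.gates["g"])]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c


class MultimodalSketch:
    """Per-modality branches -> concatenation (fusion) -> LSTM -> class scores."""

    def __init__(self, modality_dims, branch_dim=4, hidden=8, n_classes=3, seed=0):
        rng = random.Random(seed)
        self.branches = {m: Branch(rng, d, branch_dim) for m, d in modality_dims.items()}
        self.lstm = LSTMCell(rng, branch_dim * len(modality_dims), hidden)
        self.out_w, self.out_b = init(rng, n_classes, hidden)

    def forward(self, sequence):
        h = [0.0] * self.lstm.n_hidden
        c = [0.0] * self.lstm.n_hidden
        for step in sequence:
            fused = []
            for name, branch in self.branches.items():
                fused += branch(step[name])  # fuse by concatenating branch outputs
            h, c = self.lstm.step(fused, h, c)
        # Classify from the final hidden state.
        return softmax(dense(h, self.out_w, self.out_b))


# Hypothetical modalities: 3-axis accelerometer, 3-axis gyroscope, heart rate.
model = MultimodalSketch({"accelerometer": 3, "gyroscope": 3, "heart_rate": 1})
sequence = [
    {"accelerometer": [0.1, 0.0, 9.8], "gyroscope": [0.0, 0.1, 0.0], "heart_rate": [72.0]}
    for _ in range(5)
]
probs = model.forward(sequence)  # one probability per (hypothetical) CHA class
```

Keeping a separate branch per modality lets each sensor stream have its own input size and feature extractor before fusion, which is the property the abstract emphasizes. A real model would be trained end to end in a deep learning framework rather than built from hand-written layers.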

* Ling Chen [email protected]
Xiaoze Liu [email protected]
Liangying Peng [email protected]
Menghan Wu [email protected]

1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

1 Introduction

Wearable device based human activity recognition is one of the core research problems of ubiquitous and mobile computing. Human activities can be divided into simple human activities (SHAs) and complex human activities (CHAs). A SHA can be represented as a single repeated action and can be easily recognized by using a single accelerometer. Typical SHAs include “walking”, “sitting”, and “standing”. CHAs are not as repetitive as SHAs and usually involve multiple simultaneous or overlapping actions, which can only be well recognized with multimodal sensor data. Common CHAs include “commuting”, “eating”, and “house cleaning”. Related research mainly focuses on SHAs, which usually describe users’ body actions or postures and can be recognized with high accuracy [1–4]. With the growing requirements of many applications (e.g., healthcare systems [5] and smart homes [6]), CHA recognition has begun to attract research attention.

Existing research on CHA recognition can be divided into three categories. The first ignores the differences between CHAs and SHAs, and uses SHA recognition methods to recognize CHAs [7, 8]. The second represents each CHA by a combination of SHAs, where the SHAs are predefined and labeled manually [9–14]. The third represents CHAs by latent semantics implied in sensor data, where the latent semantics are discovered by topic models [15–18]. However, these approaches have the following limitations. For the first category, since CHAs are far more complicated than SHAs, the features extracted for SHAs are not representat