A deep multimodal generative and fusion framework for class-imbalanced multimodal data

Qing Li · Guanyuan Yu · Jun Wang · Yuehao Liu
Fintech Innovation Center and School of Economic Information Engineering, Southwestern University of Finance and Economics, Chengdu, China
Received: 27 May 2019 / Revised: 12 June 2020 / Accepted: 15 June 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
The purpose of multimodal classification is to integrate features from diverse information sources to make decisions. The interactions between different modalities are crucial to this task. However, common strategies in previous studies have been either to concatenate features from various sources into a single compound vector or to feed them separately into several different classifiers that are then assembled into a single robust classifier to generate the final prediction. Both approaches weaken or even ignore the interactions among different feature modalities. In addition, multimodal classification becomes troublesome when the data are class-imbalanced. In this study, we propose a deep multimodal generative and fusion framework for multimodal classification with class-imbalanced data. The framework consists of two modules: a deep multimodal generative adversarial network (DMGAN) and a deep multimodal hybrid fusion network (DMHFN). The DMGAN handles the class imbalance problem, while the DMHFN identifies fine-grained interactions and integrates different information sources for multimodal classification. Experiments on a faculty homepage dataset show the superiority of our framework compared to several state-of-the-art methods.

Keywords Multimodal classification · Class-imbalanced data · Deep multimodal generative adversarial network · Deep multimodal hybrid fusion network
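To make the contrast drawn in the abstract concrete, the following is a minimal PyTorch sketch of the two baseline fusion strategies it criticizes: early fusion by concatenating features into a single compound vector, and late fusion by ensembling per-modality classifiers. The module names, layer sizes, and averaging scheme are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn as nn


class EarlyFusion(nn.Module):
    """Baseline 1: concatenate all modality features into one compound vector."""

    def __init__(self, text_dim: int, image_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # Cross-modal interactions are only modelled implicitly by the shared layers.
        return self.classifier(torch.cat([text_feat, image_feat], dim=-1))


class LateFusion(nn.Module):
    """Baseline 2: one classifier per modality, with predictions averaged at the end."""

    def __init__(self, text_dim: int, image_dim: int, num_classes: int):
        super().__init__()
        self.text_clf = nn.Linear(text_dim, num_classes)
        self.image_clf = nn.Linear(image_dim, num_classes)

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # Each modality is scored in isolation, so cross-modal interactions are never modelled.
        return 0.5 * (self.text_clf(text_feat) + self.image_clf(image_feat))
```

Neither baseline explicitly captures fine-grained cross-modal interactions, and neither addresses class imbalance; these are the two gaps the DMGAN and DMHFN modules are intended to fill.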

1 Introduction

Multimodal data consist of several feature modalities, where each modality is represented by a group of similar data sharing the same attributes. The aim of multimodal classification is to process and integrate information from multiple modalities to make decisions. In the era of big data, many applications of interest involve multimodal classification problems, including audio-visual speech recognition (AVSR) [40], affective computing [39], human emotion recognition [32], medical image analysis [22], user profiling [13], and stock
movement prediction [29]. However, two challenging problems usually arise when fusing information from multiple interactive modalities for multimodal classification. The first major challenge is multimodal representation. The heterogeneity in the statistical properties of multimodal data makes it more difficult to learn a joint representation using information from multiple sources [3, 17, 24]. A good example is the joint processing of images (which are real-valued and dense) and texts (which are discrete and sparse), which typically have different dimensions and structures [52]. In