Scandent Tree: A Random Forest Learning Method for Incomplete Multimodal Datasets


1 University of British Columbia, Vancouver, British Columbia, Canada
2 IBM Almaden Research Center, San Jose, CA, USA

Abstract. We propose a solution for training random forests on incomplete multimodal datasets where many of the samples are non-randomly missing a large portion of the most discriminative features. For this goal, we present the novel concept of scandent trees. These are trees trained on the features common to all samples that mimic the feature space division structure of a support decision tree trained on all features. We use the forest resulting from ensembling these trees as a classification model. We evaluate the performance of our method for different multimodal sample sizes and single-modal feature set sizes using a publicly available clinical dataset of heart disease patients and a prostate cancer dataset with MRI and gene expression modalities. The results show that the area under the ROC curve of the proposed method is less sensitive to the multimodal dataset sample size, and that it outperforms imputation methods, especially when the ratio of multimodal data to all available data is small.

1 Introduction

In recent years there has been interest in multimodality data analysis for disease detection. Ideally, multimodality methods should leverage the strengths of each modality and compensate for its weaknesses. Another advantage of multimodality data analysis is the discovery of novel relations between different modalities. One example is finding the connection between genes related to Alzheimer's disease and related areas in functional MRI [1].

Acquiring multimodal data is, in general, more costly and time consuming than acquiring a single modality. As a result, multimodal datasets usually have valuable features but small sample sizes. This makes it difficult to build classifiers with large training data for highly multimodal protocols. Multimodal data are also often high dimensional, which poses difficulties in feature selection and classifier building. Ensemble classifiers such as random forests provide a solution for the large feature space in small datasets using feature bagging.

To tackle the issue of incomplete datasets, a variety of data imputation techniques exist. Some of these are non-parametric methods such as hot deck imputation, KNN imputation, or mean substitution. These methods ignore the possible correlations in the data and could add bias. Model-based methods, on the other hand, assume a certain structure to the missing samples, such as missing completely at random (MCAR) or missing not at random (MNAR). Examples of these methods include multiple imputation [2], maximum likelihood, stochastic regression [3], expectation maximization [3] and Bayesian methods [4]. While these methods could result in reduced bias, the assumption of a specific pattern in the miss

© Springer International Publishing Switzerland 2015. N. Navab et al. (Eds.): MICCAI 2015, Part I, LNCS 9349, pp. 694–701, 2015. DOI: 10.1007/978-3-319-24553-9_85
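The remark that simple substitution methods ignore correlations and can add bias can be illustrated with a small sketch. The example below is not taken from the paper: the toy data, the missingness rule, and the helper names `mean_substitute` and `pearson` are all assumptions chosen for illustration. It applies mean substitution to a feature that is missing non-randomly (only for samples with large values of a correlated feature) and compares the mean and correlation before and after imputation.

```python
from statistics import mean


def mean_substitute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: feature b is perfectly correlated with a (b = 2a).
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b_true = [2.0 * x for x in a]

# Non-random missingness (an assumption for this sketch):
# b is unobserved whenever a is larger than 3.
b_observed = [y if x <= 3.0 else None for x, y in zip(a, b_true)]
b_imputed = mean_substitute(b_observed)

true_mean = mean(b_true)              # mean of the complete feature
imputed_mean = mean(b_imputed)        # mean after mean substitution
true_corr = pearson(a, b_true)        # correlation in the complete data
imputed_corr = pearson(a, b_imputed)  # correlation after imputation

print(f"mean of b: true={true_mean:.2f}, imputed={imputed_mean:.2f}")
print(f"corr(a, b): true={true_corr:.2f}, imputed={imputed_corr:.2f}")
```

In this toy setting the imputed mean of `b` drops from 7.0 to 4.0 and the correlation with `a` falls from 1.0 to roughly 0.34, because every missing value is replaced by the same constant regardless of `a`. Real datasets will behave less starkly, but the direction of the effect is the concern raised above.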
