Improving deep learning performance with missing values via deletion and compensation


IWINAC 2015

Adrián Sánchez-Morales¹ · José-Luis Sancho-Gómez¹ · Juan-Antonio Martínez-García¹ · Aníbal R. Figueiras-Vidal²
Received: 27 March 2018 / Accepted: 8 January 2019 / © Springer-Verlag London Ltd., part of Springer Nature 2019

Abstract
Missing values are one of the most common difficulties in real-world datasets. Many machine-learning techniques have been proposed in the literature to address this problem. In this work, the strong representation capability of stacked denoising auto-encoders is used to obtain a new method for imputing missing values based on two ideas: deletion and compensation. This method improves imputation performance by artificially deleting values in the input features and using them as targets during training. However, although this deletion of values proves very effective, it may cause an imbalance between the distributions of the training and test sets. To solve this issue, a compensation mechanism is proposed, based on a slight modification of the error function to be optimized. Experiments over several datasets show that deletion and compensation yield improvements not only in imputation but also in classification when compared with other classical techniques.

Keywords Missing values · Imputation · Classification · Deep learning
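The deletion-and-compensation idea sketched in the abstract can be illustrated with a toy example: among the observed entries, some values are artificially deleted at the input and used as reconstruction targets, and the error function up-weights those entries to compensate for the induced train/test imbalance. The following is a minimal NumPy sketch, assuming a single linear auto-encoder layer and an arbitrary compensation weight `alpha`; the actual architecture and weighting scheme in the paper differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of 5 correlated features.
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

observed = rng.random(X.shape) > 0.1              # genuinely missing entries are False
deleted = observed & (rng.random(X.shape) < 0.2)  # artificial deletion among observed values
X_in = np.where(observed & ~deleted, X, 0.0)      # deleted/missing entries zeroed at the input

# Compensation: up-weight the artificially deleted entries in the
# reconstruction loss (alpha is an assumed value, not taken from the paper).
alpha = 3.0
weights = np.where(deleted, alpha, observed.astype(float))

# Minimal linear auto-encoder trained by gradient descent on the weighted MSE.
d, h = X.shape[1], 3
W1 = 0.1 * rng.normal(size=(d, h))
W2 = 0.1 * rng.normal(size=(h, d))

def weighted_loss():
    X_hat = (X_in @ W1) @ W2
    return float(np.sum(weights * (X_hat - X) ** 2) / X.shape[0])

loss_init = weighted_loss()
lr = 0.01
for _ in range(1000):
    H = X_in @ W1
    G = weights * (H @ W2 - X) / X.shape[0]  # gradient of the weighted MSE w.r.t. X_hat
    W1 -= lr * (X_in.T @ (G @ W2.T))
    W2 -= lr * (H.T @ G)
loss_final = weighted_loss()

# Imputation: keep observed values, fill the rest with the reconstruction.
X_imputed = np.where(observed, X, (X_in @ W1) @ W2)
```

Because the deleted entries keep their true values as targets, the network receives direct supervision on exactly the kind of reconstruction it must perform at imputation time, while the weighting counteracts the fact that artificially deleted values are over-represented during training.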

1 Introduction

In recent years, data processing has become an intensively exploited field. Great efforts are being made to develop mechanisms for extracting useful information from data, a task that is becoming harder due to the amount of information produced daily. Throughout the literature,

& Adrián Sánchez-Morales
[email protected]
José-Luis Sancho-Gómez
[email protected]
Juan-Antonio Martínez-García
[email protected]
Aníbal R. Figueiras-Vidal
[email protected]

1

Departamento de Tecnologías de la Información y las Comunicaciones, Universidad Politécnica de Cartagena, Plaza del Hospital, 1. Edificio Cuartel de Antigones, 30202 Cartagena, Murcia, Spain

2

Departamento de Teoría de la Señal y Comunicaciones, Universidad Carlos III de Madrid, Avda. Universidad, 30, 28911 Leganés, Madrid, Spain

researchers have shown the wide range of drawbacks that can appear when handling real datasets. One of the most studied is the presence of missing values, which arise in almost every real-world application [1–3]. The effect of incomplete datasets on classification performance depends on how the missing values are handled. A first approach is to train classifiers only with complete samples; in this case, useful information is discarded and the classification of new incomplete instances is not possible. There are also embedded procedures that can deal directly with unknown input values without imputation, such as decision trees [4] and fuzzy neural networks [5, 6]. However, th