Multilabel graph-based classification for missing labels


Yasunobu Sumikawa¹ · Tatsurou Miyazaki²
Received: 5 March 2019 / Revised: 17 August 2020 / Accepted: 23 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Assigning several labels to digital data is becoming easier because Internet users can do it collaboratively. However, the process remains challenging, especially when several labels are assigned to each datum, because some suitable labels may be missed. Missing labels lead to inaccurate classification. In this study, we propose a novel graph-based multi-label classifier that stably achieves high accuracy even when labels are missing from the training data. The core of our algorithm smooths the label values of each training datum using its top-k most similar data: their label values are propagated and averaged to generate values for the missing labels. In experimental evaluations, we used multi-labeled document and image datasets and measured micro-averaged F-scores for eight classifiers. Even as we incrementally removed correct labels from the two datasets, the proposed algorithm tended to maintain its F-scores, whereas those of the other classifiers decreased. In addition, we evaluated the algorithm on Wikipedia, a real dataset that includes missing labels, to determine how well it predicts the correct labels and how useful its predictions are as initial decisions for manual annotation. We confirmed that LPAC is useful not only for automatic annotation but also for facilitating decision making in initial manual category assignment.

Keywords Multi-label classification · Label propagation · Digital document classification · Digital image classification
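The smoothing idea and the evaluation metric mentioned in the abstract can be illustrated with a small sketch. This is not the authors' LPAC implementation: the use of cosine similarity, a single averaging round, the inclusion of each datum's own labels in the average, and the function names `cosine`, `smooth_labels`, and `micro_f1` are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def smooth_labels(features, labels, k=1):
    """Average each item's label vector with those of its top-k neighbours.

    Assumption: similarity is cosine similarity and one averaging round
    is performed; the paper's actual procedure may differ.
    """
    n = len(features)
    smoothed = []
    for i in range(n):
        # Rank all other items by similarity to item i, most similar first.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: cosine(features[i], features[j]), reverse=True)
        # Average the item's own labels with its k nearest neighbours' labels.
        group = [labels[i]] + [labels[j] for j in others[:k]]
        smoothed.append([sum(col) / len(group) for col in zip(*group)])
    return smoothed


def micro_f1(true, pred):
    """Micro-averaged F-score: pool TP, FP, FN over all items and labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(true, pred):
        for t, p in zip(t_row, p_row):
            tp += 1 if t and p else 0
            fp += 1 if p and not t else 0
            fn += 1 if t and not p else 0
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): item 1 resembles item 0 but lacks the second label.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [[1, 1], [1, 0], [0, 1], [0, 1]]
smoothed = smooth_labels(features, labels, k=1)
print(smoothed[1])  # the second label of item 1 receives a non-zero value
```

In this toy example the second label of item 1, recorded as 0 in the training data, picks up a non-zero value from its most similar neighbour, which is the effect the abstract attributes to propagating and averaging label values. How such values are thresholded or used by the classifier is not stated in this excerpt.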

1 Introduction

Thanks to the growth of the Web and of digital archiving technology, we can now access numerous digital documents, images, and other types of data. This enriches our experience of the Web; for example, it is easy to study the history of any country or to see the big picture of relationships between people. On the other hand, organizing digital data so that they can be accessed quickly is becoming increasingly demanding. Defining categories and dividing digital data into these categories play key roles in digital data organization. For example, the categorization of digital documents is useful for constructing thematic timelines or event lists.

Yasunobu Sumikawa [email protected]
Tatsurou Miyazaki [email protected]

1 University Education Center, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
2 Department of Information Sciences, Tokyo University of Science, Noda, Chiba, Japan

As the amount of data increases, the categorization schemes dynamically change due to the revision of the hierarchical structure and the definition of new categories. When these categorization schemes chan
