Constrained nonnegative matrix factorization-based semi-supervised multilabel learning



ORIGINAL ARTICLE

Dingguo Yu¹ · Bin Fu² · Guandong Xu² · Aihong Qin¹

Received: 24 April 2017 / Accepted: 9 January 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract
In many multilabel learning applications, instances with fully provided labels are scarce, while partially labelled and unlabelled data are far more common owing to the expense of manual labelling. However, most existing models assume that sufficient fully labelled training data is available. To handle partially labelled and unlabelled data effectively, we present a novel semi-supervised multilabel learning approach based on constrained non-negative matrix factorization. The approach assumes that if two instances are highly similar in terms of their features, they should also be similar in their associated label sets. Specifically, we first define three matrices that measure the similarity of each pair of instances in two different ways. The optimal assignment of labels to unlabelled instances is then determined by minimizing the difference between these two similarity measures via a non-negative matrix factorization process. We also present a threshold learning algorithm that determines the classification threshold for each label. Extensive experiments on various datasets demonstrate that our method performs significantly better than other state-of-the-art approaches. It is especially suitable when the labelled training data is small, or when a subset of the training data is only partially labelled.

Keywords: Semi-supervised learning · Nonnegative matrix factorization (NMF) · Multilabel learning · Weak label
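The paper's exact constrained formulation is not reproduced here; as a minimal sketch of the underlying intuition (feature-similar instances should receive similar labels, with the similarity matrix denoised by a low-rank NMF approximation before labels are propagated), the following illustrative code uses standard multiplicative-update NMF. The function names `rbf_similarity`, `nmf`, and `complete_labels`, and all parameter defaults, are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def rbf_similarity(X, gamma=1.0):
    # Pairwise RBF (Gaussian) similarity between the rows of X.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def nmf(S, rank, iters=500, seed=0):
    # Standard Lee-Seung multiplicative updates for S ~= W @ H,
    # minimizing the Frobenius reconstruction error.
    rng = np.random.default_rng(seed)
    n, m = S.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ S) / (W.T @ W @ H + 1e-9)
        W *= (S @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def complete_labels(X, Y, labeled, rank=2, threshold=0.5):
    """Score labels for unlabelled instances from an NMF-denoised
    feature-similarity matrix; labelled rows keep their given labels.
    `labeled` holds the row indices whose labels in Y are trusted."""
    S = rbf_similarity(X)
    W, H = nmf(S, rank)
    S_hat = W @ H                      # low-rank, denoised similarity
    weights = S_hat[:, labeled]
    weights = weights / (weights.sum(axis=1, keepdims=True) + 1e-12)
    scores = weights @ Y[labeled]      # similarity-weighted label vote
    pred = (scores >= threshold).astype(int)
    pred[labeled] = Y[labeled]         # never overwrite known labels
    return pred
```

A fixed global `threshold` is used here for simplicity; the paper instead learns a separate classification threshold per label.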

* Dingguo Yu, [email protected]

¹ School of New Media, Zhejiang University of Media and Communications, Hangzhou 310018, China
² Advanced Analytics Institute, University of Technology, Sydney, NSW 2007, Australia

1 Introduction

In traditional machine learning, an instance is usually assumed to have only one class label. However, in real applications, an instance often involves multiple concepts simultaneously. For instance, in text categorization a news report could cover several topics, while in scene classification an image could depict several scenes, to name a few. Accordingly, learning models that can predict multiple labels simultaneously for an instance are called multilabel learning [22]. In multilabel learning, each instance is represented by a feature vector as in traditional single-label classification, but is associated with multiple labels instead of a single label [30]. Nowadays, multilabel learning has found extensive applications, such as text categorization [20], gene function analysis [5] and image or video annotation [17], etc. Over the last decade, a variety of multilabel learning models have been proposed, such as Binary Relevance [3], Classifier Cha