HierCost: Improving Large Scale Hierarchical Classification with Cost Sensitive Learning




Abstract. Hierarchical Classification (HC) is an important problem with a wide range of applications in domains such as music genre classification, protein function classification and document classification. Although several innovative classification methods have been proposed to address HC, most of them do not scale to web-scale problems. While simple methods such as top-down “pachinko”-style classification and flat classification scale well, they either have poor classification performance or do not effectively use the hierarchical information. Current methods that incorporate hierarchical information in a principled manner are often computationally expensive and unable to scale to large datasets. In this work, we adopt a cost-sensitive classification approach to the hierarchical classification problem by defining misclassification costs based on the hierarchy. This approach effectively decouples the models for the various classes, allowing us to efficiently train effective models for large hierarchies in a distributed fashion.
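One natural way to define a hierarchy-based misclassification cost, as the abstract describes, is the tree distance (number of edges) between the true and predicted class nodes. The sketch below illustrates that idea only; the helper names (`build_parent_map`, `tree_distance`) are hypothetical and the paper's exact cost definitions may differ.

```python
def build_parent_map(edges):
    """Map each node to its parent, given (parent, child) edges of the hierarchy tree."""
    return {child: parent for parent, child in edges}

def ancestors(node, parent):
    """Return the path from a node up to the root, including the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent.get(node)  # root has no parent entry
    return path

def tree_distance(a, b, parent):
    """Number of edges between nodes a and b: walk up from a until a
    common ancestor of b is found, then add b's distance to that ancestor."""
    path_b = ancestors(b, parent)
    anc_b = set(path_b)
    for depth, node in enumerate(ancestors(a, parent)):
        if node in anc_b:  # node is the lowest common ancestor
            return depth + path_b.index(node)
    raise ValueError("nodes are not in the same tree")
```

In a cost-sensitive setup, such a distance could serve as the weight on each training example when fitting the binary model for a given class, which is what allows the per-class models to be trained independently and in parallel.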

1 Introduction

Categorizing entities according to a hierarchy of general-to-specific classes is common practice in many disciplines. It is an important aspect of fields such as bioinformatics, music genre classification, image classification and, most prominently, document classification [18]. Often the data is curated manually, but with the exploding sizes of databases, it is becoming increasingly important to develop automated methods for the hierarchical classification of entities. Several classification methods have been developed over the past several years to address the problem of Hierarchical Classification (HC). One straightforward approach is to simply use multi-class or binary classifiers to model the relevant classes and disregard the hierarchical information; this methodology is known in the HC literature as the flat classification scheme [18]. While flat classification can be competitive, an important research direction is to improve classification performance by incorporating the hierarchical structure of the classes into the learning algorithm. Another simple data-decomposition approach trains a local classifier for each class defined by the hierarchy, so that the trained models can be used in a top-down fashion at test time, following the most relevant path. This top-down approach trains each classifier on a smaller dataset and is quite efficient in comparison to flat classification, which generally trains one-vs-rest classifiers on the entire dataset. However, a severe drawback of this approach is that if a prediction error is committed at a higher level, the classifier selects a wrong prediction path, making it impossible to recover from the error at lower levels. Due to this error propagation, severe degradation in performance has sometimes been noted for top-down classifiers.

A. Charuvaka and H. Rangwala. In: A. Appice et al. (Eds.): ECML PKDD 2015, Part I, LNAI 9284, pp. 675–690, 2015. DOI: 10.1007/978-3-319-23528-8_42. © Springer International Publishing Switzerland 2015.
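The top-down scheme described above amounts to a greedy walk down the hierarchy. The minimal sketch below (the `children` map and per-node scoring functions are hypothetical, not from the paper) makes the error-propagation problem concrete: once a wrong child is chosen near the root, no later decision can leave that subtree.

```python
def predict_top_down(x, children, classifiers, root="root"):
    """Greedy top-down prediction: at each internal node, descend to the
    child whose local classifier scores the example x highest, and repeat
    until a leaf is reached. An early mistake cannot be corrected later."""
    node = root
    while children.get(node):  # stop at a leaf (no children)
        node = max(children[node], key=lambda c: classifiers[c](x))
    return node

# Toy two-level hierarchy with linear scoring functions per node.
children = {"root": ["A", "B"], "A": ["a1", "a2"]}
classifiers = {
    "A": lambda x: x[0], "B": lambda x: -x[0],
    "a1": lambda x: x[1], "a2": lambda x: -x[1],
}
```

If the root-level decision between "A" and "B" is wrong for some example, the leaf classifiers under the correct branch are never even consulted, which is precisely the failure mode that motivates incorporating the hierarchy into the loss instead of the prediction procedure.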