Bonsai: diverse and shallow trees for extreme multi-label classification
Sujay Khandagale · Han Xiao · Rohit Babbar

Received: 13 July 2019 / Revised: 1 November 2019 / Accepted: 4 June 2020
© The Author(s) 2020
Abstract

Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC and partitions the labels in the representation space to learn shallow trees. We show three concrete realizations of this label representation space: (i) the input space, spanned by the input features; (ii) the output space, spanned by label vectors based on their co-occurrence with other labels; and (iii) the joint space, combining the input and output representations. Furthermore, the constraint-free multi-way partitions learnt iteratively in these spaces lead to shallow trees. By combining the effect of shallow trees and generalized label representation, Bonsai achieves the best of both worlds: fast training, comparable to state-of-the-art tree-based methods in XMC, and much better prediction accuracy, particularly on tail labels. On the benchmark Amazon-3M dataset with 3 million labels, Bonsai outperforms a state-of-the-art one-vs-rest method in prediction accuracy while being approximately 200 times faster to train. The code for Bonsai is available at https://github.com/xmc-aalto/bonsai.

Keywords Large-scale multi-label classification · Extreme multi-label classification · Large label space
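The three representation spaces described in the abstract can be made concrete with a small sketch. The following is an illustrative reconstruction, not the authors' implementation: it builds one vector per label in each space from a dense feature matrix `X` of shape (n, d) and a binary label matrix `Y` of shape (n, L). The function name and the row-normalization step are assumptions for illustration.

```python
import numpy as np

def label_representations(X, Y, space="input"):
    """Build one vector per label in the chosen representation space.

    X: (n, d) instance feature matrix.
    Y: (n, L) binary instance-label matrix.
    Illustrative sketch of the three spaces; not the paper's exact code.
    """
    if space == "input":
        # Input space: each label is represented by the sum of the
        # feature vectors of the training instances tagged with it.
        V = Y.T @ X                        # shape (L, d)
    elif space == "output":
        # Output space: each label is represented by its co-occurrence
        # counts with every label (including itself).
        V = Y.T @ Y                        # shape (L, L)
    elif space == "joint":
        # Joint space: concatenation of the two representations above.
        V = np.hstack([Y.T @ X, Y.T @ Y])  # shape (L, d + L)
    else:
        raise ValueError(f"unknown space: {space}")
    # L2-normalize rows so that partitioning is driven by direction,
    # not by label frequency (an assumed but common choice).
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.maximum(norms, 1e-12)
```

Partitioning the rows of the returned matrix (e.g. with k-means into K groups, recursively) then yields the shallow, multi-way label trees the paper describes.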
Editors: Larisa Soldatova, Joaquin Vanschoren.
* Rohit Babbar, [email protected]
1 Aalto University, Helsinki, Finland

Machine Learning

1 Introduction

Extreme Multi-label Classification (XMC) refers to supervised learning of a classifier which can automatically label an instance with a small subset of relevant labels from an extremely large set of all possible target labels. Machine learning problems involving hundreds of thousands of labels are common in various domains such as product categorization for e-commerce (McAuley and Leskovec 2013; Shen et al. 2011; Bengio et al. 2010; Agrawal et al. 2013), hash-tag suggestion in social media (Denton et al. 2015), annotating web-scale encyclopedias (Partalas et al. 2015), and image classification (Krizhevsky et al. 2012; Deng et al. 2010). It has been demonstrated that the framework of XMC can also be leveraged to effectively address ranking problems arising in bid-phrase suggestion in web advertising and in the suggestion of relevant items in recommendation systems (Prabhu and Varma 2014). From the machine learning perspective, building eff

Fig. 1 Label frequency in the dataset WikiLSHTC-325K shows a power-law distribution. The X-axis shows label IDs sorted by their frequency in training instances and the Y-axis gives the actual frequency (on a log scale). Note that more than half of the labels have fewer than 5 training instances
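The tail statistic highlighted in the caption of Fig. 1 is easy to reproduce in spirit. The sketch below draws synthetic Zipf-distributed per-label frequencies (illustrative data, not WikiLSHTC-325K itself) and counts the fraction of labels with fewer than 5 training instances, mirroring how the figure's X- and Y-axes are constructed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-label training frequencies drawn from a Zipf
# (power-law) distribution; an illustrative stand-in for the
# real label counts of an XMC dataset.
n_labels = 100_000
freqs = rng.zipf(2.0, size=n_labels)

# Sort labels by frequency in descending order, as on the
# X-axis of Fig. 1 (Y-axis would be these counts on a log scale).
sorted_freqs = np.sort(freqs)[::-1]

# Fraction of tail labels with fewer than 5 training instances.
tail_fraction = np.mean(freqs < 5)
print(f"labels with < 5 training instances: {tail_fraction:.1%}")
```

With a Zipf exponent of 2.0, well over half of the synthetic labels fall below 5 training instances, matching the qualitative picture in the caption.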