Using error decay prediction to overcome practical issues of deep active learning for named entity recognition

Haw-Shiuan Chang¹,² · Shankar Vembu² · Sunil Mohan² · Rheeya Uppaal¹ · Andrew McCallum¹

Received: 15 December 2019 / Revised: 16 June 2020 / Accepted: 11 July 2020
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

Existing deep active learning algorithms achieve impressive sampling efficiency on natural language processing tasks. However, they exhibit several weaknesses in practice, including (a) inability to use uncertainty sampling with black-box models, (b) lack of robustness to labeling noise, and (c) lack of transparency. In response, we propose a transparent batch active sampling framework by estimating the error decay curves of multiple feature-defined subsets of the data. Experiments on four named entity recognition (NER) tasks demonstrate that the proposed methods significantly outperform diversification-based methods for black-box NER taggers, and can make the sampling process more robust to labeling noise when combined with uncertainty-based methods. Furthermore, the analysis of experimental results sheds light on the weaknesses of different active sampling strategies, and when traditional uncertainty-based or diversification-based methods can be expected to work well.

Keywords: Active learning · Transparency · Robustness to labeling noise · Black-box models · Clustering · Named entity recognition
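To make the core idea of the abstract concrete, the sketch below fits an error decay curve to (training-set size, error) observations for each feature-defined subset and picks the subset whose curve predicts the largest error reduction from additional labels. The power-law form, the subset names, and the helper functions are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law_decay(n, a, b, c):
    # Assumed parametric form: error decays polynomially with the number
    # of labeled samples n and levels off at an asymptote c.
    return a * np.power(n, -b) + c

def predicted_error_reduction(sample_sizes, errors, extra_labels):
    """Fit a decay curve to one subset's (size, error) history and predict
    the error drop from labeling `extra_labels` more samples in it.
    Illustrative helper, not the paper's API."""
    params, _ = curve_fit(power_law_decay, sample_sizes, errors,
                          p0=(1.0, 0.5, 0.0), maxfev=10000)
    current_n = sample_sizes[-1]
    return (power_law_decay(current_n, *params)
            - power_law_decay(current_n + extra_labels, *params))

# Toy usage: sample from the subset with the largest predicted error decay.
subsets = {
    "capitalized_tokens": ([100, 200, 400, 800], [0.30, 0.24, 0.20, 0.18]),
    "numeric_tokens":     ([100, 200, 400, 800], [0.15, 0.14, 0.135, 0.133]),
}
gains = {name: predicted_error_reduction(np.asarray(ns, dtype=float),
                                         np.asarray(errs), extra_labels=200)
         for name, (ns, errs) in subsets.items()}
print(max(gains, key=gains.get))
```

Because the ranking only needs error estimates per subset, not the model's internal probabilities, this style of sampling can in principle be applied to black-box taggers.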

Editors: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.

* Andrew McCallum, [email protected]

Haw-Shiuan Chang, [email protected]
Shankar Vembu, [email protected]
Sunil Mohan, [email protected]
Rheeya Uppaal, [email protected]

¹ University of Massachusetts Amherst, College of Information and Computer Science, Amherst, MA, USA
² Chan Zuckerberg Initiative (CZI), Redwood City, CA, USA




1 Introduction

Deep neural networks achieve state-of-the-art results on many tasks, especially when a large amount of training data is available. Their success highlights the importance of reducing the cost of collecting labels on a large scale. Active learning can be used to select the data samples that will most benefit a predictor's training, thereby reducing the amount of labeled data needed without hurting the predictor's accuracy. The effectiveness of uncertainty- and disagreement-based active learning methods has been demonstrated on several datasets for shallow predictors (Settles and Craven 2008; Settles 2009), and more recently also for deep learning predictors (Gal et al. 2017; Shen et al. 2018; Siddhant and Lipton 2018). Nevertheless, random sampling is still the most popular method for building new datasets in several domains, including natural language processing (Tomanek and Olsson 2009). This is due to the practical issues of deploying uncertainty-based active sampling (Settles 2011; Lowell et al. 2019), including its limited applicability, robustness, and transparency.
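For context, uncertainty sampling ranks unlabeled examples by the model's predictive uncertainty and queries labels for the most uncertain ones. Below is a minimal least-confidence sketch; the `predict_proba` interface and the batch size are illustrative assumptions, not part of the paper.

```python
import numpy as np

def least_confidence_batch(predict_proba, unlabeled_pool, batch_size=32):
    """Minimal uncertainty-sampling sketch (least-confidence variant).

    `predict_proba` is assumed to map a list of examples to an
    (n_examples, n_classes) array of class probabilities.
    """
    probs = predict_proba(unlabeled_pool)   # (n_examples, n_classes)
    confidence = probs.max(axis=1)          # probability of the top class
    uncertainty = 1.0 - confidence          # least-confidence score
    # Query labels for the examples the model is least confident about.
    query_idx = np.argsort(-uncertainty)[:batch_size]
    return [unlabeled_pool[i] for i in query_idx]
```

Note that this strategy requires access to the model's class probabilities, which is precisely why it cannot be applied to black-box predictors, one of the practical issues motivating this work.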