Resource management for model learning at entity level



Christian Beyer1 · Vishnu Unnikrishnan1 · Robert Brüggemann1 · Vincent Toulouse1 · Hafez Kader Omar1 · Eirini Ntoutsi2 · Myra Spiliopoulou1

Received: 7 November 2019 / Accepted: 13 August 2020 / © The Author(s) 2020

Abstract

Many current and future applications plan to provide entity-specific predictions. These range from individualized healthcare applications to user-specific purchase recommendations. In our previous stream-based work on Amazon review data, we showed that error-weighted ensembles can improve prediction quality by combining entity-centric classifiers, which are trained only on reviews of one particular product (entity), with entity-ignorant classifiers, which are trained on all reviews irrespective of the product. This came at the cost of storing multiple entity-centric models in primary memory, many of which would never be used again because their entities would not receive future instances in the stream. To overcome this drawback and make entity-centric learning viable in these scenarios, we investigated two methods of reducing the primary memory requirement of our entity-centric approach. Our first method uses the lossy counting algorithm for data streams to identify entities whose instances make up a certain percentage of the total data stream, within an error margin. We then store all models that do not fulfil this requirement in secondary memory, from which they can be retrieved if future instances belonging to them arrive later in the stream. The second method replaces entity-centric models with a much more naive model that only stores the past labels and predicts the majority label seen so far. We applied our methods to the previously used Amazon data sets, which contained up to 1.4M reviews, and added two subsets of the Yelp data set, which contain up to 4.2M reviews. Both methods were successful in reducing the primary memory requirements while still outperforming an entity-ignorant model.

Keywords Entity-centric learning · Stream classification · Document prediction · Memory reduction · Text ignorant models
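The two memory-reduction ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `LossyCounter` and `MajorityLabelModel`, the method names, and the parameters `epsilon` and `support` are assumptions introduced here, following the textbook formulation of lossy counting (bucket width ⌈1/ε⌉, pruning at bucket boundaries). Moving infrequent models to secondary memory is left out.

```python
import math
from collections import Counter


class LossyCounter:
    """Approximate per-entity frequency counts over a stream (hypothetical helper)."""

    def __init__(self, epsilon):
        self.epsilon = epsilon                # tolerated error, as a fraction of the stream
        self.width = math.ceil(1 / epsilon)   # bucket width
        self.n = 0                            # number of instances seen so far
        self.entries = {}                     # entity -> [count, maximum undercount]

    def add(self, entity):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if entity in self.entries:
            self.entries[entity][0] += 1
        else:
            # A new entity may have been seen and pruned before; bound that loss.
            self.entries[entity] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop entities that cannot be frequent any more.
            self.entries = {
                e: fd for e, fd in self.entries.items() if fd[0] + fd[1] > bucket
            }

    def frequent(self, support):
        """Entities whose share of the stream is at least `support`, within epsilon."""
        threshold = (support - self.epsilon) * self.n
        return {e for e, (count, _) in self.entries.items() if count >= threshold}


class MajorityLabelModel:
    """Naive per-entity model: stores past labels, predicts the most frequent one."""

    def __init__(self):
        self.label_counts = Counter()

    def learn(self, label):
        self.label_counts[label] += 1

    def predict(self, default=None):
        if not self.label_counts:
            return default  # no label seen yet for this entity
        return self.label_counts.most_common(1)[0][0]
```

In such a setup, one would call `add` with the entity identifier of every arriving instance and keep full entity-centric models in primary memory only for the entities returned by `frequent`; the values chosen for `epsilon` and `support` are tuning decisions that this sketch does not take from the paper.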

The first-author position is shared between the first two authors, Christian Beyer and Vishnu Unnikrishnan.

Extended author information available on the last page of the article.

1 Introduction

Recent developments in hardware have made it easier to consider going beyond traditional machine learning methods, and to challenge the concept of “more data is better.” Especially in the case of data streams and time series, the streaming data is often generated by some identifiable “entity.” While a machine learning model trained on the entire stream may generalize and perform well, there are cases where learning the idiosyncratic properties of the exact entity that generated a data point would help the model to make better predictions. This would be particularly true in fields like healthcare, where each patient needs predictions and recommendations tailo