Feature-Based and Adaptive Rule Adaptation in Dynamic Environments


Alireza Tabebordbar¹ · Amin Beheshti¹ · Boualem Benatallah² · Moshe Chai Barukh²
Received: 10 March 2020 / Revised: 4 May 2020 / Accepted: 8 June 2020
© The Author(s) 2020

Abstract

Rule-based systems have been used increasingly to augment learning algorithms for annotating data. Rules alleviate many of the shortcomings inherent in purely algorithmic approaches in cases where algorithms do not work well or lack sufficient training data. However, in dynamic curation environments where data are constantly changing, rules must be crafted and adapted to keep them applicable and precise. Rule adaptation has proven to be painstakingly difficult and error-prone, as an analyst is needed to examine the precision of rules and apply different modifications to adapt the imprecise ones. In this paper, we present an autonomic and conceptual approach to adapting data annotation rules. Our approach relieves analysts of the task of adapting rules; it boosts rules to annotate a larger number of items using a set of high-level conceptual features, e.g. topic. We utilize a Bayesian multi-armed-bandit algorithm, an online learning algorithm that adapts rules based on the feedback collected from the curation environment over time. We propose a summarization technique, which offers a set of high-level conceptual features for annotating items by identifying the semantic relationships among them. We conduct experiments on different curation domains and compare the performance of our approach with systems relying on analysts for adapting rules. The experimental results show that our approach performs comparably to analysts in adapting rules.

Keywords: Rule adaptation · Data annotation rules · Rule-based systems · Data curation systems

* Alireza Tabebordbar [email protected] · Amin Beheshti [email protected]
¹ Macquarie University, Sydney, Australia
² University of New South Wales (UNSW), Sydney, Australia

1 Introduction

Data curation refers to the processes and activities related to the integration, annotation, publication, and presentation of data throughout its lifecycle [8]. One category of data curation is data annotation, which aims at labelling the raw data to generate value and increase productivity. Data annotation has been used extensively in various machine learning algorithms for information extraction, item classification, and record linkage [6, 38, 39]. However, in dynamic environments, e.g. Twitter (twitter.com/) and Facebook (facebook.com/), where data are continuously changing, relying on purely algorithmic approaches does not scale to the needs of businesses that must annotate data over an extended period of time. This is because algorithms make predictions based on historical data only, whereas in dynamic environments the distribution of data changes and algorithms need to be updated to capture the changes, which is expensive and time-consuming. In recent years, several pioneering solutions (e.g. [5, 23, 30, 33,