Intelligent User Assistance for Automated Data Mining Method Selection


RESEARCH PAPER

Patrick Zschech • Richard Horn • Daniel Höschele • Christian Janiesch • Kai Heinrich

Received: 17 July 2019 / Accepted: 17 February 2020 / © The Author(s) 2020

Abstract In any data science and analytics project, mapping a domain-specific problem to an adequate set of data mining methods is a crucial step that is usually performed by experts of the field. However, these experts are not always available, and data mining novices may be required to perform the task. While there are several research efforts on automated method selection as a means of support, only a few approaches consider the particularities of problems expressed in the natural and domain-specific language of the novice. The study proposes the design of an intelligent assistance system that takes problem descriptions articulated in natural language as input and offers advice regarding the most suitable class of data mining methods. Following a design science research approach, the paper (i) outlines the problem setting with an exemplary scenario from industrial practice, (ii) derives design requirements, (iii) develops design principles and proposes design features, (iv) develops and implements the IT artifact using several methods such as embeddings, keyword extraction, topic models, and text classifiers, (v) demonstrates and evaluates the implemented prototype based on different classification pipelines, and (vi) discusses the practical and theoretical contributions of the results. The best-performing classification pipelines show high accuracy when applied to validation data and are capable of creating a suitable mapping that exceeds the performance of joint novice assessments and simpler means of text mining. The research provides a promising foundation for further enhancements, either as a stand-alone intelligent assistance system or as an add-on to already existing data science and analytics platforms.

Keywords Intelligent user assistance system · Automated method selection · Data science · Natural language processing · Design science research

Accepted after three revisions by the editors of the Special Issue.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s12599-020-00642-3) contains supplementary material, which is available to authorized users.

P. Zschech (corresponding author) · R. Horn · D. Höschele · C. Janiesch · K. Heinrich
Lehrstuhl für Wirtschaftsinformatik, Business Intelligence Research, TU Dresden, 01062 Dresden, Germany

1 Introduction

Data science and analytics (DSA) projects are generally multidisciplinary and therefore require combined expertise from several areas, such as profound domain knowledge, analytical modeling skills, and experience in collecting and processing data from