Predictive spreadsheet autocompletion with constraints


Samuel Kolb¹ · Stefano Teso¹ · Anton Dries¹ · Luc De Raedt¹

Received: 26 November 2018 / Revised: 22 May 2019 / Accepted: 6 September 2019
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Abstract
Spreadsheets are arguably the most accessible data-analysis tool and are used by millions of people. Despite the fact that they lie at the core of most business practices, working with spreadsheets can be error prone, usage of formulas requires training and, crucially, spreadsheet users do not have access to state-of-the-art analysis techniques offered by machine learning. To tackle these issues, we introduce the novel task of predictive spreadsheet autocompletion, where the goal is to automatically predict the missing entries in the spreadsheets. This task is highly non-trivial: cells can hold heterogeneous data types and there might be unobserved relationships between their values, such as constraints or probabilistic dependencies. Critically, the exact prediction task itself is not given. We consider a simplified, yet non-trivial, setting and propose a principled probabilistic model to solve it. Our approach combines black-box predictive models specialized for different predictive tasks (e.g., classification, regression) and constraints and formulas detected by a constraint learner, and produces a maximally likely prediction for all target cells that is consistent with the constraints. Overall, our approach brings us one step closer to allowing end users to leverage machine learning in their workflows without writing a single line of code.

Keywords Spreadsheets autocompletion · Bayesian networks · Constraint learning · Machine learning

Editors: Karsten Borgwardt, Po-Ling Loh, Evimaria Terzi, Antti Ukkonen. This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. [694980] SYNTH: Synthesising Inductive Data Models).

Stefano Teso [email protected]
Samuel Kolb [email protected]
Anton Dries [email protected]
Luc De Raedt [email protected]

¹ KU Leuven, Leuven, Belgium

Machine Learning

1 Introduction

Spreadsheets are the workhorse of business and industry. They support a huge user base, composed of end users with widely different goals and degrees of competence (Lawson et al. 2009). Automating the workflows of these users, even partially, would have a significant impact across all sectors of business. This explains the recent surge of research and applications of artificial intelligence, machine learning and inductive programming on spreadsheets (Gulwani 2011; Gulwani et al. 2015; Kolb et al. 2017; Devlin et al. 2017). For instance, the BigML (https://bigml.com) extension for Google Sheets integrates standard learning algorithms and workflows into spreadsheet interfaces, with the goal of lowering the barrier to predictive analysis for non-expert users. T