Embedding Decision Trees and Random Forests in Constraint Programming

Abstract. In past papers, we have introduced Empirical Model Learning (EML) as a method to enable Combinatorial Optimization on real world systems that are impervious to classical modeling approaches. The core idea in EML consists in embedding a Machine Learning model in a traditional combinatorial model. So far, the method has been demonstrated by using Neural Networks and Constraint Programming (CP). In this paper we add one more technique to the EML arsenal, by devising methods to embed Decision Trees (DTs) in CP. In particular, we propose three approaches: 1) a simple encoding based on meta-constraints; 2) a method using attribute discretization and a global table constraint; 3) an approach based on converting a DT into a Multi-valued Decision Diagram, which is then fed to an mdd constraint. We finally show how to embed in CP a Random Forest, a powerful type of ensemble classifier based on DTs. The proposed methods are compared in an experimental evaluation, highlighting their strengths and their weaknesses.

1 Introduction

Combinatorial Optimization methods have been successfully applied to a broad range of industrial problems. Many such approaches rely on the availability of a declarative model describing the decisions, the constraints on those decisions, their cost, and their impact on the system under consideration. In short, they rely on an accurate problem model. In some application domains, however, the model is either not fully known or is described in a form that is not usable for combinatorial optimization. For example, many domains have predictive models that forecast the temporal dynamics of a target system via differential equations, but such models cannot be inserted into a combinatorial optimization model without incurring serious computational issues. In these cases, the domain expert typically supplies heuristic knowledge about the problem, i.e., a rough (and hard-to-quantify) approximation of the effect that certain decisions have on the system dynamics. We propose here an alternative approach, an instantiation of the general Empirical Model Learning method we introduced in [2]: we learn part of the combinatorial optimization model and embed the learned piece of knowledge in the combinatorial model itself. This has two advantages: first, we can use the learned knowledge to reduce the search tree; second, we know the accuracy obtained in the learning process.

In this paper, we consider the problem of embedding in a CP model two types of tree-based classifiers from Machine Learning, namely Decision Trees and Random Forests. Formally, a classifier is a function f mapping a tuple of values for a set of discrete or numeric attributes to a discrete class. We can embed a classifier in CP by introducing a vector of variables, one per attribute, plus a variable for the class, and posting a constraint that links them according to f.
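To make this embedding concrete, the sketch below illustrates the second approach from the abstract (attribute discretization plus a table constraint) on a toy tree. It is not the paper's implementation: the tree, the interval indices, and all variable names are illustrative assumptions, and Google OR-Tools CP-SAT stands in for a generic CP solver. Each attribute is discretized into the intervals induced by the tree's split thresholds, every root-to-leaf path becomes an allowed tuple, and a table constraint links the attribute variables to the class variable.

```python
from ortools.sat.python import cp_model

# Toy decision tree over two attributes, each discretized into 3
# intervals (bin indices 0..2) induced by the tree's split thresholds.
# Each tuple is (a0_bin, a1_bin, predicted_class); in practice the
# tuples are obtained by enumerating the tree's root-to-leaf paths.
ALLOWED = [
    (0, 0, 0), (0, 1, 0), (0, 2, 1),
    (1, 0, 0), (1, 1, 1), (1, 2, 1),
    (2, 0, 1), (2, 1, 1), (2, 2, 1),
]

model = cp_model.CpModel()
a0 = model.NewIntVar(0, 2, "a0_bin")  # discretized attribute 0
a1 = model.NewIntVar(0, 2, "a1_bin")  # discretized attribute 1
y = model.NewIntVar(0, 1, "y")        # class variable

# The table constraint ties the attribute variables to the class
# variable: propagation removes attribute bins that are inconsistent
# with the (possibly constrained) class, and vice versa.
model.AddAllowedAssignments([a0, a1, y], ALLOWED)

# Example: fix the classifier output and let propagation prune
# the attribute bins that cannot lead to that class.
model.Add(y == 0)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print(solver.Value(a0), solver.Value(a1), solver.Value(y))
```

Along the same lines, a Random Forest could plausibly be embedded by posting one such constraint per tree and aggregating the per-tree class variables (e.g., via a majority-vote constraint), though the paper's own encodings should be consulted for the details.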