Multidimensional Prediction Models When the Resolution Context Changes


Abstract. Multidimensional data is systematically analysed at multiple granularities by applying aggregate and disaggregate operators (e.g., by the use of OLAP tools). For instance, in a supermarket we may want to predict sales of tomatoes for next week, but we may also be interested in predicting sales for all vegetables (higher up in the product hierarchy) for next Friday (lower down in the time dimension). While the domain and data are the same, the operating context is different. We explore several approaches for multidimensional data when predictions have to be made at different levels (or contexts) of aggregation. One method relies on the same resolution, another approach aggregates predictions bottom-up, a third approach disaggregates predictions top-down and a final technique corrects predictions using the relation between levels. We show how these strategies behave when the resolution context changes, using several machine learning techniques in four application domains.

Keywords: Multidimensional data · Operating context aggregation · Disaggregation · OLAP cubes · Quantification
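To make the first three strategies concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy sales history and a placeholder model (the historical mean); the names `HISTORY`, `predict_next`, `same_level`, `bottom_up` and `top_down` are hypothetical, and the fourth technique (correcting predictions using the relation between levels) is not shown.

```python
from statistics import mean

# Toy weekly sales per product; all numbers are made up for illustration.
HISTORY = {
    "tomato": [20, 24, 22],
    "lettuce": [10, 8, 12],
}

def predict_next(series):
    """Placeholder model (an assumption): forecast the historical mean."""
    return mean(series)

def same_level(history):
    """Aggregate the data to the target level first, then predict there."""
    category_series = [sum(week) for week in zip(*history.values())]
    return predict_next(category_series)

def bottom_up(history):
    """Predict each product separately, then sum the predictions."""
    return sum(predict_next(series) for series in history.values())

def top_down(history):
    """Predict the category total, then split it by historical shares."""
    total_prediction = same_level(history)
    grand_total = sum(sum(series) for series in history.values())
    return {
        product: total_prediction * sum(series) / grand_total
        for product, series in history.items()
    }

category_same_level = same_level(HISTORY)
category_bottom_up = bottom_up(HISTORY)
product_top_down = top_down(HISTORY)
```

With the mean as placeholder, the same-level and bottom-up predictions coincide because the mean is linear; with a learned, nonlinear model the two would generally differ, which is the kind of behaviour the strategies are meant to expose.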

1 Introduction

Most existing machine learning algorithms manipulate data only at the individual level (flat data tables) and do not consider the case where a data set has multiple levels of abstraction. However, in many applications, such as retailing, geographic, economic or scientific data, the data contains structured information that is multidimensional (or multilevel) in nature. The multidimensional model is a widely used conceptual model, originating in the database literature, that can properly capture the multiresolutional character of many data sets [1,5,13,26]. Multidimensional databases arrange data into fact tables and dimensions. A fact table includes instances of facts at the lowest possible level. Each row represents a fact, such as “The sales of product ‘Tomato soup 500ml’ in store ‘123’ on day ‘20/06/2014’ totalled 25 units”. The features (or fields) of a fact table are either measures (indicators such as units, euros, volumes, etc.) or references to dimensions. A dimension is here understood as a particular variable that has predefined (and hopefully meaningful) levels of aggregation, with a hierarchical structure.

© Springer International Publishing Switzerland 2015. A. Appice et al. (Eds.): ECML PKDD 2015, Part II, LNAI 9285, pp. 509–524, 2015. DOI: 10.1007/978-3-319-23525-7_31


A. Martínez-Usó and J. Hernández-Orallo

Figure 1 shows several examples of dimensions and hierarchies. Using the hierarchies, the data can be aggregated or disaggregated at different granularities. Each such set of aggregation choices, one per dimension, is known as a data cube [6], which is easy to understand and offers flexibility for visualisation (aggregated tables and cubes). OLAP technology, for instance, has been developed to handle large volumes of multidimensional data in a highly efficient way, and to move through the space of cubes using roll-up, drill-down, slice-and-dice and pivoting operators.
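As an illustration of how a fact table is aggregated along dimension hierarchies, here is a minimal roll-up sketch in plain Python. The fact rows, the hierarchy mappings `PRODUCT_TO_CATEGORY` and `DAY_TO_WEEK`, and the function `roll_up` are hypothetical and chosen only to mirror the supermarket example; an actual OLAP engine would do this far more efficiently.

```python
from collections import defaultdict

# Hypothetical hierarchies: each maps a lowest-level value to its parent level.
PRODUCT_TO_CATEGORY = {"tomato": "vegetables", "lettuce": "vegetables", "apple": "fruit"}
DAY_TO_WEEK = {"2014-06-16": "2014-W25", "2014-06-20": "2014-W25", "2014-06-23": "2014-W26"}

# Fact table at the lowest level: (product, store, day, units). Made-up values.
FACTS = [
    ("tomato", "123", "2014-06-16", 25),
    ("tomato", "123", "2014-06-20", 30),
    ("lettuce", "123", "2014-06-20", 10),
    ("apple", "123", "2014-06-23", 7),
    ("tomato", "456", "2014-06-23", 12),
]

def roll_up(facts, product_map=None, day_map=None, all_stores=False):
    """Sum the units measure after lifting each dimension to the chosen level.

    A dimension with no mapping stays at its lowest level; all_stores=True
    collapses the store dimension into a single "ALL" member.
    """
    cube = defaultdict(int)
    for product, store, day, units in facts:
        key = (
            product_map[product] if product_map else product,
            "ALL" if all_stores else store,
            day_map[day] if day_map else day,
        )
        cube[key] += units
    return dict(cube)

# One cube: category level for products, all stores, week level for time.
weekly_by_category = roll_up(FACTS, PRODUCT_TO_CATEGORY, DAY_TO_WEEK, all_stores=True)

# Another cube: products kept at the lowest level, time rolled up to weeks.
weekly_by_product = roll_up(FACTS, day_map=DAY_TO_WEEK)
```

Each call to `roll_up` with a different combination of levels yields a different cube over the same fact table, which is the sense in which the operating context changes while the data stay the same.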

Fig. 1. Exa
