Fitting Aggregation Functions to Data: Part II - Idempotization

1 Faculty of Mathematics and Information Science, Warsaw University of Technology, ul. Koszykowa 75, 00-662 Warsaw, Poland
  [email protected]
2 School of Information Technology, Deakin University, 221 Burwood Hwy, Burwood, VIC 3125, Australia
  {gleb,sjames}@deakin.edu.au
3 Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland
  [email protected]

Abstract. The use of supervised learning techniques for fitting weights and/or generator functions of weighted quasi-arithmetic means – a special class of idempotent and nondecreasing aggregation functions – to empirical data has already been considered in a number of papers. Nevertheless, there are still some important issues that have not been discussed in the literature yet. In the second part of this two-part contribution we deal with a quite common situation in which we have inputs coming from different sources, describing a similar phenomenon, but which have not been properly normalized. In such a case, idempotent and nondecreasing functions cannot be used to aggregate them unless proper preprocessing is performed. The proposed idempotization method, based on the notion of B-splines, allows for an automatic calibration of independent variables. The introduced technique is applied in an R source code plagiarism detection system.

Keywords: Aggregation functions · Weighted quasi-arithmetic means · Least squares fitting · Idempotence

1 Introduction

Idempotent aggregation functions – mappings F : [0, 1]^n → [0, 1] that are nondecreasing in each variable and fulfil F(x, …, x) = x for all x ∈ [0, 1] – have numerous applications in areas such as decision making, pattern recognition, and data analysis; compare, e.g., [8,11]. For a fixed n ≥ 2, let w ∈ [0, 1]^n be a weighting vector, i.e., one with ∑_{i=1}^n w_i = 1.

In the first part [1] of this two-part contribution we dealt with two important practical issues concerning supervised learning of weights of weighted quasi-arithmetic means with a known continuous and strictly monotone generator ϕ : [0, 1] → R̄, that is, idempotent aggregation functions given for arbitrary x ∈ [0, 1]^n by the formula:

  WQAMean_{ϕ,w}(x) = ϕ^{-1}( ∑_{i=1}^n w_i ϕ(x_i) ).
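To make the definition concrete, here is a minimal Python sketch (ours, not taken from the paper; the function name wqa_mean and the choice of the power generator ϕ(t) = t² are illustrative assumptions) that evaluates a weighted quasi-arithmetic mean for a user-supplied generator and numerically checks idempotence:

```python
import numpy as np

def wqa_mean(x, w, phi, phi_inv):
    """Weighted quasi-arithmetic mean: phi^{-1}(sum_i w_i * phi(x_i)).

    x, w    -- arrays of equal length, x in [0, 1], w nonnegative with sum(w) == 1
    phi     -- continuous, strictly monotone generator on [0, 1]
    phi_inv -- its inverse
    """
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    return phi_inv(np.sum(w * phi(x)))

# Illustrative generator: phi(t) = t**2 gives a weighted power (quadratic) mean.
w = np.array([0.2, 0.3, 0.5])
x = np.array([0.4, 0.7, 0.9])
y = wqa_mean(x, w, phi=lambda t: t**2, phi_inv=np.sqrt)

# Idempotence: aggregating a constant vector returns that constant.
assert np.isclose(wqa_mean([0.6, 0.6, 0.6], w, lambda t: t**2, np.sqrt), 0.6)
```

With ϕ equal to the identity this reduces to the ordinary weighted arithmetic mean.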

First of all, we observed that researchers most often consider an approximate version of the weight learning task that relies on a linearization of the input variables; compare, e.g., [7]. Therefore, we discussed possible implementations of the exact fitting procedure and identified cases where linearization leads to solutions of significantly worse quality in terms of the squared error between the desired and generated outputs. Secondly, we noted that the computed models may overfit a training data set and perform poorly on test and validation samples. Thus, some regularization methods were proposed to overcome this issue.
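As a rough illustration of the difference between the exact and linearized fitting criteria mentioned above, the following sketch (our own, not the authors' implementation; it assumes a fixed generator ϕ(t) = t² and uses SciPy's SLSQP solver) fits the weights under the simplex constraint, minimizing the squared error either on the original outputs (exact) or on the ϕ-transformed outputs (linearized):

```python
import numpy as np
from scipy.optimize import minimize

phi, phi_inv = (lambda t: t**2), np.sqrt  # assumed, fixed generator

def fit_weights(X, y, exact=True):
    """Fit weights of a weighted quasi-arithmetic mean by least squares.

    X -- (m, n) matrix of inputs in [0, 1]; y -- (m,) desired outputs.
    exact=True  minimizes sum_j (phi_inv(sum_i w_i phi(x_ji)) - y_j)**2;
    exact=False minimizes the linearized criterion on phi-transformed data.
    """
    m, n = X.shape
    PX, py = phi(X), phi(y)

    def sse(w):
        pred = PX @ w  # sum_i w_i phi(x_ji) for each observation j
        return (np.sum((phi_inv(pred) - y) ** 2) if exact
                else np.sum((pred - py) ** 2))

    w0 = np.full(n, 1.0 / n)  # start from equal weights
    res = minimize(sse, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * n,
                   constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}])
    return res.x

# Toy usage on synthetic data (illustration only):
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
y = np.sqrt(X**2 @ np.array([0.2, 0.3, 0.5]))  # outputs generated by known weights
print(fit_weights(X, y, exact=True))
```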