Support vector regression for polyhedral and missing data
Gianluca Gazzola¹,² · Myong K. Jeong¹,³
Accepted: 10 September 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract

We introduce "Polyhedral Support Vector Regression" (PSVR), a regression model for data represented by arbitrary convex polyhedral sets. PSVR is derived as a generalization of support vector regression, in which the data is represented by individual points along input variables X_1, X_2, ..., X_p and output variable Y, and extends a support vector classification model previously introduced for polyhedral data. PSVR is in essence a robust-optimization model, which defines prediction error as the largest deviation, calculated along Y, between an interpolating hyperplane and all points within a convex polyhedron; the model relies on the affine Farkas' lemma to make this definition computationally tractable within the formulation. As an application, we consider the problem of regression with missing data, where we use convex polyhedra to model the multivariate uncertainty involving the unobserved values in a data set. For this purpose, we discuss a novel technique that builds on multiple imputation and principal component analysis to estimate convex polyhedra from missing data, and on a geometric characterization of such polyhedra to define observation-specific hyper-parameters in the PSVR model. We show that an appropriate calibration of such hyper-parameters can have a significantly beneficial impact on the model's performance. Experiments on both synthetic and real-world data illustrate how PSVR performs competitively with or better than other benchmark methods, especially on data sets with a high degree of missingness.

Keywords Regression · Uncertainty · Missing data · Convex polyhedron · Farkas' lemma
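The robust error definition in the abstract — the largest deviation along Y between a fixed hyperplane and all points of a convex polyhedron — can be evaluated by solving a pair of linear programs. The sketch below is only an illustration of that definition, not the paper's formulation (which uses the affine Farkas' lemma to fold this maximization into the training problem itself); the function name and interface are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_deviation(w, b, A, c):
    """Largest |y - (w.x + b)| over the polyhedron {z = (x, y) : A z <= c},
    where y is the last coordinate of z. Solves one LP per sign of the
    deviation and returns the larger optimum."""
    p = len(w)
    # Deviation along Y as a linear function of z: y - (w.x + b) = obj.z - b
    obj = np.append(-np.asarray(w, dtype=float), 1.0)
    best = 0.0
    for sign in (1.0, -1.0):
        # Maximize sign*(obj.z - b)  <=>  minimize -sign*obj.z
        res = linprog(-sign * obj, A_ub=A, b_ub=c,
                      bounds=[(None, None)] * (p + 1))
        if res.success:
            best = max(best, sign * (obj @ res.x) - sign * b)
    return best
```

For instance, for the hyperplane y = x and the box 0 <= x <= 1, 0 <= y <= 2, the largest deviation is 2, attained at the vertex (0, 2).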
Correspondence: Myong K. Jeong, [email protected]; Gianluca Gazzola, [email protected]

1 Rutgers Center for Operations Research, Department of Management Science and Information Systems, Rutgers University, 100 Rockafeller Road, Piscataway, NJ 08854, USA
2 Bridge Intelligence LLC, 1215 Livingston Ave Suite 208, North Brunswick, NJ 08902, USA
3 Department of Industrial and Systems Engineering, Rutgers University, 96 Frelinghuysen Road, Piscataway, NJ 08854, USA

Annals of Operations Research

1 Introduction

Support vector regression (SVR) is a supervised learning method for the estimation of an unknown function from a data set of observations, each of which is represented by a point with
multiple input values and one output value (Hastie et al. 2009). Such estimation is carried out via a hyperplane, which is optimally fit to the data set, possibly after a non-linear transformation of the input values by means of a kernel function (Vapnik 1995; Drucker et al. 1997; Smola and Schölkopf 2004). Its remarkable performance as a predictive model has earned SVR considerable popularity in a variety of fields of application, including finance (Yang et al. 2002), transportation (Wu et al. 2004), genetics (Myasnikova et al.
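As a point of reference for the point-based setting just described, standard epsilon-insensitive SVR can be fit in a few lines with scikit-learn. This is an illustrative sketch on toy data, not part of the paper; the library call and parameter values are ours.

```python
import numpy as np
from sklearn.svm import SVR

# Toy data: a noisy linear relationship between one input and the output
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.1, size=200)

# epsilon-insensitive SVR; a non-linear kernel (e.g. "rbf") would be used
# in place of "linear" to fit non-linear data
model = SVR(kernel="linear", C=10.0, epsilon=0.1).fit(X, y)
pred = model.predict([[1.0]])  # close to the true value 2*1 + 1 = 3
```

The `epsilon` parameter sets the width of the insensitive tube around the fitted hyperplane within which deviations incur no loss, while `C` trades flatness against tolerance of deviations beyond the tube.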