Convex Factorization Machines



Abstract. Factorization machines are a generic framework that makes it possible to mimic many factorization models simply by feature engineering. In this way, they combine the high predictive accuracy of factorization models with the flexibility of feature engineering. Unfortunately, factorization machines involve a non-convex optimization problem and are thus subject to bad local minima. In this paper, we propose a convex formulation of factorization machines based on the nuclear norm. Our formulation imposes fewer restrictions on the learned model and is thus more general than the original formulation. To solve the corresponding optimization problem, we present an efficient globally-convergent two-block coordinate descent algorithm. Empirically, we demonstrate that our approach achieves comparable or better predictive accuracy than the original factorization machines on four recommendation tasks and scales to datasets with 10 million samples.

Keywords: Factorization machines · Feature interactions · Recommender systems · Nuclear norm

1 Introduction

Factorization machines [12][13] are a generic framework that makes it possible to mimic many factorization models simply by feature engineering. Similarly to linear models, factorization machines learn a feature weight vector w ∈ R^d, where d is the number of features. However, factorization machines also learn a pairwise feature interaction weight matrix Z ∈ R^{d×d}. Given a feature vector x ∈ R^d, factorization machines use w and Z to predict a target y ∈ R. The main advantage of factorization machines is that they learn the feature interaction weight matrix in factorized form, Z = V V^T, where V ∈ R^{d×k} and k ≪ d is a rank hyper-parameter. This reduces overfitting, since the number of parameters to estimate is reduced from d² to kd, and allows predictions to be computed efficiently.

Although they can be used for any supervised learning task, such as classification and regression, factorization machines are especially useful for recommender systems. As shown in [12][13], factorization machines can mimic many existing factorization models just by choosing an appropriate feature representation for x. Examples include standard matrix factorization, SVD++ [8], timeSVD++ [9] and PITF (pairwise interaction tensor factorization) [16]. Moreover, it is easy to incorporate auxiliary features such as user and item attributes, contextual information [15] and cross-domain feedback [10]. In [14], it was shown that factorization machines achieve predictive accuracy as good as the best specialized models on the Netflix and KDD Cup 2012 challenges. In short, factorization machines are a generic framework which combines the high predictive accuracy of factorization models with the flexibility of feature engineering.

Unfortunately, factorization machines have two main drawbacks. First, they involve a non-convex optimization problem

M. Blondel et al. © Springer International Publishing Switzerland 2015. A. Appice et al. (Eds.): ECML PKDD 2015, Part II, LNAI 9285, pp. 19–35, 2015. DOI: 10.1007/978-3-319-23525-7_2
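The efficient prediction mentioned above comes from the standard factorization-machine identity: the pairwise term Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j equals ½(‖V^T x‖² − Σ_i ‖v_i‖² x_i²), which costs O(kd) instead of O(d²). A minimal NumPy sketch of this computation (the function name `fm_predict` is ours, for illustration, not from the paper):

```python
import numpy as np

def fm_predict(x, w, V, b=0.0):
    """Factorization machine prediction:

        y(x) = b + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j,

    where V_i is the i-th row of V. The pairwise term is evaluated
    in O(kd) time via
        sum_{i<j} <V_i, V_j> x_i x_j
            = 0.5 * (||V^T x||^2 - sum_i ||V_i||^2 x_i^2).
    """
    linear = w @ x
    Vx = V.T @ x                                   # shape (k,)
    pairwise = 0.5 * (Vx @ Vx - np.sum((V ** 2).T @ (x ** 2)))
    return b + linear + pairwise
```

For sparse x, the same identity lets the cost drop further to O(k · nnz(x)), which is what makes factorization machines practical on high-dimensional one-hot encodings.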
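To make the mimicry of matrix factorization concrete: if x is the concatenation of a one-hot user indicator and a one-hot item indicator, only two features are nonzero, so the factorization-machine score collapses to w_u + w_i + ⟨v_u, v_i⟩, i.e., biased matrix factorization. A small illustrative sketch under that encoding (the helper name `mf_as_fm` is ours, not the paper's):

```python
import numpy as np

def mf_as_fm(user, item, w, V, n_users):
    """FM score on the one-hot (user, item) encoding.

    With x = [one-hot(user); one-hot(item)] of length n_users + n_items,
    the only nonzero pairwise product is x_user * x_{n_users + item},
    so b + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j reduces to
    w[user] + w[n_users + item] + <V[user], V[n_users + item]>:
    matrix factorization with user/item biases.
    """
    return w[user] + w[n_users + item] + V[user] @ V[n_users + item]
```

Richer encodings work the same way: appending attribute or context features to x simply adds more interaction terms, which is how SVD++-style models are recovered.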