Computational complexity of kernel-based density-ratio estimation: a condition number analysis
Takafumi Kanamori · Taiji Suzuki · Masashi Sugiyama
Received: 2 March 2011 / Accepted: 13 September 2012 / Published online: 12 December 2012 © The Author(s) 2012
Abstract In this study, the computational properties of a kernel-based least-squares density-ratio estimator are investigated from the viewpoint of condition numbers. The condition number of the Hessian matrix of the loss function is closely related to the convergence rate of optimization and to numerical stability. We use smoothed analysis techniques to theoretically demonstrate that the kernel least-squares method has a smaller condition number than other M-estimators, which implies that it possesses desirable computational properties. In addition, an alternative formulation of the kernel least-squares estimator with an even smaller condition number is presented. The validity of the theoretical analysis is verified through numerical experiments.

Keywords Kernel · Density-ratio · Condition number · Smoothed analysis
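As a point of reference for the analysis summarized above, the condition number of a positive-definite Hessian H is the ratio of its largest to smallest eigenvalue, κ(H) = λ_max(H)/λ_min(H); for gradient-type optimizers, a smaller κ(H) gives a faster linear convergence rate and better numerical stability. The snippet below is a minimal sketch, not taken from the paper, that computes κ for a regularized Gaussian-kernel Gram matrix; the sample, kernel width, and regularization parameter are arbitrary illustrative choices.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gaussian-kernel Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))      # illustrative sample
K = gaussian_gram(X, sigma=1.0)
lam = 0.1                          # illustrative regularization parameter
H = K + lam * np.eye(len(X))       # regularized, symmetric positive definite

eig = np.linalg.eigvalsh(H)        # eigenvalues in ascending order
kappa = eig[-1] / eig[0]           # condition number = lambda_max / lambda_min
print(f"condition number: {kappa:.3g}")
```

Increasing lam shrinks κ (the spectrum is shifted away from zero), which is one intuition behind why regularized least-squares formulations can be well conditioned.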
1 Introduction

In this section, we introduce the background of the problem addressed in this study.
Editor: Massimiliano Pontil.

T. Kanamori (✉) Department of Computer Science and Mathematical Informatics, Nagoya University, Furocho, Chikusaku, Nagoya 464-8601, Japan
e-mail: [email protected]

T. Suzuki Department of Mathematical Informatics, University of Tokyo, 3-1 Hongo 7-Chome, Bunkyo-ku, Tokyo 113-8656, Japan
e-mail: [email protected]

M. Sugiyama Department of Computer Science, Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552, Japan
e-mail: [email protected]
1.1 Density-ratio estimation

Recently, methods that directly estimate the ratio of two probability densities without going through density estimation have been developed. These methods can be used to solve various machine learning tasks such as importance sampling, divergence estimation, mutual information estimation, and conditional probability estimation (Sugiyama et al. 2009, 2012b). The kernel mean matching (KMM) method (Gretton et al. 2009) directly yields density-ratio estimates by efficiently matching the two distributions using a special property of universal reproducing kernel Hilbert spaces (RKHSs) (Steinwart 2001). Another approach is the M-estimator (Nguyen et al. 2010), which is based on the non-asymptotic variational characterization of the ϕ-divergence (Ali and Silvey 1966; Csiszár 1967); see Sugiyama et al. (2008a) for a similar algorithm that uses the Kullback-Leibler divergence. Non-parametric convergence properties of the M-estimator in RKHSs have been elucidated under the Kullback-Leibler divergence (Nguyen et al. 2010; Sugiyama et al. 2008b). A squared-loss version of the M-estimator for linear density-ratio models, called unconstrained Least-Squares Importance Fitting (uLSIF), has also been developed (Kanamori et al. 2009).
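To make the least-squares approach concrete, the following is a minimal sketch of a uLSIF-style estimator with a Gaussian kernel model, using the standard closed-form solution θ̂ = (Ĥ + λI)⁻¹ ĥ of the unconstrained squared-loss objective. The function names, the choice of numerator samples as kernel centers, and the default values of sigma and lam are illustrative assumptions, not the paper's notation or reference implementation.

```python
import numpy as np

def gaussian_kernel(X, C, sigma):
    """Basis matrix Phi[i, l] = exp(-||x_i - c_l||^2 / (2 sigma^2))."""
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(C**2, axis=1)[None, :] - 2.0 * X @ C.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def ulsif_fit(X_nu, X_de, sigma=1.0, lam=0.1):
    """Least-squares fit of the density ratio r(x) = p_nu(x) / p_de(x).

    Returns a function x -> estimated ratio.  Numerator samples serve as
    kernel centers; sigma and lam are illustrative defaults (in practice
    they would be chosen by cross-validation).
    """
    C = X_nu                                   # kernel centers (assumption)
    Phi_de = gaussian_kernel(X_de, C, sigma)   # basis on denominator samples
    Phi_nu = gaussian_kernel(X_nu, C, sigma)   # basis on numerator samples
    H = Phi_de.T @ Phi_de / len(X_de)          # Hessian of the squared loss
    h = Phi_nu.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(C)), h)  # closed form
    return lambda X: gaussian_kernel(X, C, sigma) @ theta

# usage: two Gaussian samples with shifted means
rng = np.random.default_rng(0)
X_nu = rng.normal(0.0, 1.0, size=(200, 1))
X_de = rng.normal(0.5, 1.0, size=(200, 1))
ratio = ulsif_fit(X_nu, X_de)
print(ratio(np.zeros((1, 1))))  # estimated p_nu/p_de at the origin
```

Note that the matrix H + λI in the solve step is exactly the Hessian whose condition number the paper analyzes; the regularization term λI bounds its smallest eigenvalue away from zero.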