Application of LS-SVM and Variable Selection Methods on Predicting SSC of Nanfeng Mandarin Fruit


Abstract. The objective of this research was to investigate the performance of LS-SVM combined with several variable selection methods for assessing the soluble solids content (SSC) of Nanfeng mandarin fruit. Visible/near infrared (Vis/NIR) diffuse reflectance spectra of the samples were acquired with a QualitySpec spectrometer in the wavelength range of 350–1800 nm. Four variable selection methods were applied to select informative variables for SSC, and least squares-support vector machine (LS-SVM) with a radial basis function (RBF) kernel was used to develop calibration models. The results indicate that all four variable selection methods are effective at selecting informative variables, and that LS-SVM combined with these methods gives results comparable to those of full-spectrum partial least squares (PLS). The genetic algorithm (GA) combined with the successive projections algorithm (SPA) is the best of the four variable selection methods. For the LS-SVM model with GA-SPA, the correlation coefficients and RMSEs are 0.935 and 0.560% for the calibration set, 0.912 and 0.631% for the validation set, and 0.933 and 0.594% for the prediction set.

Keywords: Vis/NIR, LS-SVM, variable selection, soluble solids content, Nanfeng mandarin fruit.
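As an illustration of the calibration step summarized above, the following is a minimal sketch of LS-SVM regression with an RBF kernel, together with the two figures of merit reported (correlation coefficient and RMSE). It is not the authors' implementation. The function names, the regularization parameter `gamma`, the kernel width `sigma2`, the kernel convention exp(-||x - z||^2 / (2 sigma2)), and the synthetic data standing in for selected wavelength variables are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    # Squared Euclidean distances between every row of A and every row of B.
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Assumed convention: K(x, z) = exp(-||x - z||^2 / (2 * sigma2)).
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma2))

def lssvm_fit(X, y, gamma, sigma2):
    # Solve the standard LS-SVM linear system
    #   [ 0   1^T         ] [ b     ]   [ 0 ]
    #   [ 1   K + I/gamma ] [ alpha ] = [ y ]
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma2):
    # f(x) = sum_i alpha_i * K(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def corr(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Synthetic stand-in data (hypothetical): 60 samples, 5 "selected variables".
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 12.0 + X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=60)

X_tr, y_tr = X[:40], y[:40]   # calibration set
X_te, y_te = X[40:], y[40:]   # prediction set

gamma, sigma2 = 100.0, 10.0   # assumed hyperparameters, not tuned
b, alpha = lssvm_fit(X_tr, y_tr, gamma, sigma2)
pred_tr = lssvm_predict(X_tr, b, alpha, X_tr, sigma2)
pred_te = lssvm_predict(X_tr, b, alpha, X_te, sigma2)

print("calibration r, RMSE:", corr(y_tr, pred_tr), rmse(y_tr, pred_tr))
print("prediction  r, RMSE:", corr(y_te, pred_te), rmse(y_te, pred_te))
```

In practice `gamma` and `sigma2` would be tuned, for example by grid search with cross-validation, and `X` would hold the reflectance values at the wavelengths kept by the variable selection method.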

1 Introduction

Soluble solids content (SSC) is one of the most important fruit properties affecting taste. In recent years, visible/near infrared (Vis/NIR) spectroscopy has become a well-accepted method for SSC assessment of fruit because it is fast and nondestructive, requires no sample preparation, and causes no chemical pollution. It has been used to measure SSC in a variety of fruits, such as mandarin fruit [1-2], apple [3-4], pear [5-6], kiwifruit [7-8], and melon [9-10]. Because modern spectroscopic instruments have high resolution, spectral data usually comprise hundreds or thousands of wavelength variables and therefore contain substantial information. However, some of this information is useless or irrelevant to the component properties; it degrades the predictive ability of the model and should be eliminated.

T. Sun et al. In: D. Li and Y. Chen (Eds.): CCTA 2013, Part I, IFIP AICT 419, pp. 249–262, 2014. © IFIP International Federation for Information Processing 2014

Recently, many variable selection methods have been developed to eliminate irrelevant information or variables and retain the important ones. Ying et al. (2008) [11] used the genetic algorithm (GA) to select important variables for the sugar content (SC) of apples, then developed a calibration model by partial least squares (PLS). Compared with full-spectrum PLS, the root mean square error of prediction (RMSEP) of GA-PLS decreased from 0.512% to 0.395%. Liu et al. (2009) [12] applied the successive projections algorithm (SPA) to choose variables for the organic acids of plum vinegar, and used least squares-support vector machine (LS-SVM) to develop calibration models. The models developed by SPA-LS-SVM for organic acids were better than those of full-spectrum PLS. So