Understanding the Wine Judges and Evaluating the Consistency Through White-Box Classification Algorithms

1 Department of Computer Science, University of Central Arkansas, Conway, AR 72034, USA
2 Department of Computer Science, East Stroudsburg University, East Stroudsburg, PA 18301, USA

Abstract. Wine is a broad field of study whose popularity continues to grow. However, little data science and data mining research has been applied to this topic to benefit wine producers, distributors, and consumers. According to the American Association of Wine Economists, “Who is a reliable wine judge?” and “Are wine judges consistent?” are typical questions that beg for formal statistical answers. This paper proposes using white-box classification algorithms to understand wine judges and to evaluate their consistency when they score a wine as 90+ or 90−. Three white-box classification algorithms, Naïve Bayes, Decision Tree, and K-nearest neighbors, are applied to wine sensory data derived from professional wine reviews. Each algorithm can reveal how the judges make their decisions. The extracted information is also useful to wine producers, distributors, and consumers. The data set includes 1000 wines, with 500 scored as 90+ points (positive class) and 500 scored as 90− points (negative class). Five-fold cross-validation is used to validate the performance of the classification algorithms. Higher prediction accuracy indicates higher consistency of the wine judge. The best prediction accuracy we obtained with a white-box classification algorithm is 85.7%, from a modified version of the Naïve Bayes algorithm.

Keywords: Wineinformatics · Wine judges evaluation · Decision tree · Naïve Bayes · K-nearest neighbors · SVM
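The evaluation protocol summarized in the abstract (a balanced set of 1000 wines, binary 90+/90− labels, 5-fold cross-validation) can be illustrated with a minimal sketch. The code below is not the authors' implementation or their modified Naïve Bayes: it assumes each wine is encoded as a vector of 0/1 sensory attributes, uses a plain Bernoulli Naïve Bayes with Laplace smoothing, and runs on synthetic data produced by a hypothetical helper, `make_synthetic`. All function names and the data generator are assumptions made for illustration only.

```python
import math
import random


def train_nb(rows, labels):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing on 0/1 attributes."""
    n_attrs = len(rows[0])
    model = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        # Smoothed class prior, so an empty class never yields log(0).
        prior = (len(members) + 1) / (len(rows) + 2)
        # Smoothed P(attribute present | class) for every attribute.
        probs = [(sum(r[j] for r in members) + 1) / (len(members) + 2)
                 for j in range(n_attrs)]
        model[cls] = (prior, probs)
    return model


def predict_nb(model, row):
    """Return the class (1 = 90+, 0 = 90-) with the highest log-posterior."""
    scores = {}
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for x, p in zip(row, probs):
            score += math.log(p if x else 1.0 - p)
        scores[cls] = score
    return max(scores, key=scores.get)


def cross_validate(rows, labels, k=5, seed=0):
    """Mean accuracy over k folds; each wine is held out exactly once."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        model = train_nb([rows[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_nb(model, rows[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


def make_synthetic(n_per_class=500, n_attrs=20, seed=1):
    """Hypothetical data: the first 5 attributes are more frequent in 90+ wines."""
    rng = random.Random(seed)
    rows, labels = [], []
    for cls in (1, 0):
        for _ in range(n_per_class):
            row = [int(rng.random() < (0.6 if cls == 1 and j < 5 else 0.3))
                   for j in range(n_attrs)]
            rows.append(row)
            labels.append(cls)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_synthetic()
    print("5-fold accuracy on synthetic data:", cross_validate(rows, labels))
```

Because the model is white-box, the fitted per-class attribute probabilities in `model[cls][1]` can be inspected directly to see which attributes push a wine toward the 90+ class; under the paper's framing, a higher cross-validated accuracy would suggest a more consistent judge.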

1 Introduction

Data mining is the search for new, valuable, and nontrivial information in large volumes of data. It is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an “interesting” outcome. It is a cooperative effort of humans and computers. Best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers [1].

© Springer International Publishing Switzerland 2016. P. Perner (Ed.): ICDM 2016, LNAI 9728, pp. 239–252, 2016. DOI: 10.1007/978-3-319-41561-1_18

B. Chen et al.

With the development of society and the rise in quality of life, the quality and variety of wines are increasing year by year. According to estimates by the OIV (International Organisation of Vine and Wine) [2], global production in 2011 (not counting must and grape juice) was around 2,558 million hectoliters, 700,000 more than in 2010 [3]. The OIV also estimates 2011 global wine consumption at about 2,419 million hl (1 hl = 100,000 ml), an increase of 1.7 million hl over the previous year [3]. Accordingly, wine is one of the most widely consumed beverages in the world and has obvious commercial value as well as social importance. Therefore, the evaluation of wine quality plays a very important role in both manufacture and sale