Understanding the Wine Judges and Evaluating the Consistency Through White-Box Classification Algorithms


1 Department of Computer Science, University of Central Arkansas, Conway, AR 72034, USA [email protected]
2 Department of Computer Science, East Stroudsburg University, East Stroudsburg, PA 18301, USA

Abstract. Wine is a broad field of study and is increasingly popular. However, little data science and data mining research has been applied to this topic to benefit wine producers, distributors, and consumers. According to the American Association of Wine Economists, "Who is a reliable wine judge?" and "Are wine judges consistent?" are typical questions that call for formal statistical answers. This paper proposes using white-box classification algorithms to understand wine judges and to evaluate their consistency when they score a wine as 90+ or 90−. Three white-box classification algorithms, Naïve Bayes, Decision Tree, and K-nearest neighbors, are applied to wine sensory data derived from professional wine reviews. Each algorithm can reveal how the judges make their decisions. The extracted information is also useful to wine producers, distributors, and consumers. The data set includes 1000 wines, with 500 scored as 90+ points (positive class) and 500 scored as 90− points (negative class). Five-fold cross-validation is used to validate the performance of the classification algorithms. A higher prediction accuracy indicates a higher consistency of the wine judge. The best prediction accuracy we obtained from a white-box classification algorithm is 85.7%, from a modified version of the Naïve Bayes algorithm.

Keywords: Wineinformatics · Wine judges evaluation · Decision tree · Naïve Bayes · K-nearest neighbors · SVM
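The evaluation protocol summarized in the abstract (a balanced two-class data set, a Naïve Bayes classifier, and five-fold cross-validation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the wines are synthetic, each wine is encoded as a vector of binary sensory attributes, the attribute probabilities (0.7, 0.2, 0.3) are invented, and the classifier is a plain Bernoulli Naïve Bayes with Laplace smoothing rather than the modified version the paper reports. The function names are hypothetical.

```python
import math
import random


def make_synthetic_wines(n_per_class=500, n_features=20, n_informative=5, seed=0):
    """Generate synthetic binary attribute vectors (NOT the paper's data).

    Assumption: the first n_informative attributes appear more often in
    90+ wines (label 1) than in 90- wines (label 0); the rest are noise.
    """
    rng = random.Random(seed)

    def p_present(label, j):
        if j < n_informative:
            return 0.7 if label == 1 else 0.2
        return 0.3

    X, y = [], []
    for label in (1, 0):
        for _ in range(n_per_class):
            X.append([1 if rng.random() < p_present(label, j) else 0
                      for j in range(n_features)])
            y.append(label)
    return X, y


def train_bernoulli_nb(X, y):
    """Estimate a smoothed class prior and P(attribute present | class)."""
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = (len(rows) + 1) / (len(X) + 2)
        # Laplace smoothing keeps every probability strictly inside (0, 1).
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[label] = (prior, probs)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior score."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, p in zip(x, probs):
            score += math.log(p if xj else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean accuracy over k folds built from a shuffled index list."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = train_bernoulli_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


def attribute_log_odds(model):
    """White-box view: log P(attr | 90+) / P(attr | 90-) for each attribute."""
    _, probs_neg = model[0]
    _, probs_pos = model[1]
    return [math.log(p1 / p0) for p1, p0 in zip(probs_pos, probs_neg)]


if __name__ == "__main__":
    X, y = make_synthetic_wines()
    print("5-fold accuracy:", round(cross_val_accuracy(X, y, k=5), 3))
    print("log-odds per attribute:",
          [round(v, 2) for v in attribute_log_odds(train_bernoulli_nb(X, y))])
```

The per-attribute log-odds are what make this kind of model "white box": a large positive value marks an attribute that pushes the prediction toward 90+, which is the sort of information the abstract describes as useful for understanding a judge's decisions.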

1 Introduction

Data mining is the search for new, valuable, and nontrivial information in large volumes of data. It is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an "interesting" outcome. It is a cooperative effort of humans and computers. Best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers [1].

© Springer International Publishing Switzerland 2016
P. Perner (Ed.): ICDM 2016, LNAI 9728, pp. 239–252, 2016. DOI: 10.1007/978-3-319-41561-1_18


B. Chen et al.

As society develops and quality of life rises, the quality and variety of wines increase year by year. According to OIV (International Organisation of Vine and Wine) [2] estimates, 2011 global production (not taking into account must and grape juice) was around 2,558 million hectoliters, 700,000 more than in 2010 [3]. OIV also estimates that 2011 global wine consumption was about 2,419 million hl (1 hl = 100,000 ml), an increase of 1.7 million hl over the previous year [3]. These figures show that wine is one of the most widely consumed beverages in the world and has clear commercial value as well as social importance. Therefore, the evaluation of wine quality plays a very important role in both manufacture and sale.
