Comparison of Machine Learning Methods for Solving the Problem of Wheat Seeds Classification by Yield Properties

D. D. Baryshev^a,*, N. N. Barysheva^a, S. P. Pronin^a, and O. K. Nikol’skii^a

^a Polzunov Altai State Technical University, Barnaul, 656000 Russia
*e-mail: [email protected]
Received May 10, 2020

Abstract: Data mining is increasingly used in agricultural production. This work presents, for the first time, the results of applying three machine learning methods (decision tree, support vector machine, and K-nearest neighbor) to the classification of wheat seeds by yield properties using bioelectric indicators of the seeds. The classifiers are evaluated by accuracy, confusion matrices, and cross-validation of training quality. In the comparison, the decision tree gave the best classification results. Its models are simple to understand and interpret, and it requires no additional data preparation: there is no need to normalize data, add dummy variables, or delete missing values. In the experiments it reached relatively high accuracy (96%) on the sample with a noise component. The K-nearest neighbor method is also recommended for classifying seeds by yield properties, although it is less accurate than the decision tree: on the sample with noise its accuracy was 91%. The support vector machine is not a promising tool for this problem, although it is highly successful in other areas.

Keywords: wheat seeds classification, yield properties, decision tree, support vector machine, K-nearest neighbor, comparison of methods

DOI: 10.3103/S1068367420040047
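The evaluation measures named in the abstract (accuracy, confusion matrix, cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, a plain K-nearest neighbor classifier, and synthetic two-feature data. The class names "low" and "high", the feature values, the fold count, and all function names are assumptions made for illustration only.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def cross_validate(X, y, labels, folds=5, k=3):
    """k-fold cross-validation: each fold is held out once as the test set."""
    scores = []
    for f in range(folds):
        test_idx = list(range(f, len(X), folds))
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        preds = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
        truth = [y[i] for i in test_idx]
        scores.append(accuracy(confusion_matrix(truth, preds, labels)))
    return scores


# Synthetic stand-in data with Gaussian noise (NOT the paper's measurements).
rng = random.Random(0)
labels = ["low", "high"]  # hypothetical yield classes
X, y = [], []
for i in range(100):
    label = labels[i % 2]
    center = 0.0 if label == "low" else 3.0
    X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
    y.append(label)

scores = cross_validate(X, y, labels)
mean_accuracy = sum(scores) / len(scores)
print("fold accuracies:", scores)
print("mean accuracy:", round(mean_accuracy, 2))
```

Reporting the per-fold accuracies alongside the mean shows how stable a classifier is across different splits of the data, which a single train/test split cannot show.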

1. INTRODUCTION

Classification is one of the most common tasks in agriculture, biology, ecology, and other areas. Classification methods are applied to tasks such as plant and animal disease identification, weed identification, water and soil management [1], weather (climate) forecasting, animal behavior determination [2], assessment of suitable sites for ecological conservation and restoration [3], vegetation mapping using remote sensing [4, 5], and risk assessment systems for growing certain species [6]. In most cases, classification tasks involve noisy, multidimensional data that are often nonlinear and do not satisfy the assumptions of traditional statistical processing methods [7]. Machine learning methods are therefore increasingly used for such problems. One of the most popular classification methods is the decision tree [8].

However, applying machine learning methods is not always advantageous. Some methods require substantial computational resources. There are currently few guidelines for using these methods, which causes some confusion among professionals when choosing a method for a given classification problem [9, 10]. The choice of a particular method is determined not only by its effectiveness, but also by