Comparison with Recommendation Algorithm Based on Random Forest Model



1 Key Laboratory of Information System Security of Ministry of Education, TNLIST, School of Software, Tsinghua University, Beijing 100084, China
2 College of Computer Science and Technology, Jilin University, Changchun 130012, China

Abstract. Product recommendation based on user behavior is a hot research topic in the Internet era. On the same data set, the features for which the results of the various classifiers differ most were handled with a random forest model. This paper compares the mainstream classification algorithms C4.5 and CART on 578,906,480 user behavior records from actual transactions on Alibaba. The results show that the CART decision tree algorithm is more suitable for large-scale e-commerce data mining.

Keywords: User behavior · CART · Random forest model · Decision tree · C4.5



1 Introduction

Mining users' implicit demands from the mass of information on user behavior is essential for service providers. Recommender systems [1] have already seen preliminary application in business, but how to construct a highly efficient and intelligent recommendation algorithm is still a hot topic. The random forests model, a classification and prediction model [2] proposed by Leo Breiman, has many advantages, such as fast learning, few parameters, and fault tolerance, and it has been applied in many fields since it was proposed. Guo Yingjie et al. used random forest classification to identify plant resistance genes [3]; Li Jiangeng et al. analyzed gene pathways in cancer microarray data based on random forests [4]; and Fang Kuangnan predicted the direction of fund yields using a random forests model [5]. In this paper, the data set is a massive collection of user behavior records from real transactions on the Alibaba website. We define a set of user behavior attributes and compare the classification algorithms C4.5 and CART within the random forest model to provide evidence for better user recommendation.

© Springer Nature Singapore Pte Ltd. 2017 J.J. (Jong Hyuk) Park et al. (eds.), Advances in Computer Science and Ubiquitous Computing, Lecture Notes in Electrical Engineering 421, DOI 10.1007/978-981-10-3023-9_72


Y. Jiang et al.

2 Basic Theory

2.1 Random Forests Model

A random forest is a classifier made up of many independent decision trees [6, 7]. The growth of a single decision tree is generally controlled by attribute splitting and pruning, but when there are a large number of features it may overfit. Random forests use the bootstrap [8, 9] resampling method to draw multiple samples from the original data set, construct a decision tree for each sample, and combine the predictions of these trees to produce the final prediction (Fig. 1).

Fig. 1. Random forests model
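
The resample-then-combine procedure can be illustrated with a minimal sketch. It is not the implementation used in this paper: it assumes majority voting as the way of combining trees, uses a one-level tree (a stump on one randomly chosen feature) as a stand-in for a full C4.5 or CART tree, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement from the original data set."""
    return [rng.choice(rows) for _ in rows]

def majority(labels):
    """Return the most common label."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, rng):
    """Assumed stand-in for a full decision tree: split once on a random feature."""
    feature = rng.randrange(len(rows[0][0]))
    by_value = {}
    for x, y in rows:
        by_value.setdefault(x[feature], []).append(y)
    leaves = {value: majority(ys) for value, ys in by_value.items()}
    default = majority([y for _, y in rows])  # used for unseen feature values
    return feature, leaves, default

def predict_stump(stump, x):
    feature, leaves, default = stump
    return leaves.get(x[feature], default)

def train_forest(rows, n_trees=25, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(rows, rng), rng) for _ in range(n_trees)]

def predict_forest(forest, x):
    """Combine the individual predictions by majority vote."""
    return majority([predict_stump(tree, x) for tree in forest])

# Made-up toy records: (behavior features, label). Not data from the paper.
rows = [(("cart", "high"), "buy")] * 5 + [(("view", "low"), "skip")] * 5
forest = train_forest(rows)
print(predict_forest(forest, ("cart", "high")))
```

Because each tree sees a different bootstrap sample, an individual tree may be wrong on some inputs while the vote over all trees remains stable.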

2.2 C4.5 Algorithm

The C4.5 algorithm [10] starts at the root node, to which it assigns the best attribute. Each value of that attribute generates a corresponding branch, and a new node is created on each branch. The test attribute at each node is selected by the information gain ratio, which is defined in terms of information entropy.
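
As a rough illustration of this selection criterion, the sketch below computes the gain ratio for a single categorical attribute. It assumes the standard C4.5 definition (information gain divided by split information); the function names and the toy values are hypothetical, and continuous attributes and missing values are not handled.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Gain ratio of one categorical attribute with respect to the class labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0

# Made-up example: the attribute separates the two classes perfectly.
print(gain_ratio(["a", "a", "b", "b"], ["y", "y", "n", "n"]))
```

At each node, the attribute with the largest gain ratio among the candidates would be chosen as the test attribute.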