A Tool for Web Usage Mining

1 Hospital Juan Carlos I, Real del Castillo 152 - 3571 Las Palmas - Spain, [email protected]
2 Inst. of Intelligent Systems and Num. Applic. in Engineering, Univ. of Las Palmas, Campus Univ. de Tafira - 35017 Las Palmas - Spain, [email protected]

Abstract. This paper presents a tool for web usage mining. The aim is to provide a tool that facilitates the mining process rather than to implement elaborate algorithms and techniques. The tool covers several phases of the CRISP-DM methodology, such as data preparation, data selection, modeling and evaluation. The algorithms used in the modeling phase are those implemented in the Weka project. The tool has been tested on a web site to find access and navigation patterns.

1 Introduction

J.M. Domenech and J. Lorenzo. In: H. Yin et al. (Eds.): IDEAL 2007, LNCS 4881, pp. 695–704, 2007. © Springer-Verlag Berlin Heidelberg 2007

Discovering knowledge from large databases has received great attention during the last decade, with data mining being the main tool for doing so [1]. The World Wide Web has been considered the largest repository of information, but it lacks a well-defined structure. It is therefore a good environment for data mining, an application known as Web Mining [2,3]. Web mining can be divided into three main topics: Content Mining, Structure Mining and Usage Mining. This work focuses on Web Usage Mining (WUM), which has been defined as "the application of data mining techniques to discover usage patterns from Web data" [4]. Web usage mining can provide organizations with usage patterns from which to obtain customer profiles, so that they can make the website easier to browse or present specific products or pages. The latter is of great interest to businesses because offering customers only appealing products can increase sales, although, as Anand pointed out (Anand et al, 2004), it is difficult to present a convincing case for Return on Investment.

The success of data mining applications, like that of many other applications, depends on the development of a standard. CRISP-DM (Cross-Industry Standard Process for Data Mining) (CRISP-DM, 2000) is a data mining process, defined and validated by a consortium of companies, that can be used in different data mining projects such as web usage mining. CRISP-DM divides the life cycle of a data mining project into six stages: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment. The Business Understanding phase is closely tied to the problem to be solved because it defines the business objectives of the application. The last one, Deployment, is not easy to automate because each organization has its own information processing management. For the remaining stages, a tool can be designed to facilitate the work of web usage mining practitioners and reduce the effort of developing new applications. In this work we implement the WEBMINER architecture [5] which divides the WUM process into three main parts: preprocessin