Mining Web Data


“Data is a precious thing, and will last longer than the systems themselves.”—Tim Berners-Lee

18.1 Introduction

The Web is a unique phenomenon in many ways: in its scale, the distributed and uncoordinated nature of its creation, the openness of the underlying platform, and the resulting diversity of applications it has enabled. Examples of such applications include e-commerce, user collaboration, and social network analysis. Because of the distributed and uncoordinated nature in which the Web is both created and used, it is a rich treasure trove of diverse types of data. This data can be either a source of knowledge about various subjects, or personal information about users. Aside from the content available in the documents on the Web, the usage of the Web results in a significant amount of data in the form of user logs or Web transactions.

There are two primary types of data available on the Web that are used by mining algorithms.

1. Web content information: This information corresponds to the Web documents and links created by users. The documents are linked to one another with hypertext links. Thus, the content information contains two components that can be mined either together, or in isolation.

• Document data: The document data are extracted from the pages on the World Wide Web. Some of these extraction methods are discussed in Chap. 13.

• Linkage data: The Web can be viewed as a massive graph, in which the pages correspond to nodes, and the linkages correspond to edges between nodes. This linkage information can be used in many ways, such as searching the Web or determining the similarity between nodes.

2. Web usage data: This data corresponds to the patterns of user activity that are enabled by Web applications. These patterns could be of various types.

• Web transactions, ratings, and user feedback: Web users frequently buy various types of items on the Web, or express their affinity for specific products in the form of ratings. In such cases, the buying behavior and/or ratings can be leveraged to make inferences about the preferences of different users. In some cases, the user feedback is provided in the form of textual user reviews that are referred to as opinions.

• Web logs: User browsing behavior is captured in the form of Web logs that are maintained at most Web sites. This browsing information can be leveraged to make inferences about user activity.

C. C. Aggarwal, Data Mining: The Textbook, DOI 10.1007/978-3-319-14142-8_18, © Springer International Publishing Switzerland 2015
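The graph view of linkage data described above can be made concrete with a small sketch. The page names and links below are hypothetical toy data, and the Jaccard coefficient over outgoing links is only one simple way to compare nodes, chosen here for illustration; it is not presented as the chapter's own similarity method.

```python
from collections import defaultdict


def build_web_graph(links):
    """Build adjacency sets from (source_page, target_page) hyperlink pairs."""
    out_links = defaultdict(set)  # page -> pages it links to
    in_links = defaultdict(set)   # page -> pages that link to it
    for source, target in links:
        out_links[source].add(target)
        in_links[target].add(source)
    return out_links, in_links


def jaccard_similarity(a, b):
    """Jaccard coefficient of two sets; defined as 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy data: each pair is one hyperlink (an edge of the Web graph).
links = [
    ("a.html", "c.html"),
    ("a.html", "d.html"),
    ("b.html", "c.html"),
    ("b.html", "d.html"),
    ("b.html", "e.html"),
    ("c.html", "a.html"),
]

out_links, in_links = build_web_graph(links)

# Pages a.html and b.html share two of the three distinct pages they link to.
similarity_ab = jaccard_similarity(out_links["a.html"], out_links["b.html"])
print(f"similarity(a.html, b.html) = {similarity_ab:.3f}")
```

Storing both outgoing and incoming adjacency sets keeps either direction of the link structure available, which is convenient because search-oriented and similarity-oriented uses of the graph may look at different directions.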

These diverse data types automatically define the types of applications that are common on the Web. In coordination with the different data types, the applications are also either content- or usage-centric.

1. Content-centric applications: The documents and links on the Web are used in various applications such as search, clustering, and classification. Some examples of such applications are as follows:

• Data mining applications: Web documents are used
