Domain-Oriented Data-Driven Data Mining (3DM): Simulation of Human Knowledge Understanding

Abstract. Recent advances in computing, communications, digital storage technologies, and high-throughput data-acquisition technologies make it possible to gather and store enormous volumes of data. This creates unprecedented opportunities for large-scale knowledge discovery from databases. Data mining (DM) technology has emerged as a means of performing this discovery, and it is a useful tool in many fields, such as marketing and decision making. Countless researchers are working on designing efficient data mining techniques, methods, and algorithms. Unfortunately, most data mining researchers pay much attention to the technical problems of developing data mining models and methods, but little to the basic issues of data mining. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? What rule should we obey in a data mining process? In this paper, we address these questions and propose our answers based on a conceptual data mining model. Our answer is that “data mining is a process of knowledge transformation”, which is consistent with the process of human knowledge understanding. Based on an analysis of the user-driven and “data-driven” data mining approaches proposed by many other researchers, we propose a conceptual knowledge transformation model and a conceptual domain-oriented data-driven data mining (3DM) model. The 3DM model integrates user-driven data mining and data-driven data mining into one system. Directions for future work on developing such a 3DM data mining system are also proposed.

1 Introduction

N. Zhong et al. (Eds.): WImBI 2006, LNAI 4845, pp. 278–290, 2007.
© Springer-Verlag Berlin Heidelberg 2007

Recent advances in computing, communications, digital storage technologies, and high-throughput data-acquisition technologies make it possible to gather and store enormous volumes of data. One example is the hundreds of terabytes of DNA, protein-sequence, and gene-expression data that biological scientists have gathered at steadily increasing rates. Similarly, data warehouses store massive quantities of information about various aspects of business operations. Complex distributed systems (computer systems, communication networks, and power systems, for example) are also equipped with sensors and measurement devices that gather and store a variety of data for monitoring, controlling, and improving their operations [1]. Data mining (also known as Knowledge Discovery in Databases, KDD) is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data [2]. It uses machine learning, statistical, and visualization techniques to discover knowledge from data and to represent that knowledge in a form that humans can easily comprehend. Data mining has become an active field in artificial intelligence: a search for “data mining” on Google returns over 10 million results. There are numerous researchers and data miners working on designing efficient data mining techniqu