Research and Design of the Clustering System Based on the Column-Store



Lijun Shen, Tao Zhang, Jinyu Song, Ping Chen, and Jinshuang Wang

Abstract With the rapid development of networks and the growth of information on the web, fast database access and data mining have become very important. Column storage offers fast reads, saves disk I/O, and allows data to be read without first being decompressed, which helps in extracting knowledge from massive data. We therefore introduce column-store technology into a traditional data mining architecture. Both the information base and the knowledge base use column storage for storage and access, and an access interface is provided between the storage module and the module above it. The data clustering module computes the Minkowski distance and applies the k-medoids method. By combining the access advantages of column storage with the k-medoids method, the system can improve both the speed and the quality of clustering. The contributions are the application of column storage to a clustering system, together with a complete data access interface, and the use of the k-medoids method, which can detect clusters of arbitrary shape. Computing the Minkowski distance makes the dissimilarity between objects more efficient to compute and speeds up clustering.

Keywords Column-store • Clustering system • K-medoids • Minkowski distance
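The abstract names the two techniques used in the clustering module, the Minkowski distance and k-medoids, but this section gives no details of either. As a minimal sketch, assuming numeric attribute vectors and the standard definition d(x, y) = (Σ_f |x_f − y_f|^p)^(1/p) (p = 1 gives the Manhattan distance, p = 2 the Euclidean distance), the Python below pairs that distance with a PAM-style swap search for k medoids. The function names, the seeded random initialization, and the greedy swap loop are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def minkowski(x: Point, y: Point, p: float = 2.0) -> float:
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    if p < 1:
        raise ValueError("p must be >= 1 for the distance to be a metric")
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def total_cost(points: List[Point], medoids: List[int], p: float) -> float:
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(minkowski(pt, points[m], p) for m in medoids) for pt in points)


def k_medoids(points: List[Point], k: int, p: float = 2.0,
              seed: int = 0, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """PAM-style k-medoids: swap a medoid with a non-medoid while total cost drops.

    Returns (medoid indices, cluster label per point).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # random initial medoids
    cost = total_cost(points, medoids, p)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for h in range(len(points)):
                if h in medoids:
                    continue
                # Try replacing medoid i with non-medoid h.
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                candidate_cost = total_cost(points, candidate, p)
                if candidate_cost < cost:  # keep any swap that lowers the cost
                    medoids, cost, improved = candidate, candidate_cost, True
        if not improved:
            break  # no improving swap: local optimum reached
    # Assign every point to its nearest medoid.
    labels = [
        min(range(k), key=lambda j: minkowski(pt, points[medoids[j]], p))
        for pt in points
    ]
    return medoids, labels
```

For example, on the six points (0,0), (0,1), (1,0), (10,10), (10,11), (11,10) with k = 2, the search settles on the medoids at indices 0 and 3 and gives the two groups different labels. Each candidate swap recomputes the full cost, so one pass costs O(k·n²) distance evaluations; that keeps the sketch short but would need caching or sampling for large data sets.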

223.1 Research Background

Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories. With the rapid development of networks, there is a growing demand to query massive data stores quickly and to cluster the most useful data in support of people's decisions, and traditional relational databases can no longer meet these requirements. Research institutions, open-source organizations, and database vendors in many countries are currently developing column-oriented databases. A column-oriented database stores data column by column; this gives it a clear advantage for queries over massive data, so this paper proposes a clustering system based on column storage.

L. Shen (*) • T. Zhang • J. Song • P. Chen • J. Wang
College of Command Automation, PLA University of Science, Mailbox 491, NanJing, JiangSu 210007, China

S. Zhong (ed.), Proceedings of the 2012 International Conference on Cybernetics and Informatics, Lecture Notes in Electrical Engineering 163, DOI 10.1007/978-1-4614-3872-4_223, © Springer Science+Business Media New York 2014
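A toy model may help make concrete why storing data column by column favors queries over massive data. The sketch below, assuming an in-memory table and counting individual attribute values as a stand-in for disk I/O (real systems read pages or blocks, and may compress columns), stores the same records in both layouts. The class names and the `scan_column` method are hypothetical; `scan_column` loosely plays the role of an access interface between a storage module and the layer above it.

```python
from typing import Dict, List, Sequence, Tuple

# A record is a fixed-width tuple of attribute values (illustrative assumption).
Record = Tuple[float, ...]


class RowStore:
    """Row-oriented layout: each record's attributes are stored together."""

    def __init__(self, schema: Sequence[str], records: List[Record]) -> None:
        self.schema = list(schema)
        self.records = list(records)
        self.values_read = 0  # proxy for I/O: number of attribute values fetched

    def scan_column(self, name: str) -> List[float]:
        idx = self.schema.index(name)
        result = []
        for record in self.records:
            # The whole record is fetched to extract a single attribute.
            self.values_read += len(record)
            result.append(record[idx])
        return result


class ColumnStore:
    """Column-oriented layout: each attribute's values are stored contiguously."""

    def __init__(self, schema: Sequence[str], records: List[Record]) -> None:
        self.schema = list(schema)
        self.columns: Dict[str, List[float]] = {
            name: [record[i] for record in records]
            for i, name in enumerate(self.schema)
        }
        self.values_read = 0

    def scan_column(self, name: str) -> List[float]:
        column = self.columns[name]
        # Only the requested attribute's values are touched.
        self.values_read += len(column)
        return list(column)


if __name__ == "__main__":
    schema = ["id", "age", "income", "score"]
    records = [(1, 30, 5.0, 0.1), (2, 40, 6.0, 0.2), (3, 50, 7.0, 0.3)]
    rows, cols = RowStore(schema, records), ColumnStore(schema, records)
    assert rows.scan_column("age") == cols.scan_column("age") == [30, 40, 50]
    print(rows.values_read, cols.values_read)  # 12 3
```

Scanning the single attribute `age` from three four-attribute records touches 12 values in the row layout but only 3 in the column layout; the gap grows with the number of attributes per record, which is the read-side saving that motivates column storage here.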

223.2 Column Store

223.2.1 Concept of Column Store

Most DBMS applications use a record-oriented storage mode, that is, one that stores all of the attributes of a record together. In a row-store system, a single disk write can flush all of the attributes of a record to disk. Such a system optimizes write operations and is therefore called a write-optimized system. By contrast, column store [1] is storing t