Distributed Learning Classifier Systems



Artificial Life and Adaptive Robotics Laboratory, School of Information Technology and Electrical Engineering, The University of New South Wales, Canberra, NSW, Australia, http://www.itee.adfa.edu.au/~alar

Department of Computer Engineering, Faculty of Engineering, Research Center for Communication and Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Thailand

Summary. Genetics-based machine learning methods, also called learning classifier systems (LCSs), are evolutionary computation based data mining techniques. These techniques offer several advantages: they provide rule-based models that represent human-readable patterns; they learn incrementally and can adapt quickly to changes in dynamic environments; and some of them have linear O(n) learning complexity in the size of the data set. However, little effort has yet been put into investigating LCSs in distributed environments. This chapter examines several issues of LCSs in distributed environments, such as knowledge passing in the system, knowledge combination methods at the central location, and the effect of the number of distributed sites on the system’s learning accuracy.

H.H. Dam et al.: Distributed Learning Classifier Systems, Studies in Computational Intelligence (SCI) 125, 69–91 (2008) © Springer-Verlag Berlin Heidelberg 2008 www.springerlink.com

1 Introduction

Pervasive computing has opened a new era in technology and communications in which people can access and transfer data around the world in real time. Instead of tons of books, a huge amount of information can be preserved within small and affordable electronic systems. Along with fast and convenient ways to access the outside world, electronic data are also being generated at a greater rate than at any time in the past. Data mining, the process of discovering novel and potentially useful patterns in data [11], has become an effective method for uncovering the tacit knowledge hidden in such overwhelming databases.

Nowadays, most databases in large organizations are physically distributed across many locations as a result of globalization. For example, a company might have branches in many cities, states, or countries. From the management’s perspective, the data generated at multiple locations need to be integrated into a single coherent knowledge base for future decision making. However, with large amounts of data generated daily at each location, it is not possible to transfer all the data to a central location for normal data mining because of security issues, limited network bandwidth, and, in some organizations, internal policies. Distributed Data Mining (DDM), an extension of data mining techniques to distributed environments, was introduced to tackle this problem. The primary purpose of DDM is to discover and combine useful knowledge from databases that come f