Data Reduction for Pattern Recognition and Data Analysis
1 Introduction
Pattern recognition [5, 13, 58] covers many human activities of great practical significance, such as data-based bankruptcy prediction, speech/image recognition, machine fault detection and cancer diagnosis. Clearly, it would be immensely useful to build machines that perform pattern recognition tasks reliably and efficiently. The most general and most natural pattern recognition frameworks rely mainly on statistical characterizations of patterns, under the assumption that the patterns are generated by a probabilistic system. Research on neural pattern recognition has also been conducted widely over the past few decades. In contrast to statistical methods, neural pattern recognition frameworks do not require such a priori assumptions about how the data were generated.
Although different pattern recognition systems use different working mechanisms, their basic procedures are essentially the same. A typical pattern recognition procedure consists of three sequential parts [13, 58]: a sensing model, which collects and preprocesses raw data from real sites; a data processing model, which includes feature extraction/selection and pattern selection; and a recognition/classification model. When handling a pattern recognition process, one must address the following basic issues:
• How should the raw data be processed for a pattern recognition task? This issue concerns the sensing and preprocessing stage of pattern recognition.
• How can appropriate data be determined for a given pattern recognition model? This is a central concern in the data processing stage: deleting noisy or redundant data (both features and patterns) usually improves recognition performance.
• How should an appropriate classifier be designed for a given data set? This topic has been widely discussed in the pattern recognition community.
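The three-stage procedure described above (sensing/preprocessing, data processing, classification) can be sketched as a small chain of functions. This is a minimal illustration under our own assumptions, not the chapter's method: z-score standardization stands in for preprocessing, a hand-picked feature subset (the hypothetical `keep` parameter) and removal of exact duplicate samples stand in for feature and pattern selection, and a 1-nearest-neighbour rule stands in for the classifier. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Illustrative sketch only: z-scoring, duplicate removal and 1-NN are assumed
# stand-ins for the three stages, not techniques prescribed by the chapter.
Vector = List[float]


def preprocess(raw: Sequence[Sequence[float]]) -> Tuple[List[Vector], Vector, Vector]:
    """Stage 1 (sensing/preprocessing): z-score every feature column."""
    n, d = len(raw), len(raw[0])
    means = [sum(row[j] for row in raw) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in raw) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant features
    scaled = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in raw]
    return scaled, means, stds


def select_features(X: Sequence[Sequence[float]], keep: Sequence[int]) -> List[Vector]:
    """Stage 2a (feature selection): keep only the column indices listed in `keep`."""
    return [[row[j] for j in keep] for row in X]


def select_patterns(X: Sequence[Vector], y: Sequence[str]) -> Tuple[List[Vector], List[str]]:
    """Stage 2b (pattern selection): drop exact duplicate (sample, label) pairs."""
    seen = set()
    X_out: List[Vector] = []
    y_out: List[str] = []
    for row, label in zip(X, y):
        key = (tuple(row), label)
        if key not in seen:
            seen.add(key)
            X_out.append(list(row))
            y_out.append(label)
    return X_out, y_out


def nearest_neighbour(X: Sequence[Vector], y: Sequence[str], query: Vector) -> str:
    """Stage 3 (classification): 1-nearest-neighbour rule, squared Euclidean distance."""
    best = min(range(len(X)), key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], query)))
    return y[best]


def build_pipeline(
    raw_X: Sequence[Sequence[float]], y: Sequence[str], keep: Sequence[int]
) -> Tuple[Callable[[Sequence[float]], str], List[Vector], List[str]]:
    """Chain the three stages; return a predictor plus the reduced training set."""
    X, means, stds = preprocess(raw_X)
    X = select_features(X, keep)
    X, labels = select_patterns(X, y)

    def predict(raw_query: Sequence[float]) -> str:
        # Reuse the training-set scaling and feature subset on the new sample.
        q = [(v - m) / s for v, m, s in zip(raw_query, means, stds)]
        return nearest_neighbour(X, labels, [q[j] for j in keep])

    return predict, X, labels
```

Returning the scaling statistics from `preprocess` lets the same transformation be applied to unseen samples, so that new data pass through exactly the same first two stages as the training data before reaching the classifier.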
T.W.S. Chow and D. Huang: Data Reduction for Pattern Recognition and Data Analysis, Studies in Computational Intelligence (SCI) 115, 81–109 (2008) © Springer-Verlag Berlin Heidelberg 2008 www.springerlink.com
Various learning algorithms and models have been proposed in an attempt to maximize recognition accuracy while keeping the procedure as simple as possible. By eliminating ‘noisy’ data (such as noisy samples and irrelevant features) and compressing redundant samples/features, data processing reduces the data volume without losing useful information. The main benefits of such data processing are improved scalability, recognition accuracy, and computational and measurement efficiency, together with an easier interpretation of the entire pattern recognition procedure [6, 24, 43]. As data sizes in recent applications have increased significantly, data preprocessing has become an essential step in many pattern recognition procedures. In this chapter, data reduction/selection refers specifically to the reduction/selection of data samples.
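The two operations on samples mentioned above, eliminating noisy samples and compressing redundant ones, can be illustrated with two classic sample-selection rules: Wilson's edited nearest-neighbour rule (editing) and Hart's condensed nearest-neighbour rule (condensing). These are well-known stand-ins chosen for illustration, not the reduction methods developed in this chapter; the `(vector, label)` sample representation, the function names and the default `k = 3` are our own assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Illustrative sketch: classic Wilson editing and Hart condensing, used here as
# assumed examples of sample reduction, not the chapter's own algorithms.
Sample = Tuple[Tuple[float, ...], str]  # (feature vector, class label)


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def knn_label(train: Sequence[Sample], query: Sequence[float], k: int) -> str:
    """Majority label among the k nearest samples; ties go to the label seen first (nearest)."""
    nearest = sorted(train, key=lambda s: _sq_dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def wilson_edit(samples: Sequence[Sample], k: int = 3) -> List[Sample]:
    """Remove likely noisy samples: drop any sample misclassified by its k nearest other samples."""
    kept = []
    for i, (x, label) in enumerate(samples):
        others = list(samples[:i]) + list(samples[i + 1:])
        if knn_label(others, x, k) == label:
            kept.append((x, label))
    return kept


def hart_condense(samples: Sequence[Sample]) -> List[Sample]:
    """Compress redundant samples: grow a subset until 1-NN on it classifies every other sample correctly."""
    if not samples:
        return []
    chosen = [0]  # indices of retained samples, seeded with the first sample
    chosen_set = {0}
    changed = True
    while changed:
        changed = False
        for i, (x, label) in enumerate(samples):
            if i in chosen_set:
                continue
            store = [samples[j] for j in chosen]
            if knn_label(store, x, 1) != label:
                chosen.append(i)  # misclassified, so this sample carries information
                chosen_set.add(i)
                changed = True
    return [samples[j] for j in chosen]
```

Applying `wilson_edit` before `hart_condense` is a common ordering: editing first removes mislabelled points near class boundaries, which condensing would otherwise retain as if they were informative.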
2 Data Reduction
While computer technol