Pruning of Rules and Rule Sets

Overfitting is a key problem for all learning algorithms. It describes the phenomenon that a very good fit of the learned rule set to the training data may result in inaccurate predictions on unseen data. Typically, this problem occurs when the hypothesis space is too large. Prepruning and post-pruning are two standard techniques for avoiding overfitting. Prepruning addresses it during learning, while post-pruning addresses this problem after an overfitting rule set has been learned. In this chapter, we will discuss a variety of pruning techniques for rule learning algorithms. We will start with a brief discussion of the problem of overfitting (Sect. 9.1), and then focus on the two principal approaches, prepruning (Sect. 9.2) and post-pruning (Sect. 9.3). State-of-the-art approaches typically integrate post-pruning into the rule learning phase, which is discussed in Sect. 9.4. Finally, we will also briefly summarize alternative methods for optimizing a learned rule set (Sect. 9.5).

9.1 Overfitting and Pruning

The central problem of any machine learning process, including rule learning, is that learning is performed on a restricted set of training examples, while the learned hypothesis is expected to represent relevant dependencies in the whole domain, including unseen instances that may be encountered in the future. Learning is based on the assumption that patterns that hold on the training set are also valid in the entire domain. However, this assumption does not necessarily hold. There are many possible patterns and theories that can be constructed for the training set, and there is always a possibility that the constructed hypothesis will be of excellent quality on the training data and very poor quality on the complete domain. This effect is called overfitting.
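The gap between training and test performance can be made concrete with a small sketch (the data generator and the two "learners" below are hypothetical illustrations, not algorithms from this chapter): an overfitting rule set that memorizes every positive training example fits noisy training data almost perfectly, while a single general rule matching the true concept does far better on unseen, noise-free examples.

```python
import random

random.seed(0)

def make_data(n, n_features=10, noise=0.0):
    """Generate boolean examples; the target concept is simply
    'class = value of the first feature'. With probability `noise`
    the class label is flipped (simulating erroneous instances)."""
    data = []
    for _ in range(n):
        x = tuple(random.randint(0, 1) for _ in range(n_features))
        y = x[0]
        if random.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def memorize_rules(train):
    """Overfitting 'learner': one maximally specific rule per positive
    training example (the rule body is the entire example)."""
    return {x for x, y in train if y == 1}

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

train = make_data(200, noise=0.1)   # noisy training set
test = make_data(1000, noise=0.0)   # noise-free evaluation set

rules = memorize_rules(train)
mem = lambda x: 1 if x in rules else 0   # memorizing rule set
pruned = lambda x: x[0]                  # single rule: IF f0 = 1 THEN positive

train_acc = accuracy(mem, train)
test_acc = accuracy(mem, test)
pruned_acc = accuracy(pruned, test)

print(f"memorized rules: train = {train_acc:.2f}, test = {test_acc:.2f}")
print(f"pruned rule:     test = {pruned_acc:.2f}")
```

The memorizing rule set scores near-perfectly on the training data it has seen, including the noisy labels, but generalizes poorly because almost no unseen positive example matches one of its overly specific rules; the single general rule, in contrast, classifies the noise-free test set correctly.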



Parts of this chapter are based on Fürnkranz and Flach (2005) and Fürnkranz (1997).

J. Fürnkranz et al., Foundations of Rule Learning, Cognitive Technologies, DOI 10.1007/978-3-540-75197-7_9, © Springer-Verlag Berlin Heidelberg 2012


Overfitting is a common problem for all learning algorithms. The reasons for overfitting are manifold, ranging from example representations that do not capture important domain characteristics to inadequate hypothesis spaces. One particular problem is that real-world databases very often contain noise, i.e., erroneous or incomplete instances. There may be many reasons for noise in databases: manual data collection and data entry are error-prone, measuring devices may have a natural variance in their precision or may fail occasionally, some attribute values may not always be available, judgments obtained from human experts are often inconsistent, and many more. Overfitting typically has the effect that an overly complex theory will be learned, and that the size of this theory will grow with the size of the training set. Consider a propositional dataset consisting of E examples, each described with F features. The examples are even