An Algorithmic Description

In the previous chapter, the optimal set of classifiers given some data \(\mathcal{D}\) was defined as the one given by the model structure \(\mathcal{M}\) that maximises \(p(\mathcal{M}|\mathcal{D})\). In addition, a Bayesian LCS model for both regression and classification was introduced, and it was shown how to apply variational Bayesian inference to compute a lower bound on \(\ln p(\mathcal{M}|\mathcal{D})\) for some given \(\mathcal{M}\) and \(\mathcal{D}\).

To demonstrate that the definition of the optimal classifier set leads to useful results, a set of simple algorithms is introduced to demonstrate its use on a set of regression tasks. This includes two possible approaches to searching the model structure space in order to maximise \(p(\mathcal{M}|\mathcal{D})\): one based on a basic genetic algorithm, resulting in a simple Pittsburgh-style LCS, and the other based on sampling from the model posterior \(p(\mathcal{M}|\mathcal{D})\) by Markov Chain Monte Carlo (MCMC) methods. These approaches are by no means intended to act as viable competitors to current LCS, but rather as prototype implementations that demonstrate the correctness and usefulness of the optimality definition. Additionally, the presented formulation of the algorithms favours readability over performance, so there is likely still plenty of room for optimisation.

The core of both approaches is the evaluation of \(p(\mathcal{M}|\mathcal{D})\) and its comparison for different classifier sets in order to find the best set. The evaluation of \(p(\mathcal{M}|\mathcal{D})\) is approached by variational Bayesian inference, as introduced in the previous chapter. Thus, the algorithmic description of how to find \(p(\mathcal{M}|\mathcal{D})\) also provides a summary of the variational approach for regression classifier models and a better understanding of how it can be implemented. Even though not described here, the algorithm can easily be modified to handle classification rather than regression.

A general drawback of the algorithm as presented here is that it does not scale well with the number of classifiers, and that it can currently only operate in batch mode. The reader is reminded, however, that the algorithmic description is only meant to show that the definition of the optimal set of classifiers is a viable one. Possible extensions to this work, as described later in this chapter, suggest how this definition can be incorporated into current LCS or can kindle the development of new LCS.

First, a set of functions is introduced that in combination compute a measure of the quality of a classifier set given the data.
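To make the shape of such a quality measure concrete, the following is a minimal sketch, not the book's actual variational procedure: each classifier is reduced to a Bayesian linear regression over the data points it matches, with fixed prior precision `alpha` and fixed noise precision `beta`, so that the per-classifier log marginal likelihood has a closed form; the mixing model and input-space coverage are ignored, and the model prior is a crude penalty on the number of classifiers. All names (`log_evidence`, `classifier_set_quality`, `complexity_penalty`) and the use of boolean masks in place of matching functions are illustrative assumptions.

```python
import numpy as np

def log_evidence(X, y, alpha=1e-2, beta=1.0):
    """Closed-form log marginal likelihood of Bayesian linear regression
    with prior w ~ N(0, alpha^{-1} I) and Gaussian noise precision beta."""
    N, D = X.shape
    A = alpha * np.eye(D) + beta * X.T @ X        # posterior precision
    m = beta * np.linalg.solve(A, X.T @ y)        # posterior mean
    E = 0.5 * beta * np.sum((y - X @ m) ** 2) + 0.5 * alpha * m @ m
    _, logdetA = np.linalg.slogdet(A)
    return (0.5 * D * np.log(alpha) + 0.5 * N * np.log(beta)
            - E - 0.5 * logdetA - 0.5 * N * np.log(2 * np.pi))

def classifier_set_quality(matching, X, y, complexity_penalty=2.0):
    """Crude stand-in for ln p(M|D): sum of per-classifier log-evidences
    over the matched data, plus a prior that penalises large sets.
    `matching` is a list of boolean masks, one per classifier."""
    total = 0.0
    for mask in matching:
        if mask.sum() < 2:        # reject classifiers without support
            return -np.inf
        total += log_evidence(X[mask], y[mask])
    return total - complexity_penalty * len(matching)  # crude ln p(M)
```

The closed-form evidence replaces the iterative variational updates of the full model; the decisive property it shares with the variational bound is that it trades data fit against model complexity, which is what makes comparing classifier sets meaningful.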


As this measure can subsequently be used by any global search algorithm that is able to find its maximum in the space of possible model structures, its algorithmic description is kept separate from the model structure search. For the structure search, two simple alternatives are provided in a later section: one based on genetic algorithms, and another based on sampling the model posterior \(p(\mathcal{M}|\mathcal{D})\) by MCMC methods. Finally, both approaches are applied to simple regression tasks to demonstrate their usefulness.
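The first alternative can be sketched as follows, again under the simplifying assumptions of the previous sketch: an individual is a list of boolean matching masks over the data, fitness is `classifier_set_quality`, and variation consists of uniform crossover over classifiers plus a mutation that adds, removes, or replaces a classifier's matching. The interval-style matching, the operators, and all hyperparameters are illustrative, not those used in the book.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_classifier(N):
    """A random interval-like matching mask over N inputs."""
    lo, hi = sorted(rng.choice(N, size=2, replace=False))
    mask = np.zeros(N, dtype=bool)
    mask[lo:hi + 1] = True
    return mask

def mutate(individual, N):
    """Add, remove, or re-draw one classifier's matching."""
    individual = [m.copy() for m in individual]
    op = rng.integers(3)
    if op == 0 or not individual:
        individual.append(random_classifier(N))
    elif op == 1 and len(individual) > 1:
        individual.pop(rng.integers(len(individual)))
    else:
        individual[rng.integers(len(individual))] = random_classifier(N)
    return individual

def crossover(a, b):
    """Uniform crossover: the child inherits each parent's classifiers
    independently with probability 1/2."""
    pool = a + b
    keep = rng.random(len(pool)) < 0.5
    child = [m for m, k in zip(pool, keep) if k]
    return child if child else [pool[0]]

def ga_search(X, y, pop_size=20, generations=50):
    """Pittsburgh-style GA: each individual is a whole classifier set."""
    N = len(y)
    pop = [[random_classifier(N)] for _ in range(pop_size)]
    for _ in range(generations):
        fit = [classifier_set_quality(ind, X, y) for ind in pop]
        def pick():  # binary tournament selection
            i, j = rng.integers(pop_size, size=2)
            return pop[i] if fit[i] > fit[j] else pop[j]
        pop = [mutate(crossover(pick(), pick()), N) for _ in range(pop_size)]
    fit = [classifier_set_quality(ind, X, y) for ind in pop]
    return pop[int(np.argmax(fit))]
```

Because each individual encodes a complete classifier set, the fitness of an individual is exactly the quantity being maximised, which is what makes this a Pittsburgh-style rather than a Michigan-style design.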
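The second alternative can be sketched just as briefly. The following Metropolis sampler reuses `rng`, `random_classifier`, `mutate`, and `classifier_set_quality` from the sketches above, treating the quality measure as a stand-in for \(\ln p(\mathcal{M}|\mathcal{D})\). Since add/remove proposals are not exactly symmetric, a faithful implementation would include the Hastings correction; it is omitted here for brevity.

```python
def mcmc_search(X, y, steps=2000):
    """Metropolis sampling from (a stand-in for) p(M|D); returns the
    best model structure visited. Proposal asymmetry is ignored."""
    N = len(y)
    current = [random_classifier(N)]
    cur_q = classifier_set_quality(current, X, y)
    best, best_q = current, cur_q
    for _ in range(steps):
        proposal = mutate(current, N)
        prop_q = classifier_set_quality(proposal, X, y)
        # accept with probability min(1, p(M'|D) / p(M|D))
        if np.log(rng.random()) < prop_q - cur_q:
            current, cur_q = proposal, prop_q
            if cur_q > best_q:
                best, best_q = current, cur_q
    return best
```

In contrast to the GA, the sampler never maintains a population; it explores the structure space one proposal at a time, and the posterior itself decides which moves are kept.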