Optimality-Based Analysis of XCSF Compaction in Discrete Reinforcement Learning
Abstract. Learning classifier systems (LCSs) are population-based predictive systems that were originally envisioned as agents to act in reinforcement learning (RL) environments. These systems can suffer from population bloat and so are amenable to compaction techniques that try to strike a balance between population size and performance. A well-studied LCS architecture is XCSF, which in the RL setting acts as a Q-function approximator. We apply XCSF to a deterministic and stochastic variant of the FrozenLake8x8 environment from OpenAI Gym, with its performance compared in terms of function approximation error and policy accuracy to the optimal Q-functions and policies produced by solving the environments via dynamic programming. We then introduce a novel compaction algorithm (Greedy Niche Mass Compaction, GNMC) and study its operation on XCSF's trained populations. Results show that given a suitable parametrisation, GNMC preserves or even slightly improves function approximation error while yielding a significant reduction in population size. Reasonable preservation of policy accuracy also occurs, and we link this metric to the commonly used steps-to-goal metric in maze-like environments, illustrating how the metrics are complementary rather than competitive.
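As a minimal sketch of what "solving the environments via dynamic programming" could look like, the snippet below runs tabular value iteration on FrozenLake8x8 using the transition model that OpenAI Gym exposes for its toy-text environments. The discount factor, convergence tolerance, and environment version string are illustrative assumptions, not values taken from the paper.

```python
# Sketch: optimal Q-function for FrozenLake8x8 via value iteration over the
# tabular transition model exposed by OpenAI Gym. gamma and tol are assumed
# values for illustration only; the environment id may differ by Gym release.
import numpy as np
import gym

env = gym.make("FrozenLake8x8-v0")  # stochastic by default; is_slippery=False gives the deterministic variant
n_s, n_a = env.observation_space.n, env.action_space.n
P = env.unwrapped.P                 # P[s][a] = [(prob, next_state, reward, done), ...]

gamma, tol = 0.95, 1e-8             # assumed discount factor and stopping tolerance
Q = np.zeros((n_s, n_a))
while True:
    Q_new = np.zeros_like(Q)
    for s in range(n_s):
        for a in range(n_a):
            # Bellman optimality backup; terminal successors contribute reward only
            Q_new[s, a] = sum(p * (r + gamma * (0.0 if done else Q[s2].max()))
                              for p, s2, r, done in P[s][a])
    if np.abs(Q_new - Q).max() < tol:
        break
    Q = Q_new

pi_star = Q.argmax(axis=1)          # greedy (optimal) policy
```

Q and pi_star obtained in this way would serve as the reference against which an approximator's function approximation error and policy accuracy can be measured.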
Keywords: Reinforcement learning · Learning classifier system · XCSF · Compaction

1 Introduction
Reinforcement learning (RL) is characterised by an agent learning a behavioural policy in an environment by maximising a reward signal. Learning Classifier Systems (LCSs) are a paradigm of cognitive systems that originated as a means of representing agents in this framework, although, owing to their implementation flexibility, they have also been widely adapted to other kinds of machine learning (ML) tasks such as classification and clustering [16]. The most widely studied LCS architecture to date, Wilson's XCS [18], is at its heart a reinforcement learner. More recently, an extension of XCS that allows for function approximation, dubbed XCSF [19], has been successfully used in the RL setting for value function approximation [12,13].

LCSs utilise a combination of evolutionary computation and ML techniques to create population-based solutions to prediction problems. The most common style of LCS is the Michigan-style LCS, in which each individual (classifier) in the population represents a partial solution, and classifiers co-operate in a potentially overlapping piecewise ensemble to define an overall solution [16]. A general issue with Michigan-style LCSs is population bloat and/or redundancy. Since these systems learn in an online fashion and regularly refine their population via a genetic algorithm (GA), after learning is complete there are often members of the population that have not had time to properly adapt to the environment and form accurate predictions. A common way to deal with this issue is to apply a compaction technique to the trained population.
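To make the piecewise-ensemble idea concrete, the following sketch shows how matching classifiers could be mixed into a single prediction, assuming interval conditions, linear classifier predictions, and fitness-weighted averaging in the style of XCSF. The data structures and names here are hypothetical simplifications, not the paper's implementation.

```python
# Sketch of a Michigan-style piecewise ensemble: classifiers that match an
# input contribute individual predictions, mixed by fitness as in XCSF.
# The fields and matching scheme are hypothetical simplifications.
from dataclasses import dataclass
import numpy as np

@dataclass
class Classifier:
    lower: np.ndarray     # lower bounds of the condition (interval representation)
    upper: np.ndarray     # upper bounds of the condition
    weights: np.ndarray   # linear prediction weights, length len(x) + 1 for the bias
    fitness: float

    def matches(self, x: np.ndarray) -> bool:
        return bool(np.all((self.lower <= x) & (x <= self.upper)))

    def predict(self, x: np.ndarray) -> float:
        # linear prediction with an implicit bias input x0 = 1
        return float(self.weights @ np.concatenate(([1.0], x)))

def system_prediction(population: list[Classifier], x: np.ndarray) -> float:
    """Fitness-weighted average over the match set, as in XCSF's prediction array."""
    match_set = [cl for cl in population if cl.matches(x)]
    if not match_set:
        return 0.0  # a full system would trigger covering to create a new classifier
    total_fitness = sum(cl.fitness for cl in match_set)
    return sum(cl.fitness * cl.predict(x) for cl in match_set) / total_fitness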