Optimality-Based Analysis of XCSF Compaction in Discrete Reinforcement Learning
Abstract. Learning classifier systems (LCSs) are population-based predictive systems that were originally envisioned as agents to act in reinforcement learning (RL) environments. These systems can suffer from population bloat and so are amenable to compaction techniques that try to strike a balance between population size and performance. A well-studied LCS architecture is XCSF, which in the RL setting acts as a Q-function approximator. We apply XCSF to a deterministic and stochastic variant of the FrozenLake8x8 environment from OpenAI Gym, with its performance compared in terms of function approximation error and policy accuracy to the optimal Q-functions and policies produced by solving the environments via dynamic programming. We then introduce a novel compaction algorithm (Greedy Niche Mass Compaction, GNMC) and study its operation on XCSF's trained populations. Results show that given a suitable parametrisation, GNMC preserves or even slightly improves function approximation error while yielding a significant reduction in population size. Reasonable preservation of policy accuracy also occurs, and we link this metric to the commonly used steps-to-goal metric in maze-like environments, illustrating how the metrics are complementary rather than competitive.
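As a minimal sketch of what "solving the environments via dynamic programming" could look like, the snippet below runs tabular value iteration on FrozenLake8x8 using the transition model that OpenAI Gym exposes for its toy-text environments. The discount factor, convergence tolerance, and environment version string are illustrative assumptions, not values taken from the paper.

```python
# Sketch: optimal Q-function for FrozenLake8x8 via value iteration over the
# tabular transition model exposed by OpenAI Gym. gamma and tol are assumed
# values for illustration only; the environment id may differ by Gym release.
import numpy as np
import gym

env = gym.make("FrozenLake8x8-v0")  # stochastic by default; is_slippery=False gives the deterministic variant
n_s, n_a = env.observation_space.n, env.action_space.n
P = env.unwrapped.P                 # P[s][a] = [(prob, next_state, reward, done), ...]

gamma, tol = 0.95, 1e-8             # assumed discount factor and stopping tolerance
Q = np.zeros((n_s, n_a))
while True:
    Q_new = np.zeros_like(Q)
    for s in range(n_s):
        for a in range(n_a):
            # Bellman optimality backup; terminal successors contribute reward only
            Q_new[s, a] = sum(p * (r + gamma * (0.0 if done else Q[s2].max()))
                              for p, s2, r, done in P[s][a])
    if np.abs(Q_new - Q).max() < tol:
        break
    Q = Q_new

pi_star = Q.argmax(axis=1)          # greedy (optimal) policy
```

Q and pi_star obtained in this way would serve as the reference against which an approximator's function approximation error and policy accuracy can be measured.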
Keywords: Reinforcement learning · Learning classifier system · XCSF · Compaction

1 Introduction
Reinforcement learning (RL) is characterised by an agent learning a behavioural policy in an environment by maximising a reward signal. Learning Classifier Systems (LCSs) are a paradigm of cognitive systems that originated as a means of representing agents in this framework, although, owing to their implementation flexibility, they have also been widely adapted to other kinds of machine learning (ML) tasks such as classification and clustering [16]. The most widely studied LCS architecture to date, Wilson's XCS [18], is at its heart a reinforcement learner. More recently, an extension of XCS that allows for function approximation, dubbed XCSF [19], has been successfully used in the RL setting for value function approximation [12,13].

LCSs utilise a combination of evolutionary computation and ML techniques to create population-based solutions to prediction problems. The most common style of LCS is the Michigan-style LCS, in which each individual (classifier) in the population represents a partial solution, and classifiers co-operate in a potentially overlapping piecewise ensemble to define an overall solution [16]. A general issue with Michigan-style LCSs is population bloat and/or redundancy. Since these systems learn in an online fashion and regularly refine their population via a genetic algorithm (GA), after learning is complete there are often members of the population that have not had time to properly adapt to the environment and form accurate predictions. A common way to deal with this issue is to apply a compaction technique to the trained population.
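To make the piecewise-ensemble idea concrete, the following sketch shows how matching classifiers could be mixed into a single prediction, assuming interval conditions, linear classifier predictions, and fitness-weighted averaging in the style of XCSF. The data structures and names here are hypothetical simplifications, not the paper's implementation.

```python
# Sketch of a Michigan-style piecewise ensemble: classifiers that match an
# input contribute individual predictions, mixed by fitness as in XCSF.
# The fields and matching scheme are hypothetical simplifications.
from dataclasses import dataclass
import numpy as np

@dataclass
class Classifier:
    lower: np.ndarray     # lower bounds of the condition (interval representation)
    upper: np.ndarray     # upper bounds of the condition
    weights: np.ndarray   # linear prediction weights, length len(x) + 1 for the bias
    fitness: float

    def matches(self, x: np.ndarray) -> bool:
        return bool(np.all((self.lower <= x) & (x <= self.upper)))

    def predict(self, x: np.ndarray) -> float:
        # linear prediction with an implicit bias input x0 = 1
        return float(self.weights @ np.concatenate(([1.0], x)))

def system_prediction(population: list[Classifier], x: np.ndarray) -> float:
    """Fitness-weighted average over the match set, as in XCSF's prediction array."""
    match_set = [cl for cl in population if cl.matches(x)]
    if not match_set:
        return 0.0  # a full system would trigger covering to create a new classifier
    total_fitness = sum(cl.fitness for cl in match_set)
    return sum(cl.fitness * cl.predict(x) for cl in match_set) / total_fitness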