Reinforcement Learning and Attractor Neural Network Models of Associative Learning

Abstract Despite indisputable advances in reinforcement learning (RL) research, some cognitive and architectural challenges remain. The primary source of these challenges in the current conception of RL is the way the theory defines states. Whereas states under laboratory conditions are tractable (owing to the Markov property), states in real-world RL are high-dimensional, continuous and partially observable. Effective learning and generalization can therefore be guaranteed only if the subset of reward-relevant dimensions is correctly identified for each state. Moreover, the computational discrepancy between model-free and model-based RL methods creates a stability-plasticity dilemma in terms of how to guide optimal decision-making control when multiple interacting and competing systems, each implementing a different type of RL method, are at work. By presenting behavioral results showing that human subjects flexibly define states in a reversal learning paradigm, contrary to a simple RL model, we argue that these challenges can be met by infusing the RL framework, as an algorithmic theory of human behavior, with the strengths of the attractor framework at the level of neural implementation. Our position is supported by the hypothesis that 'attractor states', which are stable patterns of self-sustained and reverberating brain activity, are a manifestation of the collective dynamics of neuronal populations in the brain. With its capacity for pattern completion, along with its ability to link events in temporal order, an attractor network becomes relatively insensitive to noise, allowing it to account for the sparse data that are characteristic of high-dimensional, continuous real-world RL.

Keywords Attractor neural networks · Model-free and model-based reinforcement learning · Stability-plasticity dilemma · Reversal learning
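As an informal illustration of the pattern-completion property invoked above (our own sketch, not taken from the chapter; the network size, the stored patterns and the amount of cue corruption are arbitrary assumptions), the following Python code stores a few binary patterns in a Hopfield-style attractor network and recovers one of them from a degraded cue.

# Minimal Hopfield-style attractor sketch: stored patterns act as fixed points,
# and a noisy cue settles back into the nearest stored pattern (pattern completion).
import numpy as np

rng = np.random.default_rng(0)
n_units, n_patterns = 64, 3
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))   # stored "attractor states"

# Hebbian storage: the weight matrix accumulates outer products of the stored patterns.
W = sum(np.outer(p, p) for p in patterns) / n_units
np.fill_diagonal(W, 0)

def settle(state, steps=20):
    """Asynchronously update units until the state settles into an attractor."""
    state = state.copy()
    for _ in range(steps):
        for i in rng.permutation(n_units):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# Corrupt 25% of one stored pattern, then let the network complete it.
cue = patterns[0].copy()
flip = rng.choice(n_units, size=n_units // 4, replace=False)
cue[flip] *= -1
recovered = settle(cue)
print("overlap with stored pattern:", (recovered @ patterns[0]) / n_units)  # typically close to 1.0

Because each stored pattern is a fixed point of the update dynamics, moderately noisy or partial cues still settle into the corresponding attractor state, which is the sense in which such networks are said to be relatively insensitive to noise.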

O. H. Hamid, School of Computer Science, University of Nottingham, Nottingham, UK
J. Braun, Institute of Cognitive Biology, University of Magdeburg, Magdeburg, Germany

© Springer Nature Switzerland AG 2019. In: C. Sabourin et al. (eds.), Computational Intelligence, Studies in Computational Intelligence 829, https://doi.org/10.1007/978-3-030-16469-0_17

1 Introduction

Reinforcement learning (RL) is primarily used in two fields. In machine learning and artificial intelligence (AI) related disciplines, RL is an algorithmic theory of optimal action control in sequential decision-making processes when only limited feedback is available [1–6]. In cognitive science, RL describes the process by which animals and humans probe reward contingencies while acting in a novel environment [7–10]. Over the past two decades, research on RL has had a plethora of remarkable successes. In particular, the linking of the 'reward-prediction error', which was introduced within the temporal difference learning algorithm [1], to the phasic bursts of dopaminergic neurons in the ventral tegmental area of monkeys' midbrains [7] provided a living instance
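To make the reward-prediction error concrete, the following Python sketch (our illustration, with assumed task parameters such as a five-step cue-reward interval, the discount factor gamma and the learning rate alpha) implements tabular TD(0) value learning, in which the error is delta_t = r_t + gamma * V(s_{t+1}) - V(s_t). Early in learning the error is large at the time of reward; once the preceding states reliably predict the reward, the error at reward delivery shrinks, qualitatively mirroring the phasic dopamine responses reported in [7].

# Minimal TD(0) sketch of the reward-prediction error (illustrative parameters).
import numpy as np

n_steps = 5                 # cue at step 0, reward delivered at the final step (assumed timeline)
gamma, alpha = 1.0, 0.1     # discount factor and learning rate (assumed values)
V = np.zeros(n_steps + 1)   # state values; the terminal state keeps value 0

for episode in range(200):
    for t in range(n_steps):
        r = 1.0 if t == n_steps - 1 else 0.0      # reward only at the last step
        delta = r + gamma * V[t + 1] - V[t]       # reward-prediction error
        V[t] += alpha * delta                     # TD(0) value update

# After training, all pre-reward states predict the reward, so the prediction
# error at reward time approaches zero while the value signal has moved back
# toward the earliest predictive state.
print(np.round(V[:n_steps], 2))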