


ORIGINAL ARTICLE

Gradient boosting in crowd ensembles for Q-learning using weight sharing

D. L. Elliott¹ · K. C. Santosh¹ · Charles Anderson²

¹ Department of Computer Science, University of South Dakota, 414 E Clark St, Vermillion, SD 57069, USA
² Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA

Correspondence: K. C. Santosh, [email protected] · D. L. Elliott, [email protected] · Charles Anderson, [email protected]

Received: 1 November 2019 / Accepted: 9 March 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Reinforcement learning (RL) is a double-edged sword: it frees the human trainer from having to provide voluminous supervised training data, or even from knowing a solution. On the other hand, a common complaint about RL is that learning is slow. Deep Q-learning (DQN), a relatively recent development, has allowed practitioners and scientists to solve tasks previously thought unsolvable by a reinforcement learning approach. However, DQN has led to an explosion in the number of model parameters, which has further exacerbated the computational demands of Q-learning during training. In this work, an ensemble approach is proposed that improves training time, measured by the number of interactions with the training environment. The presented experiments show that the proposed approach improves stability during training, yields better average performance, makes training more reliable, and speeds up the learning of features in convolutional layers.

Keywords: Reinforcement Learning · Gradient Boosting · Convolutional Neural Network · Deep Q-learning · Weight Sharing · Ensemble Learning

1 Introduction

A frequent criticism of Q-learning is that it is too slow to find a solution. Recently, a crowd ensemble approach that could aptly be described as bagging Q-functions was introduced to address this and other reinforcement learning (RL) challenges [7]. That work emphasized the stabilizing effect of a crowd ensemble approach based on voting during action selection (max_{a ∈ A} Q(s, a), where A is the set of available actions). In this work, another advantage of a crowd ensemble approach to Q-learning, especially when using deep neural networks, is examined: the ability to combine gradients across ensemble members to speed up learning of the features necessary to learn a successful Q-function. We find that this approach speeds up ANN-based Q-learning training by reducing superfluous, or even spurious, parameter updates.
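To illustrate the voting-based action selection described above, the sketch below shows one plausible way a crowd of Q-functions could vote: each member picks its greedy action, and the most-voted action is executed. The array shapes and the vote_action helper are illustrative assumptions, not the authors' exact implementation.

    import numpy as np

    # Illustrative sketch (assumption, not the authors' implementation): each
    # ensemble member produces its own Q-value estimates for the current state,
    # and the crowd selects the action receiving the most greedy votes.
    def vote_action(q_values_per_member):
        """q_values_per_member: array of shape (n_members, n_actions)."""
        greedy_choices = np.argmax(q_values_per_member, axis=1)  # each member's argmax_a Q_i(s, a)
        votes = np.bincount(greedy_choices, minlength=q_values_per_member.shape[1])
        return int(np.argmax(votes))  # ties broken by lowest action index

    # Example: three members, four actions; two of the three members prefer action 1.
    q_estimates = np.array([[0.1, 0.9, 0.2, 0.0],
                            [0.3, 0.8, 0.1, 0.2],
                            [0.7, 0.2, 0.1, 0.0]])
    print(vote_action(q_estimates))  # -> 1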

1.1 Background

This work involves an ensemble approach to deep Q-learning (DQN), a form of reinforcement learning (RL). RL is a form of machine learning that trains an agent using performance feedback as the training signal. RL comprises a user-designed reward function, a value function, and an action-selection policy. The reward function defines the value to be maximized. The value function models the long-term, cumulative reward of a state/action pair. The policy determines how the value function is used to select actions.
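To make these three components concrete, the minimal tabular sketch below (a generic toy, assuming a small discrete state/action space rather than the paper's DQN setup) shows the reward signal being maximized through a Q-table value function and an epsilon-greedy selection policy.

    import numpy as np

    # Minimal tabular Q-learning sketch (assumption: a toy environment with
    # discrete states and actions; generic scaffolding, not the paper's DQN architecture).
    n_states, n_actions = 5, 2
    alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate
    Q = np.zeros((n_states, n_actions))      # value function: long-term reward of (state, action)
    rng = np.random.default_rng(0)

    def policy(state):
        # Epsilon-greedy policy: how the value function is used to select actions.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[state]))

    def update(s, a, r, s_next):
        # Move Q(s, a) toward the observed reward plus the discounted best next-state value.
        target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])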