


ORIGINAL ARTICLE

Gradient boosting in crowd ensembles for Q-learning using weight sharing

D. L. Elliott¹ · K. C. Santosh¹ · Charles Anderson²

¹ Department of Computer Science, University of South Dakota, 414 E Clark St, Vermillion, SD 57069, USA
² Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA

Correspondence: K. C. Santosh, [email protected] · D. L. Elliott, [email protected] · Charles Anderson, [email protected]

Received: 1 November 2019 / Accepted: 9 March 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Reinforcement learning (RL) is a double-edged sword: it frees the human trainer from having to provide voluminous supervised training data, or even from knowing a solution. On the other hand, a common complaint about RL is that learning is slow. Deep Q-learning (DQN), a relatively recent development, has allowed practitioners and scientists to solve tasks previously thought unsolvable by a reinforcement learning approach. However, DQN has led to an explosion in the number of model parameters, which has further exacerbated the computational demands of Q-learning during training. In this work, an ensemble approach is proposed that improves training time, measured by the number of interactions with the training environment. The presented experiments show that the proposed approach improves stability during training, yields better average performance, makes training more reliable, and speeds up the learning of features in convolutional layers.

Keywords: Reinforcement Learning · Gradient Boosting · Convolutional Neural Network · Deep Q-learning · Weight Sharing · Ensemble Learning

1 Introduction

A frequent criticism of Q-learning is that it is too slow to find a solution. Recently, a crowd ensemble approach that could aptly be described as bagging Q-functions was introduced to address this and other reinforcement learning (RL) challenges [7]. That work emphasized the stabilizing effect of a crowd ensemble approach based on voting during action selection (max_{a ∈ A} Q(s, a), where A is the set of available actions). In this work, another advantage of a crowd ensemble approach to Q-learning, especially when using deep neural networks, is examined: the ability to combine gradients across ensemble members to speed up learning of the features necessary to learn a successful Q-function. We find that this approach speeds up ANN-based Q-learning training by reducing superfluous, or even spurious, parameter updates.
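To illustrate the voting-based action selection described above, the sketch below shows one plausible way a crowd of Q-functions could vote: each member picks its greedy action, and the most-voted action is executed. The array shapes and the vote_action helper are illustrative assumptions, not the authors' exact implementation.

    import numpy as np

    # Illustrative sketch (assumption, not the authors' implementation): each
    # ensemble member produces its own Q-value estimates for the current state,
    # and the crowd selects the action receiving the most greedy votes.
    def vote_action(q_values_per_member):
        """q_values_per_member: array of shape (n_members, n_actions)."""
        greedy_choices = np.argmax(q_values_per_member, axis=1)  # each member's argmax_a Q_i(s, a)
        votes = np.bincount(greedy_choices, minlength=q_values_per_member.shape[1])
        return int(np.argmax(votes))  # ties broken by lowest action index

    # Example: three members, four actions; two of the three members prefer action 1.
    q_estimates = np.array([[0.1, 0.9, 0.2, 0.0],
                            [0.3, 0.8, 0.1, 0.2],
                            [0.7, 0.2, 0.1, 0.0]])
    print(vote_action(q_estimates))  # -> 1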

1.1 Background

This work involves an ensemble approach to deep Q-learning (DQN), a form of reinforcement learning (RL). RL is a form of machine learning that trains an agent using performance feedback as the training signal. RL comprises a user-designed reward function, a value function, and an action-selection policy. The reward function defines the value to be maximized. The value function models the long-term, cumulative reward of a state/action pair. The policy determines how the value function is used to select actions.
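To make these three components concrete, the minimal tabular sketch below (a generic toy, assuming a small discrete state/action space rather than the paper's DQN setup) shows the reward signal being maximized through a Q-table value function and an epsilon-greedy selection policy.

    import numpy as np

    # Minimal tabular Q-learning sketch (assumption: a toy environment with
    # discrete states and actions; generic scaffolding, not the paper's DQN architecture).
    n_states, n_actions = 5, 2
    alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate
    Q = np.zeros((n_states, n_actions))      # value function: long-term reward of (state, action)
    rng = np.random.default_rng(0)

    def policy(state):
        # Epsilon-greedy policy: how the value function is used to select actions.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[state]))

    def update(s, a, r, s_next):
        # Move Q(s, a) toward the observed reward plus the discounted best next-state value.
        target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])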