ORIGINAL ARTICLE
Dynamic fusion for ensemble of deep Q-network

Patrick P. K. Chan · Meng Xiao · Xinran Qin · Natasha Kees

School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Received: 9 July 2020 / Accepted: 30 September 2020

© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract

Ensemble reinforcement learning, which combines the decisions of a set of base agents, has been proposed to improve decision making and reduce training time. Many studies indicate that an ensemble model may achieve better results than a single agent because the base agents complement one another: the error of one agent may be corrected by the others. However, the fusion method is a fundamental issue in ensemble learning. Existing studies mainly focus on static fusion, which either assumes all agents have the same ability or ignores agents with poor average performance. As a result, current static fusion methods overlook base agents that perform poorly overall but excel in particular scenarios, so the ability of some agents is not fully utilized. This study proposes a dynamic fusion method which utilizes each base agent according to its local competence on test states. The performance of a base agent on the validation set is measured by the rewards it achieves in the next n steps. The similarity between a validation state and a new state is quantified by the Euclidean distance in the latent space, and the weight of each base agent is updated according to its performance on validation states and their similarity to the new state. Experimental studies confirm that the proposed dynamic fusion method outperforms both its base agents and static fusion methods. This is the first dynamic fusion method proposed for deep reinforcement learning, extending the study of dynamic fusion from classification to reinforcement learning.

Keywords Ensemble · Deep reinforcement learning · Dynamic fusion · Deep Q-network
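As a rough illustration of the idea sketched in the abstract, the snippet below (a minimal sketch, not the authors' implementation) computes dynamic fusion weights for an ensemble of Q-networks: each agent's local competence is estimated as a similarity-weighted average of its n-step validation returns, where similarity decays with the Euclidean distance between latent codes. All function and variable names (encode, val_latents, val_returns, temperature) are hypothetical assumptions.

```python
# Minimal sketch of distance-weighted dynamic fusion for an ensemble of Q-networks.
# Names and shapes are illustrative assumptions, not the paper's actual implementation.
import numpy as np

def dynamic_fusion_action(state, agents, encode, val_latents, val_returns, temperature=1.0):
    """Select an action by weighting each agent's Q-values by its local competence.

    agents      : list of callables, agents[k](state) -> Q-value vector over actions
    encode      : callable mapping a raw state to its latent representation
    val_latents : (V, d) array of latent codes of the validation states
    val_returns : (V, K) array, n-step return achieved by each of the K agents
                  starting from each validation state
    """
    z = encode(state)                                 # latent code of the new state
    dists = np.linalg.norm(val_latents - z, axis=1)   # Euclidean distance in latent space
    sim = np.exp(-dists / temperature)                # nearer validation states count more
    sim /= sim.sum()

    # Local competence of each agent: similarity-weighted average of its validation returns.
    competence = sim @ val_returns                    # shape (K,)
    weights = np.exp(competence - competence.max())   # softmax over competence scores
    weights /= weights.sum()

    # Fuse the agents' Q-values with the dynamic weights and act greedily.
    q_stack = np.stack([agent(state) for agent in agents])   # (K, num_actions)
    fused_q = weights @ q_stack
    return int(np.argmax(fused_q))
```

In this sketch the fusion weights are recomputed for every test state, which is what distinguishes dynamic fusion from a static weighting fixed after training.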
1 Introduction

Deep Reinforcement Learning (DRL) [18] achieves exemplary performance in many decision-making applications, e.g., Go [28], autonomous driving [25], robotics [18], portfolio management [16], and Atari games [22]. DRL is a learning paradigm which aims to maximize the cumulative reward achieved by the interaction of an agent with an environment. Since an agent learns via interactions with an environment without any supervision, one of the drawbacks of DRL
is the training complexity [18, 25]. One possible solution is ensemble DRL [14, 34], which makes decisions by combining a set of base agents with reasonable performance. Many studies [5, 9, 26] suggest that an ensemble DRL model may achieve better results than a single DRL model because of the base agents' cooperation, i.e., a misstep of one base agent can be corrected by the others.