Combining a gradient-based method and an evolution strategy for multi-objective reinforcement learning

Diqi Chen · Yizhou Wang · Wen Gao

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Multi-objective reinforcement learning (MORL) algorithms aim to approximate the Pareto frontier uniformly in multi-objective decision-making problems. In deep reinforcement learning (RL), gradient-based methods are often adopted to learn deep policies and value functions because of their fast convergence, but pure gradient-based methods cannot guarantee a uniformly approximated Pareto frontier. Evolution strategies, on the other hand, operate directly in the solution space and can achieve a well-distributed Pareto frontier, yet applying them to optimize deep networks remains challenging. To leverage the advantages of both kinds of methods, we propose a two-stage MORL framework that combines a gradient-based method and an evolution strategy. First, an efficient multi-policy soft actor-critic algorithm is proposed to learn multiple policies collaboratively; the lower layers of all policy networks are shared, so this first stage can be regarded as representation learning. Second, the multi-objective covariance matrix adaptation evolution strategy (MO-CMA-ES) is applied to fine-tune the policy-independent parameters and approach a dense, uniform estimate of the Pareto frontier. Experimental results on three benchmarks (Deep Sea Treasure, Adaptive Streaming, and Super Mario Bros) show the superiority of the proposed method.

Keywords Multi-objective reinforcement learning · Multi-policy reinforcement learning · Pareto frontier · Sampling efficiency
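To make the shared-representation idea in the abstract concrete, the following is a minimal sketch of a multi-head policy network whose lower layers are shared across all policies. It assumes a PyTorch-style implementation; the class name SharedTrunkPolicies, the layer sizes, and the number of heads are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch (assumed PyTorch-style) of policies with shared lower layers.
# Layer sizes, number of policies, and all names are illustrative assumptions.
import torch
import torch.nn as nn

class SharedTrunkPolicies(nn.Module):
    def __init__(self, obs_dim, act_dim, n_policies, hidden=256):
        super().__init__()
        # Shared lower layers: trained in the first, gradient-based stage and
        # acting as a common state representation for all policies.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Policy-independent heads, one per policy; in the second stage an
        # evolution strategy such as MO-CMA-ES could fine-tune these alone.
        self.heads = nn.ModuleList(
            nn.Linear(hidden, act_dim) for _ in range(n_policies)
        )

    def forward(self, obs, policy_idx):
        features = self.trunk(obs)
        return self.heads[policy_idx](features)  # action logits / mean

# Usage: evaluate the third policy head on a batch of observations.
net = SharedTrunkPolicies(obs_dim=8, act_dim=4, n_policies=6)
obs = torch.randn(32, 8)
print(net(obs, policy_idx=2).shape)  # torch.Size([32, 4])
```

Keeping the per-policy parameters small and separate from the shared trunk is what makes a second, evolution-strategy fine-tuning stage tractable for deep networks, since only the head parameters need to be searched.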

1 Introduction

Deep reinforcement learning (RL) algorithms have been successfully applied to many challenging decision-making problems, such as video games [32, 33, 60], the game of Go [48, 50], and robotics [19, 20, 27, 49]. In these scenarios, only one objective is optimized. Nevertheless, many real-world decision-making problems involve more than one objective. Network routing takes energy, latency, and channel capacity into account [35]. Medical treatment needs to relieve symptoms while minimizing side effects [28, 29]. Economic systems are analyzed from both economic and ecological perspectives [57]. Furthermore, by adding extra reward signals that encode domain knowledge, reward-shaping methods are used to encourage agent exploration [11, 36].


The objectives in multi-objective reinforcement learning (MORL) are often conflicting: maximizing one objective typically degrades the others [37]. Therefore, rather than learning a single optimal solution, MORL aims to find a set of mutually incomparable solutions. A solution dominates another if it is at least as good on every objective and strictly better on at least one; in this set, no solution is dominated by any other. This non-dominated set is called the Pareto frontier. And the non-dominated solution is