Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis



ORIGINAL ARTICLE

Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis

Hengzhe Zhang1 · Aimin Zhou1 · Xin Lin1

Received: 13 April 2020 / Accepted: 4 July 2020 © The Author(s) 2020

Abstract

Reinforcement learning based on deep neural networks has attracted much attention and has been widely used in real-world applications. However, its black-box nature limits its use in high-stakes areas such as manufacturing and healthcare. To address this problem, some researchers resort to interpretable control policy generation algorithms. The basic idea is to use an interpretable model, such as tree-based genetic programming, to extract a policy from a black-box model such as a neural network. Following this idea, in this paper we apply another form of genetic programming, evolutionary feature synthesis, to extract a control policy from the neural network. We also propose an evolutionary method that automatically optimizes the operator set of the control policy for each specific problem, and we introduce a policy simplification strategy. Experiments on four reinforcement learning environments show that evolutionary feature synthesis extracts policies from the neural network that perform better than those obtained with tree-based genetic programming, with comparable interpretability.

Keywords Reinforcement learning · Genetic programming · Policy derivation · Explainable machine learning

Introduction

Reinforcement learning [31] has shown extraordinary performance in computer games [22] and other real-world applications [29]. The neural network is the dominant model for solving reinforcement learning problems. These methods are generally called deep reinforcement learning algorithms, since they use a deep neural network as the value function approximator or the policy function approximator. Deep Q-learning (DQN) [22], double DQN [9], and dueling DQN (DDQN) [36] are well-known algorithms that train a deep neural network for reinforcement learning problems. However, the black-box property of the deep neural network prevents it from being used directly in high-stakes scenarios [33]. Therefore, building an interpretable model is essential, and is arguably a higher priority than interpreting a black-box model in the current machine learning field [27].
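To make the setting concrete, the sketch below shows a minimal DQN-style Q-network in PyTorch: it maps an observation to one Q-value per discrete action and acts greedily. The layer sizes and the observation/action dimensions are illustrative assumptions, not the networks used in this paper.

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """A small value-function approximator in the spirit of DQN."""
        def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),  # one Q-value per discrete action
            )

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            return self.net(obs)

    # Greedy control: choose the action with the largest predicted Q-value.
    q_net = QNetwork(obs_dim=4, n_actions=2)   # CartPole-like sizes (assumed)
    obs = torch.randn(1, 4)                    # a dummy observation
    action = q_net(obs).argmax(dim=1).item()
    print("greedy action:", action)

Every action chosen this way passes through thousands of learned weights, which is why such a policy is hard to inspect directly and why deriving a compact, symbolic policy from it is attractive.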

✉ Aimin Zhou
  [email protected]

1 Shanghai Key Laboratory of Multidimensional Information Processing, School of Computer Science and Technology, East China Normal University, Shanghai, China

There are a variety of ways to build interpretable models [15,34]. Among them, genetic programming (GP), which builds a symbolic expression as an explainable model through a genetic algorithm, is a promising one. Recently, GP has been applied to reinforcement learning. The idea is to evolve an explainable model to extract the policy from the deep neural network. In [11], an explainable reinforcement learning policy model is built by using the tree-based