Deep reinforcement learning for quadrotor path following with adaptive velocity

Bartomeu Rubí1 · Bernardo Morcego1 · Ramon Pérez1

Received: 12 March 2020 / Accepted: 14 October 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract This paper proposes a solution to the path following problem of a quadrotor vehicle based on deep reinforcement learning theory. Three different approaches implementing the Deep Deterministic Policy Gradient algorithm are presented. Each approach is an improved version of the preceding one. The first approach uses only instantaneous information about the path to solve the problem. The second approach includes a structure that allows the agent to anticipate curves. The third agent is capable of computing the optimal velocity according to the path's shape. A training framework that combines the tensorflow-python environment with Gazebo-ROS using the RotorS simulator is built. The three agents are tested in RotorS and experimentally with the Asctec Hummingbird quadrotor. Experimental results demonstrate the validity of the agents, which are able to achieve a generalized solution to the path following problem.

Keywords Unmanned aerial vehicles · Trajectory control · Path following · Deep reinforcement learning · Deep deterministic policy gradient · Quadrotor

This work has been partially funded by the Spanish State Research Agency (AEI) and the European Regional Development Fund (ERDF) through the SCAV Project (Ref. MINECO DPI2017-88403-R), and by the SMART Project (Ref. EFA 153/16 Interreg Cooperation Program POCTEFA 2014-2020). Bartomeu Rubí is also supported by the Secretaria d'Universitats i Recerca de la Generalitat de Catalunya, the European Social Fund (ESF) and AGAUR under a FI Grant (Ref. 2017FI_B_00212).

✉ Bartomeu Rubí [email protected]
Bernardo Morcego [email protected]
Ramon Pérez [email protected]

1 Research Center for Supervision, Safety and Automatic Control (CS2AC), Universitat Politècnica de Catalunya (UPC), Rbla Sant Nebridi 22, Terrassa, Spain

1 Introduction

It is well known that unmanned aerial vehicles (UAVs) are expected to take on a large number of applications in the near future (e.g., transportation, surveillance, mapping, exploration, search & rescue, maintenance, filming). For this reason, research on these vehicles is constantly growing and keeps developing and implementing the most innovative solutions from control theory, computer vision and artificial intelligence. To accomplish the final applications, research on UAVs tackles several different problems, which give rise to diverse research fields such as stabilization control, trajectory control, obstacle detection and avoidance, path planning, mission control, fault tolerant control, formation control and many more. In the last few years the authors of this paper have focused their effort on the path following problem, studying and developing the latest techniques to solve it. Path following (PF) is a control approach to solve the tr