Deep reinforcement learning for quadrotor path following with adaptive velocity

Bartomeu Rubí¹ · Bernardo Morcego¹ · Ramon Pérez¹

Received: 12 March 2020 / Accepted: 14 October 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract This paper proposes a solution to the path following problem of a quadrotor vehicle based on deep reinforcement learning theory. Three different approaches implementing the Deep Deterministic Policy Gradient algorithm are presented. Each approach emerges as an improved version of the preceding one. The first approach uses only instantaneous information about the path to solve the problem. The second approach includes a structure that allows the agent to anticipate curves. The third agent is capable of computing the optimal velocity according to the path's shape. A training framework that combines the TensorFlow-Python environment with Gazebo-ROS using the RotorS simulator is built. The three agents are tested in RotorS and experimentally with the AscTec Hummingbird quadrotor. Experimental results demonstrate the validity of the agents, which are able to achieve a generalized solution to the path following problem.

Keywords Unmanned aerial vehicles · Trajectory control · Path following · Deep reinforcement learning · Deep deterministic policy gradient · Quadrotor

This work has been partially funded by the Spanish State Research Agency (AEI) and the European Regional Development Fund (ERDF) through the SCAV Project (Ref. MINECO DPI2017-88403-R), and by the SMART Project (Ref. EFA 153/16 Interreg Cooperation Program POCTEFA 2014-2020). Bartomeu Rubí is also supported by the Secretaria d’Universitats i Recerca de la Generalitat de Catalunya, the European Social Fund (ESF) and AGAUR under a FI Grant (Ref. 2017FI_B_00212).

Bartomeu Rubí [email protected]
Bernardo Morcego [email protected]
Ramon Pérez [email protected]

¹ Research Center for Supervision, Safety and Automatic Control (CS2AC), Universitat Politècnica de Catalunya (UPC), Rbla Sant Nebridi 22, Terrassa, Spain

1 Introduction

It is well known that unmanned aerial vehicles (UAVs) are expected to take on a large number of applications in the near future (e.g., transportation, surveillance, mapping, exploration, search and rescue, maintenance, filming). For this reason, research on these vehicles is constantly growing and keeps developing and implementing the most innovative solutions from control theory, computer vision and artificial intelligence. To accomplish the final applications, research on UAVs tackles several different problems, which give rise to diverse research fields such as stabilization control, trajectory control, obstacle detection and avoidance, path planning, mission control, fault-tolerant control, formation control and many more. In the last few years the authors of this paper have focused their effort on the path following problem, studying and developing the latest techniques to solve it. Path following (PF) is a control approach to solve the tr
