An adaptive adjustment strategy for bolt posture errors based on an improved reinforcement learning algorithm


Wentao Luo 1 & Jianfu Zhang 1,2 & Pingfa Feng 1,2,3 & Haochen Liu 4 & Dingwen Yu 1 & Zhijun Wu 1

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Designing an intelligent and autonomous system remains a great challenge in the assembly field. Most reinforcement learning (RL) methods are applied to experiments with relatively small state spaces. However, the complicated conditions and high-dimensional spaces of the assembly environment cause traditional RL methods to perform poorly in terms of efficiency and accuracy. In this paper, a model-driven adaptive proximal policy optimization (MAPPO) method is presented that enables the assembly system to autonomously rectify bolt posture errors. In the MAPPO method, a probabilistic tree and an adaptive reward mechanism are used to improve the calculation efficiency and accuracy of the traditional PPO method. The size of the action space is reduced by establishing a hierarchical logical relationship among the parameters with a probabilistic tree. The adaptive reward mechanism mitigates the tendency of the algorithm to fall into local minima. Finally, the proposed method was verified on the Unity simulation engine. The advancement and robustness of the proposed model were also validated by comparing different cases in simulations and experiments. The results revealed that MAPPO has better efficiency and accuracy than other state-of-the-art algorithms.

Keywords Model-driven method · Intelligent assembly · Probabilistic tree · Adaptive reward mechanism · Reinforcement learning · Physical simulation engine
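For readers unfamiliar with the baseline that MAPPO builds on, the clipped surrogate objective of standard proximal policy optimization (PPO) can be sketched as follows. This is a minimal illustration of plain PPO only, not of the MAPPO method; the function name `ppo_clip_objective`, its argument names, and the default clipping range of 0.2 are assumptions made for this sketch and do not come from the paper.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    All names and the default clip_eps are illustrative assumptions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Clipping removes the incentive to move the new policy far from the old one within a single update, which is the property that makes PPO comparatively stable in practice.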

* Jianfu Zhang [email protected]

1 Department of Mechanical Engineering, Tsinghua University, Beijing 100084, China
2 State Key Laboratory of Tribology, and Beijing Key Lab of Precision/Ultra-precision Manufacturing Equipment and Control, Tsinghua University, Beijing 100084, China
3 Division of Advanced Manufacturing, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
4 School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China

1 Introduction

With the advancements in machine learning theory, complex tasks can be autonomously executed in many essential fields, such as the automated detection of heart disease or organ cancer in medical clinics [1–3], interpersonal influence analysis in sociology [4], and quality inspection in engineering [5]. Recently, researchers have applied machine learning theory to the continuous control tasks of machines to improve their intelligence based on accumulated learning experiences, which makes the era of human intelligence possible, especially in industry applications [6]. Gullapalli et al. [7] considered the impact of environmental uncertainty on robot control and established a nonlinear neural network structure to train a robot so that the robot could choose di