Safe Exploration for Active Learning with Gaussian Processes



Robert Bosch GmbH, 70442 Stuttgart, Germany, [email protected]
University of Stuttgart, MLR Laboratory, 70569 Stuttgart, Germany

Abstract. In this paper, we consider the problem of safe exploration in the active learning context. Safe exploration is especially important for data sampling from technical and industrial systems, e.g. combustion engines and gas turbines, where critical and unsafe measurements need to be avoided. The objective is to learn data-based regression models of such technical systems using a limited budget of measured, i.e. labeled, points while ensuring that critical regions of the considered systems are avoided during measurements. We propose an approach for learning such models and exploring new data regions based on Gaussian processes (GPs). In particular, we employ a problem-specific GP classifier to identify safe and unsafe regions, while using a differential entropy criterion to explore relevant data regions. We present a theoretical analysis of the proposed algorithm, in which we provide an upper bound on the probability of failure. To demonstrate the efficiency and robustness of our safe exploration scheme in the active learning setting, we test the approach on a policy exploration task for the inverse pendulum hold-up problem.

1 Introduction

Active learning (AL) deals with the problem of selective and guided generation of labeled data. In the AL setting, an agent guides the data generation process by choosing new informative samples to be labeled based on the knowledge obtained so far. Providing labels for new data points, e.g. image labels as in Lang and Baum [1992] or measurements of the system output in the case of physical systems as in Hans et al. [2008], can be very costly and tedious. The overall goal of AL is to create a data-based model without having to supply more data than necessary, thereby reducing the annotation effort of the agent or the number of measurements on machines. For regression tasks, the AL concept is sometimes also referred to as optimal experimental design, see Fedorov [1972].

© Springer International Publishing Switzerland 2015. J. Schreiter et al., in A. Bifet et al. (Eds.): ECML PKDD 2015, Part III, LNAI 9286, pp. 133–149, 2015. DOI: 10.1007/978-3-319-23461-8_9

In this paper, we consider the problem of safe data selection while jointly learning a data-based regression model on the explored input space. Given failure conditions, the goal is to actively select a budget of measurement points for approximating the model while keeping the probability of measurement failures to a minimum. In practice, safe data selection is highly relevant, especially when measurements are performed on technical systems, e.g. combustion engines and test benches. For such technical systems, it is important to avoid critical points where the measurements can damage the system. Thus, the main objectives are (i) to approximate the system model from sampled data, (ii) to use a limited budget of measured points, and (iii) to ensure that critical regions of the consi