Learning Diverse Models: The Coulomb Structured Support Vector Machine
Martin Schiegg, Ferran Diego, and Fred Hamprecht
1 University of Heidelberg, IWR/HCI, 69120 Heidelberg, Germany
{martin.schiegg,ferran.diego,fred.hamprecht}@iwr.uni-heidelberg.de
2 Robert Bosch GmbH, 70465 Stuttgart, Germany
Abstract. In structured prediction, it is standard procedure to discriminatively train a single model that is then used to make a single prediction for each input. This practice is simple but risky in many ways. For instance, models are often designed with tractability rather than faithfulness in mind. To hedge against such model misspecification, it may be useful to train multiple models that are all a reasonable fit to the training data, at least one of which will hopefully make more valid predictions than the single model of the standard procedure. We propose the Coulomb Structured SVM (CSSVM) as a means to obtain, at training time, a full ensemble of different models. At test time, these models can run in parallel and independently to make diverse predictions. We demonstrate on challenging tasks from computer vision that some of these diverse predictions have significantly lower task loss than that of a single model, and improve over state-of-the-art diversity-encouraging approaches.

Keywords: Structured output learning · Diverse predictions · Multiple output learning · Structured support vector machine
1 Introduction
The success of large margin methods for structured output learning, such as the structured support vector machine (SSVM) [1], is partly due to their good generalization performance on test data compared to, e.g., maximum likelihood learning on structured models [2]. Despite such regularization strategies, however, it is not guaranteed that the model which optimizes the learning objective actually generalizes well to unseen data. Reasons include wrong model assumptions, noisy data, ambiguities in the data, missing features, insufficient training data, or a task loss which is too complex to model directly. To further decrease the generalization error, it is beneficial either to (i) generate multiple likely solutions from the model [3–5] or (ii) learn multiple models which generate diverse predictions [6–8]. The different predictions for a given structured input may then be analyzed to compute robustness/uncertainty measures, or may be the input for a more complex model exploiting higher-order information.
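For reference, the margin-rescaled SSVM of [1] mentioned above trains a single weight vector w by requiring, for every training pair (x_n, y_n), a margin over all other outputs y that scales with the task loss Δ. This is the standard formulation (restated here for context, with ψ denoting the joint feature map):

\[
\min_{w,\;\xi \geq 0} \;\; \frac{1}{2}\lVert w \rVert^2 \;+\; \frac{C}{N}\sum_{n=1}^{N} \xi_n
\qquad \text{s.t.} \quad
\langle w, \psi(x_n, y_n) \rangle - \langle w, \psi(x_n, y) \rangle \;\geq\; \Delta(y_n, y) - \xi_n
\quad \forall n,\; \forall y \neq y_n .
\]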
Fig. 1. Structured SVM learning. “+” indicates a structured training example, whereas “−” in the same color are the corresponding structured outputs with task loss Δ(+, −) > 0. (a) A standard linear SSVM maximizes the margin between positive and all “negative” examples (decision boundary with its normal vector in cyan). (b) Multiple choice learning [6] learns M SSVMs (here: 3) which cluster the space (clusters for positive and negative examples are depicted in the same color) to generate M outputs. (c) The proposed Coulomb Structured SVM learns an ensemble of M diverse models, each of which contributes its own prediction.
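To make the evaluation idea from the abstract concrete, the following minimal Python sketch (not from the paper; the model objects with a `predict` method and the Hamming task loss are assumptions made for illustration) runs M independently trained structured predictors on the same input and reports the prediction with the lowest task loss, i.e. the "oracle" pick commonly reported for diversity-encouraging methods.

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    """Illustrative task loss Delta: fraction of output variables that disagree."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def oracle_best(models, x, y_true, loss=hamming_loss):
    """Run M independently trained structured models on input x (possible in
    parallel, since they do not interact at test time) and return the prediction
    with the lowest task loss, i.e. the oracle accuracy of the ensemble."""
    predictions = [m.predict(x) for m in models]          # M diverse outputs
    losses = [loss(y_true, y_hat) for y_hat in predictions]
    best = int(np.argmin(losses))
    return predictions[best], losses[best]
```

In this hypothetical interface, the diversity between the M models is established entirely at training time; at test time each model simply makes its own independent prediction.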