Learning Without Forgetting

Abstract. When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning as standard practice for improved new task performance.

Keywords: Convolutional neural networks · Transfer learning · Multi-task learning · Deep learning · Visual recognition

1 Introduction

Many practical vision applications require learning new visual capabilities while maintaining performance on existing ones. For example, a robot may be delivered to someone's house with a set of default object recognition capabilities, but new site-specific object models need to be added. Or for construction safety, a system can identify whether a worker is wearing a safety vest or hard hat, but a superintendent may wish to add the ability to detect improper footwear. Ideally, the new tasks could be learned while sharing parameters from old ones, without degrading performance on old tasks or having access to the old training data. Legacy data may be unrecorded, proprietary, or simply too cumbersome to use in training a new task. Though similar in spirit to transfer, multitask, and lifelong learning, we are not aware of any work that provides a solution to the problem of continually adding new prediction tasks based on adapting shared parameters without access to training data for previously learned tasks.

In this paper, we demonstrate a simple but effective solution on a variety of image classification problems with Convolutional Neural Network (CNN) classifiers. In our setting, a CNN has a set of shared parameters θs (e.g., five convolutional layers and two fully connected layers for the AlexNet [11] architecture), task-specific parameters for previously learned tasks θo, and randomly initialized task-specific parameters for the new task θn.
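As a rough illustration of this parameter split, the following PyTorch-style sketch (our own, not code from the paper) keeps a shared backbone for θs and one output head per task for θo and θn; the module name, layer sizes, and method names are assumptions chosen for brevity.

```python
import torch.nn as nn

class MultiHeadCNN(nn.Module):
    """Shared parameters theta_s plus one task-specific output head per task."""

    def __init__(self, backbone: nn.Module, feat_dim: int, old_task_sizes):
        super().__init__()
        self.shared = backbone                        # theta_s: shared conv/fc layers
        self.old_heads = nn.ModuleList(               # theta_o: heads for previously learned tasks
            nn.Linear(feat_dim, n) for n in old_task_sizes
        )
        self.new_heads = nn.ModuleList()              # theta_n: filled in as new tasks are added
        self.feat_dim = feat_dim

    def add_task(self, num_classes: int) -> None:
        # Randomly initialized task-specific parameters for a new task.
        self.new_heads.append(nn.Linear(self.feat_dim, num_classes))

    def forward(self, x):
        feats = self.shared(x)
        return ([head(feats) for head in self.old_heads],
                [head(feats) for head in self.new_heads])
```

Feature extraction, fine-tuning, joint training, and the method proposed here all fit this structure and differ mainly in which parameter groups are updated and what data is required, as Fig. 1 below summarizes.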


                              Fine      Duplicating and  Feature     Joint     Learning without
                              Tuning    Fine Tuning      Extraction  Training  Forgetting
new task performance          good      good             X medium    best      best
original task performance     X bad     good             good        good      good
training efficiency           fast      fast             fast        X slow    fast
testing efficiency            fast      X slow           fast        fast      fast
storage requirement           medium    X large          medium      X large   medium
requires previous task data   no        no               no          X yes     no

Fig. 1. We wish to add new prediction tasks to an existing CNN vision system without access to the training data for its existing tasks. The table compares the relative merits of Learning without Forgetting against commonly used alternatives; entries marked X indicate a drawback.
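To make the distinctions in Fig. 1 concrete, here is a hedged sketch of which parameter groups each approach would update when training on new-task data, written against the hypothetical MultiHeadCNN above; the function name and the exact freezing policy are our assumptions, and the loss that preserves old-task responses in Learning without Forgetting is only indicated by a comment.

```python
import itertools
import torch.nn as nn

def trainable_parameters(model: nn.Module, approach: str):
    """Select which parameter groups are updated during new-task training."""
    if approach == "feature_extraction":
        # Shared layers are frozen; only the new task head is trained.
        return model.new_heads.parameters()
    if approach == "fine_tuning":
        # Shared layers and the new head are trained; nothing constrains the
        # old tasks, so their performance can degrade (the "X bad" cell in Fig. 1).
        return itertools.chain(model.shared.parameters(),
                               model.new_heads.parameters())
    if approach == "joint_training":
        # Trains everything, but requires the old-task data we assume unavailable.
        return model.parameters()
    if approach == "learning_without_forgetting":
        # Trains everything on new-task data only; an additional loss term keeps
        # the old heads' outputs close to responses recorded before training.
        return model.parameters()
    raise ValueError(f"unknown approach: {approach}")
```

Duplicating and fine-tuning behaves like fine-tuning on a full copy of the network kept separately for the new task, which is why Fig. 1 marks its storage requirement and testing efficiency as drawbacks.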