Comparing Incremental Learning Strategies for Convolutional Neural Networks
Abstract. In the last decade, Convolutional Neural Networks (CNNs) have been shown to perform remarkably well in many computer vision tasks such as object recognition and object detection, being able to extract meaningful high-level invariant features. However, partly because of their complex training and tricky hyper-parameter tuning, CNNs have been scarcely studied in the context of incremental learning, where data are available in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN-based architectures, targeting real-world applications.

Keywords: Deep learning · Incremental learning · Convolutional neural networks
1 Introduction

In recent years, deep learning has established itself as a real game changer in many domains such as speech recognition [1], natural language processing [2] and computer vision [3]. The real power of deep convolutional neural networks resides in their ability to directly process high-dimensional raw data and produce meaningful high-level representations regardless of the specific task at hand [4]. So far, computer vision has been one of the fields that has benefited the most, and tasks like object recognition and object detection have seen significant advances in terms of accuracy (see ImageNet LSVRC [5] and PascalVOC [6]).

However, most of the recent successes are based on benchmarks providing an enormous quantity of data with a fixed training and test set. This is not always the case for real-world applications, where training data are often only partially available at the beginning and new data keep coming while the system is already deployed and working, as in recommendation systems and anomaly detection, where learning sudden behavioral changes becomes quite critical.

One possible approach to deal with this incremental scenario is to store all the previously seen data and retrain the model from scratch as soon as a new batch of data is available (in the following we refer to this as the cumulative approach). However, this solution is often impractical for many real-world systems where memory and computational resources are subject to stiff constraints. As a matter of fact, training convolutional neural networks is expensive, and training them on large datasets can take days or weeks, even on modern GPUs [3]. A different approach to address this issue is to update the model based only on the newly available batch of data.
This is computationally cheaper, and storing past data is not necessary. Of course, nothing comes for free, and this approach may lead to a substantial loss of accuracy with respect to the cumulative strategy. Moreover, the stability-plasticity dilemma may arise, and dangerous shifts of the model are always possible because of catastrophic forgetting [7-10]. Forgetting previously learned patterns ca
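To make the two training protocols concrete, the following is a minimal sketch in Python, assuming a PyTorch-style setup; the helper names make_cnn, train_epochs and the batches iterable are hypothetical placeholders introduced here for illustration and are not code from the paper.

import torch
from torch.utils.data import ConcatDataset, DataLoader

def train_epochs(model, loader, epochs=5, lr=1e-3):
    # Plain supervised training loop used by both strategies.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model

def cumulative(batches, make_cnn):
    # Cumulative strategy: keep every past batch and retrain a freshly
    # initialized network from scratch whenever a new batch arrives.
    # Accurate, but memory- and compute-hungry.
    seen = []
    for b in batches:                 # each b is a torch Dataset
        seen.append(b)
        model = make_cnn()            # re-initialize the network
        loader = DataLoader(ConcatDataset(seen), batch_size=128, shuffle=True)
        yield train_epochs(model, loader)

def incremental(batches, make_cnn):
    # Naive incremental strategy: keep a single model and fine-tune it on
    # the new batch only. Cheap and storage-free, but exposed to
    # catastrophic forgetting of previously learned patterns.
    model = make_cnn()
    for b in batches:
        loader = DataLoader(b, batch_size=128, shuffle=True)
        yield train_epochs(model, loader)

The sketch only contrasts how data are fed to the optimizer; it deliberately leaves out any mechanism (regularization, rehearsal, etc.) that the compared strategies may add on top of this basic scheme.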