Difference Target Propagation
1 Université de Montréal, Montreal, QC, Canada  2 CIFAR Senior Fellow, Montreal, Canada  [email protected]
Abstract. Back-propagation has been the workhorse of recent successes of deep learning, but it relies on infinitesimal effects (partial derivatives) in order to perform credit assignment. This could become a serious issue as one considers deeper and more non-linear functions, e.g., the extreme case of non-linearity where the relation between parameters and cost is actually discrete. Inspired by the biological implausibility of back-propagation, a few approaches have been proposed in the past that could play a similar credit-assignment role. In this spirit, we explore a novel approach to credit assignment in deep networks that we call target propagation. The main idea is to compute targets rather than gradients at each layer. Like gradients, they are propagated backwards. In a way that is related to but different from previously proposed proxies for back-propagation, which rely on a backwards network with symmetric weights, target propagation relies on auto-encoders at each layer. Unlike back-propagation, it can be applied even when units exchange stochastic bits rather than real numbers. We show that a linear correction for the imperfection of the auto-encoders, called difference target propagation, is very effective in making target propagation actually work. It leads to results comparable to back-propagation for deep networks with discrete and continuous units and with denoising auto-encoders, and achieves state of the art for stochastic networks.
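As a rough illustration of the mechanism described above (targets computed at each layer, propagated backwards through per-layer auto-encoders, with a linear correction for their imperfection), the following NumPy sketch shows one possible form of the corrected target for a single pair of adjacent layers. The functions f and g, the weights W and V, and the toy target are assumptions made for this sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(h, W):
    """Forward computation of layer l (illustrative choice of non-linearity)."""
    return np.tanh(h @ W)

def g(h, V):
    """Decoder of a per-layer auto-encoder, i.e. a learned approximate
    inverse of f (illustrative; in practice V would be trained)."""
    return np.tanh(h @ V)

def corrected_target(h_prev, h, h_target, V):
    """Back-project the target of layer l through the approximate inverse g,
    and add (h_prev - g(h, V)) to compensate for g being only an imperfect
    inverse of f; if g were exact, g(h, V) == h_prev and this would reduce
    to plain target propagation g(h_target, V)."""
    return h_prev + g(h_target, V) - g(h, V)

# Toy usage with made-up shapes and an arbitrary target for layer l.
W = rng.normal(scale=0.5, size=(4, 3))
V = rng.normal(scale=0.5, size=(3, 4))
h_prev = rng.normal(size=(1, 4))                 # activations of layer l-1
h = f(h_prev, W)                                 # activations of layer l
h_target = h - 0.1 * rng.normal(size=h.shape)    # target assigned to layer l
h_prev_target = corrected_target(h_prev, h, h_target, V)
print("target for layer l-1:", h_prev_target)
```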
1 Introduction
Recently, deep neural networks have achieved great success in hard AI tasks [2,12,14,19], mostly relying on back-propagation as the main way of performing credit assignment over the different sets of parameters associated with each layer of a deep net. Back-propagation exploits the chain rule of derivatives in order to convert a loss gradient on the activations of layer l (or time t, for recurrent nets) into a loss gradient on the activations of layer l − 1 (respectively, time t − 1). However, as we consider deeper networks (e.g., the recent best ImageNet competition entrants [20] with 19 or 22 layers), longer-term dependencies, or stronger non-linearities, the composition of many non-linear operations becomes more strongly non-linear. To make this concrete, consider the composition of many hyperbolic tangent units: the derivatives obtained by back-propagation become either very small (most of the time) or very large (in a few places). In the extreme (very deep computations), one would get discrete functions, whose derivatives are 0 almost everywhere, and
infinite where the function changes discretely. Clearly, back-propagation would fail in that regime. In addition, from the point of view of low-energy hard
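To make the claim about composed hyperbolic tangents concrete, here is a small, purely illustrative NumPy script (the gain and depth are arbitrary choices, not taken from the paper) that composes the same scalar tanh unit many times and evaluates the derivative of the composition by the chain rule. The composition is nearly a step function: its derivative is tiny almost everywhere and very large near the single transition point.

```python
import numpy as np

# Compose h -> tanh(gain * h) many times and differentiate the composition.
gain, depth = 1.5, 30
xs = np.linspace(-2.0, 2.0, 9)

for x in xs:
    h, dh = x, 1.0
    for _ in range(depth):
        h = np.tanh(gain * h)
        dh *= gain * (1.0 - h ** 2)   # chain rule through one tanh unit
    print(f"x = {x:+.2f}   f(x) = {h:+.4f}   f'(x) = {dh:.3e}")
```

Away from x = 0 the iterates settle at a stable fixed point where each factor of the chain rule is well below 1, so the derivative collapses toward 0; at the transition near x = 0 the factors exceed 1 and the derivative is huge. This is the regime in which the paper proposes propagating targets rather than gradients.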