Bayesian neural networks at scale: a performance analysis and pruning study



Himanshu Sharma · Elise Jennings

Argonne Leadership Computing Facility, Argonne National Laboratory, Lemont, IL, USA

© This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2020

Abstract
Bayesian neural networks (BNNs) are a promising method for obtaining statistical uncertainties on neural network predictions, but they carry a higher computational overhead which can limit their practical use. This work explores the use of high-performance computing with distributed training to address the challenges of training BNNs at scale. We present a performance and scalability comparison of training the VGG-16 and ResNet-18 models on a Cray XC40 cluster. We demonstrate that network pruning can speed up inference without accuracy loss and provide an open-source software package, BPrune, to automate this pruning. For certain models we find that pruning up to 80% of the network results in only a 7.0% loss in accuracy. With the development of new hardware accelerators for deep learning, BNNs are of considerable interest for benchmarking performance. This analysis of training a BNN at scale outlines the limitations and benefits compared to a conventional neural network.

Keywords: Bayesian neural networks (BNN) · Distributed training · Model uncertainty · Pruning BNNs

1 Introduction

One important challenge for machine and deep learning (DL) practitioners is to develop a robust and accurate understanding of model uncertainty. Current state-of-the-art deep learning networks are able to learn representations in complex high-dimensional data and make context-informed predictions. However, these predictions are often taken at face value along with the reported accuracy metric, which may be erroneous.








Further, for scientific applications of machine learning, such as in physics, biology and manufacturing, including accurate model uncertainties is crucial. Conventional deep neural networks (DNNs) are deterministic models: they do not provide uncertainty quantification (UQ), model confidence or a probabilistic framework for model comparison. Typically, a probabilistic model is used to compute these quantities of interest. In a deep learning context, DNNs can be integrated with probabilistic models such as Gaussian processes, which induce a probability distribution over functions. A Gaussian process can be recovered from such networks in the limit of an infinite number of weights, each associated with a probability distribution (see [1, 2]). In the finite setting, a Bayesian neural network (BNN) is a DNN with probability distributions instead of point estimates for each weight. Foundational works on this topic, such as MacKay [3] and Neal [1], have led to BNNs gaining popularity among DL practitioners. In theory, these networks can overcome many limitations
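As an illustrative sketch of this distinction (not drawn from the paper), the toy layer below stores a Gaussian mean and standard deviation for every weight and samples a new weight realisation on each forward pass; repeating the pass yields a predictive mean and spread, i.e. an uncertainty estimate. All names, shapes and values are assumptions made for illustration.

```python
# Minimal sketch of a dense layer whose weights are Gaussian distributions
# (mean, std) rather than point estimates. Not the paper's implementation.
import numpy as np

class BayesianDense:
    def __init__(self, n_in, n_out, rng):
        self.w_mean = rng.normal(scale=0.1, size=(n_in, n_out))  # posterior means
        self.w_std = np.full((n_in, n_out), 0.05)                 # posterior std devs
        self.rng = rng

    def forward(self, x):
        # Sample one weight realisation from the per-weight Gaussians.
        w = self.rng.normal(self.w_mean, self.w_std)
        return x @ w

rng = np.random.default_rng(0)
layer = BayesianDense(n_in=8, n_out=1, rng=rng)
x = rng.normal(size=(1, 8))

# Monte Carlo predictions: the spread across samples reflects model uncertainty.
samples = np.stack([layer.forward(x) for _ in range(100)])
print("predictive mean:", samples.mean(), "predictive std:", samples.std())
```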