Deep Learning Pre-training Strategy for Mammogram Image Classification: an Evaluation Study


ORIGINAL PAPER

Kadie Clancy 1 & Sarah Aboutalib 2 & Aly Mohamed 3 & Jules Sumkin 3 & Shandong Wu 2,3,4,5

© Society for Imaging Informatics in Medicine 2020

Abstract In this work, we assess how pre-training strategy affects deep learning performance for the task of distinguishing false-recall from malignancy and normal (benign) findings in digital mammography images. A cohort of 1303 breast cancer screening patients (4935 digital mammogram images in total) was retrospectively analyzed as the target dataset for this study. We assessed six different convolutional neural network (CNN) model structures utilizing four different imaging datasets (total > 1.4 million images, including ImageNet; medical images different in terms of scale, modality, organ, and source) for pre-training on six classification tasks to assess how the performance of CNN models varies based on training strategy. Representative pre-training strategies included transfer learning with medical and non-medical datasets, layer freezing, varied network structure, and multi-view input for both binary and triple-class classification of mammogram images. The area under the receiver operating characteristic curve (AUC) was used as the model performance metric. The best performing model out of all experimental settings was an AlexNet model incrementally pre-trained on ImageNet and a large Breast Density dataset. The AUC for the six classification tasks using this model ranged from 0.68 to 0.77. In the case of distinguishing recalled-benign mammograms from others, four out of five pre-training strategies tested produced significant performance differences from the baseline model. This study suggests that pre-training strategy influences significant performance differences, especially in the case of distinguishing recalled-benign from malignant and benign screening patients.

Keywords Breast cancer · Digital mammography · Deep learning · Transfer learning · Training strategy

* Shandong Wu [email protected]

1 Department of Computer Science, University of Pittsburgh, 3240 Craft Place, Pittsburgh, PA 15213, USA
2 Department of Biomedical Informatics, University of Pittsburgh, 3240 Craft Place, Pittsburgh, PA 15213, USA
3 Department of Radiology, University of Pittsburgh, 3240 Craft Place, Rm. 322, Pittsburgh, PA 15213, USA
4 Department of Bioengineering, University of Pittsburgh, 3240 Craft Place, Pittsburgh, PA 15213, USA
5 Intelligent Systems Program, University of Pittsburgh, 3240 Craft Place, Pittsburgh, PA 15213, USA

Background

Digital mammography is the primary clinical imaging exam for early-stage breast cancer screening in the general population [1]. The effectiveness of digital mammography screening for early detection and mortality reduction is well known, but as with any form of medical imaging, it is an imperfect modality. Serious challenges still exist in the distinction of benign from malignant lesions and in the reduction of false-recall [2]. False-rec