Why Does Synthesized Data Improve Multi-sequence Classification?
1 Biomedical Imaging Group Rotterdam, Erasmus MC University Medical Center, The Netherlands
2 Department of Computer Science, University of Copenhagen, Denmark

Abstract. The classification and registration of incomplete multi-modal medical images, such as multi-sequence MRI with missing sequences, can sometimes be improved by replacing the missing modalities with synthetic data. This may seem counter-intuitive: synthetic data is derived from data that is already available, so it does not add new information. Why can it still improve performance? In this paper we discuss possible explanations. If the synthesis model is more flexible than the classifier, the synthesis model can provide features that the classifier could not have extracted from the original data. In addition, using synthetic information to complete incomplete samples increases the size of the training set. We present experiments with two classifiers, linear support vector machines (SVMs) and random forests, together with two synthesis methods that can replace missing data in an image classification problem: neural networks and restricted Boltzmann machines (RBMs). We used data from the BRATS 2013 brain tumor segmentation challenge, which includes multi-modal MRI scans with T1, T1 post-contrast, T2 and FLAIR sequences. The linear SVMs appear to benefit from the complex transformations offered by the synthesis models, whereas the random forests mostly benefit from having more training data. Training on the hidden representation from the RBM brought the accuracy of the linear SVMs close to that of random forests.
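The first explanation in the abstract (a synthesis model that is more flexible than the classifier can supply features the classifier could not extract itself) can be illustrated with a toy sketch. This is not the paper's experiment: the one-dimensional data, the k-nearest-neighbour regressor standing in for the neural network and RBM synthesis models, the single-threshold rule standing in for a linear classifier, and all function and variable names are assumptions made for illustration.

```python
import random

random.seed(0)

def knn_synthesize(train_x, train_m2, query, k=5):
    """Hypothetical nonlinear synthesis model: predict the missing modality
    as the mean of the k nearest complete samples."""
    nearest = sorted(zip(train_x, train_m2), key=lambda p: abs(p[0] - query))[:k]
    return sum(m2 for _, m2 in nearest) / len(nearest)

def best_threshold_accuracy(feature, labels):
    """Stand-in for a linear classifier on a single feature:
    the best accuracy of any threshold, with either polarity."""
    n = len(labels)
    best = 0.0
    for t in sorted(set(feature)):
        correct = sum((f > t) == bool(l) for f, l in zip(feature, labels))
        best = max(best, correct / n, (n - correct) / n)
    return best

# Toy data: modality 1 is x, modality 2 is a nonlinear function of x,
# and the label depends on modality 2 (so it is not linear in x).
xs = [random.uniform(-1.0, 1.0) for _ in range(400)]
m2 = [x * x for x in xs]
labels = [int(v > 0.25) for v in m2]

# The first 200 samples are complete; the last 200 lack modality 2.
complete_x, complete_m2 = xs[:200], m2[:200]
incomplete_x, incomplete_y = xs[200:], labels[200:]

# Fill in the missing modality from the available one.
synthetic_m2 = [knn_synthesize(complete_x, complete_m2, x) for x in incomplete_x]

acc_raw = best_threshold_accuracy(incomplete_x, incomplete_y)
acc_synth = best_threshold_accuracy(synthetic_m2, incomplete_y)
print("threshold on available modality:", acc_raw)
print("threshold on synthetic modality:", acc_synth)
```

The synthetic modality contains no information beyond x, yet a single threshold separates the classes far better on it than on x, because the nonlinear transformation was done by the synthesis model rather than by the classifier. A more flexible classifier could learn the same transformation directly from x and would gain less from this step.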
1 Introduction
Multi-sequence data can be very informative in medical imaging, but using it may cause some practical problems. Training a classifier on multi-modal data, for instance, generally requires that all modalities are available for all samples. If some modalities are missing, there is a range of methods for handling or imputing the missing values in standard statistical analysis [1]. Specifically for image analysis, there are synthesis methods that predict missing modalities. Some methods model the physical properties of the imaging process, e.g., to derive intrinsic tissue parameters from MRI scans [2] or to derive pseudo-CT from MRI in radiotherapy applications [3,4]. But an explicit model of the imaging process is not even required, as image processing techniques can be sufficient: for example, pseudo-CT images have also been made with tissue segmentation [5,6], with Gaussian mixture models [7] or by registering and combining CT images [8,9].

© Springer International Publishing Switzerland 2015. N. Navab et al. (Eds.): MICCAI 2015, Part I, LNCS 9349, pp. 531–538, 2015. DOI: 10.1007/978-3-319-24553-9_65
G. van Tulder and M. de Bruijne
Interestingly, data synthesis is useful not only for generating images but also as an intermediate step. For example, Iglesias et al. [10] found that synthetic data improved the registration of multi-sequence brain MRI. Roy et al. [11] showed that synthetic sequences can improve segmentation consistency in datasets