Learning the Correlation Between Images and Disease Labels Using Ambiguous Learning



Abstract. In this paper, we present a novel approach to candidate ground-truth label generation for large-scale medical image collections by combining clinically relevant textual and visual analysis through the framework of ambiguous label learning. In particular, we present a novel string matching algorithm for extracting disease labels from patient reports associated with imaging studies. These labels are assigned as ambiguous labels to the images of the study. Visual analysis is then performed on the images of the study, and diagnostically relevant features are extracted from relevant regions within the images. Finally, we learn the correlation between the ambiguous disease labels and the visual features through an ambiguous SVM learning framework. The approach was validated on a large collection of over 7000 Doppler images, demonstrating a scalable way to semi-automatically ground-truth large image collections.
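The learning step can be illustrated with a small sketch. It assumes the hinge-based convex loss for partial labels (CLPL) of Cour, Sapp and Taskar as the surrogate: for a candidate label set Y and per-label scores g_a(x), the loss is psi(mean of g_a(x) over a in Y) + sum over a not in Y of psi(-g_a(x)), with psi(z) = max(0, 1 - z). Linear per-label scorers and plain subgradient descent are also assumptions, and the function names, the toy feature vectors, and the labels A, B, C are hypothetical. This is not presented as the authors' exact formulation or solver. Each image inherits all labels extracted from its study's report as candidates; a label that consistently co-occurs with a visual pattern across studies ends up with the highest score for that pattern.

```python
"""Ambiguous (partial) label learning sketch using the CLPL surrogate.

Assumption: hinge-based convex loss for partial labels (Cour, Sapp and
Taskar), linear per-label scorers, and plain subgradient descent. All names
and the toy data are hypothetical; this is not the paper's exact solver.
"""
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def hinge(z: float) -> float:
    """psi(z) = max(0, 1 - z)."""
    return max(0.0, 1.0 - z)


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def clpl_loss(scores: Dict[str, float], candidates: Set[str]) -> float:
    """Hinge on the mean candidate score plus a hinge on every non-candidate."""
    mean_cand = sum(scores[a] for a in candidates) / len(candidates)
    penalty = sum(hinge(-s) for a, s in scores.items() if a not in candidates)
    return hinge(mean_cand) + penalty


def train(X: List[Vector], Y: List[Set[str]], labels: List[str],
          epochs: int = 200, lr: float = 0.1,
          reg: float = 0.01) -> Dict[str, List[float]]:
    """Subgradient descent on the summed CLPL losses + (reg / 2) * ||W||^2."""
    W = {a: [0.0] * len(X[0]) for a in labels}
    for _ in range(epochs):
        for x, cand in zip(X, Y):
            scores = {a: dot(W[a], x) for a in labels}
            coef = {a: 0.0 for a in labels}  # subgradient of the loss w.r.t. score_a
            if sum(scores[a] for a in cand) / len(cand) < 1.0:
                for a in cand:  # candidate hinge active: raise all candidates equally
                    coef[a] -= 1.0 / len(cand)
            for a in labels:
                if a not in cand and scores[a] > -1.0:  # hinge(-score) still active
                    coef[a] += 1.0
            for a in labels:
                W[a] = [wi - lr * (reg * wi + coef[a] * xi)
                        for wi, xi in zip(W[a], x)]
    return W


def predict(W: Dict[str, List[float]], x: Vector) -> str:
    """Disambiguate an image by picking its highest-scoring label."""
    return max(W, key=lambda a: dot(W[a], x))


# Hypothetical toy data: three visual patterns; each image carries its
# study's report labels as an ambiguous candidate set.
X = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0], [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
Y = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
W = train(X, Y, ["A", "B", "C"])
print([predict(W, x) for x in ([1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0])])
# → ['A', 'B', 'C']
```

In the toy data, label A appears in both candidate sets of the first pattern while B and C appear only once each, so the learned scorer assigns A to that pattern even though no image was ever given an unambiguous label.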

T. Syeda-Mahmood, R. Kumar and C. Compas
© Springer International Publishing Switzerland 2015. N. Navab et al. (Eds.): MICCAI 2015, Part II, LNCS 9350, pp. 185–193, 2015. DOI: 10.1007/978-3-319-24571-3_23

1 Introduction

As big data becomes relevant to the medical imaging community, there is a growing need to label images for disease occurrences semi-automatically, easing the labeling burden on clinical experts. Medical imaging studies often have associated reports that mention the disease labels. However, because of the wide variety of spoken utterances, spotting disease labels using known medical vocabularies is difficult: diseases may be mentioned implicitly, occur in a negated sense, or be denoted by abbreviations or synonyms. Table 2 illustrates this problem through sample sentences taken from actual reports (Column 2) and their corresponding medical vocabulary phrases (Column 1). Associating the disease labels of a report with the relevant images in the study is also difficult, ordinarily requiring clinical expertise to spot the anomaly within the images. Figure 1 shows three images within the same patient study, and an excerpt from the corresponding report is shown in Table 1. From these data alone, it is not clear which labels should be assigned to these images. The goal of this work is to address this problem by studying the correlation between anomaly-depicting feature regions within images and potential disease labels through an ambiguous label learning formulation. Specifically, we extract disease-depicting features from images and correlate them with potentially ambiguous labels extracted from reports, using a convex optimization learning formulation that minimizes a surrogate loss appropriate for ambiguous labels. While the text-based disease label extraction method is generally applicable across modalities and specialties, disease-depicting visual features must be customized for each modality and anatomical specialty. We therefore illustrate this methodology by restricting ourselves to cardiac Doppler ultrasound images. As shown in Figure 1, these images possess sufficient