Diagnosis Approaches for Colorectal Cancer Using Manifold Learning and Deep Learning


ORIGINAL RESEARCH

Nguyen Thanh‑Hai¹ · Nguyen Thai‑Nghe¹
Received: 11 April 2020 / Accepted: 8 August 2020
© Springer Nature Singapore Pte Ltd 2020

Abstract

Data visualization remains a challenge in many fields. Metagenomic datasets are usually very high-dimensional, which makes them hard for humans to interpret. Among the diseases predicted from metagenomic data, colorectal cancer is one for which deep learning usually performs worse than classical machine learning. In this paper, we present an approach that uses manifold learning, with t-distributed stochastic neighbor embedding (t-SNE) and spectral embedding, to visualize numerical data as images, and then leverages deep learning algorithms to improve colorectal cancer prediction. The work also shows promising potential for improving visualization quality and prediction performance on dense data. Analysis of samples from five regions (America, China, Austria, Germany, and France) shows that combining these visualization approaches with deep learning is promising for improving colorectal cancer diagnosis.

Keywords  Colorectal cancer disease prediction · Visualization · Deep learning · Manifold learning · Spectral embedding · T-Distributed stochastic neighbor embedding
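As a rough illustration of the idea summarized in the abstract, the sketch below turns a numerical sample-by-feature matrix into one small image per sample. It rests on an assumption not stated in this section: that the manifold learning step embeds the features into 2D to fix a pixel position for each feature, and that each sample is then drawn by writing its feature values at those positions. The helper names `feature_coordinates`, `to_pixels`, and `render_images`, the 24 x 24 image size, and the rule for features that share a pixel are all hypothetical choices for illustration; only `TSNE` and `SpectralEmbedding` come from scikit-learn.

```python
import numpy as np
from sklearn.manifold import TSNE, SpectralEmbedding


def feature_coordinates(X, method="tsne", seed=0):
    """Embed each feature (column of X) as a point in 2D.

    Assumption: the layout is learned over features, not over samples.
    """
    features = X.T  # shape: (n_features, n_samples)
    n_points = features.shape[0]
    if method == "tsne":
        # scikit-learn requires perplexity < number of embedded points.
        perplexity = min(30.0, (n_points - 1) / 3.0)
        model = TSNE(n_components=2, perplexity=perplexity,
                     init="random", random_state=seed)
    else:
        model = SpectralEmbedding(n_components=2,
                                  n_neighbors=min(10, n_points - 1),
                                  random_state=seed)
    return model.fit_transform(features)


def to_pixels(coords, size=24):
    """Rescale 2D coordinates to integer pixel indices in [0, size - 1]."""
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on a degenerate axis
    scaled = (coords - lo) / span
    return np.minimum((scaled * size).astype(int), size - 1)


def render_images(X, pixels, size=24):
    """Draw one size x size image per sample (row of X)."""
    images = np.zeros((X.shape[0], size, size), dtype=float)
    rows, cols = pixels[:, 1], pixels[:, 0]
    for j in range(X.shape[1]):
        # Hypothetical tie rule: features sharing a pixel keep the larger value.
        current = images[:, rows[j], cols[j]]
        images[:, rows[j], cols[j]] = np.maximum(current, X[:, j])
    return images


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((6, 40))  # synthetic stand-in: 6 samples, 40 features
    for method in ("tsne", "spectral"):
        pixels = to_pixels(feature_coordinates(X, method=method))
        print(method, render_images(X, pixels).shape)
```

The resulting array of images could then be passed to an image classifier such as a convolutional network. The data here is synthetic, so the sketch says nothing about the performance reported in the paper.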

This article is part of the topical collection “Software Technology and Its Enabling Computing Platforms” guest edited by Lam-Son Lê and Michel Toulouse.

* Nguyen Thanh‑Hai [email protected]
Nguyen Thai‑Nghe [email protected]
¹ Can Tho University, Can Tho, Vietnam

Introduction

In recent years, numerous studies have investigated ways to apply metagenomics to personalized medicine. Metagenomics is the study of many genomes at the same time. Metagenomic samples can be collected from various environments, for example, from bacteria in the human gut. The number of known bacterial species in the human gut is estimated at 500 to over 1000 [1], and many of them can cause disease. However, investigating these diseases is a major challenge because disease prediction results are inconsistent and the diseases themselves are complex. Because metagenomic data are high-dimensional, researchers have often had difficulty understanding the data. Much research in data visualization aims to illustrate features in 2D in order to interpret data and find patterns in it. Visualization methods have been proposed to give users easy-to-use tools for exploring data. Such tools include the charts provided by Microsoft Office software, which many non-programmers can use easily. Commonly used chart types include pie charts and bar charts. The pie chart was implemented in [2] and is considered one of the best-known charts for revealing data composition. Another useful kind of chart is the bar chart. It helps to exhibit a