
(2020) 20:208

RESEARCH ARTICLE

Open Access

Identification of most influential co-occurring gene suites for gastrointestinal cancer using biomedical literature mining and graph-based influence maximization

Charles C. N. Wang1,2, Jennifer Jin3, Jan-Gowth Chang4,5,6, Masahiro Hayakawa7, Atsushi Kitazawa7, Jeffrey J. P. Tsai1 and Phillip C.-Y. Sheu3*

Abstract

Background: Gastrointestinal (GI) cancers, including colorectal cancer, gastric cancer, and pancreatic cancer, are among the most frequently diagnosed malignancies each year and represent a major public health problem worldwide.

Methods: This paper reports an aided curation pipeline to identify potentially influential genes for gastrointestinal cancer. The pipeline mines the biomedical literature, using a Bi-LSTM-CNN-CRF model to recognize named entities. The entities and their associations are used to construct a graph, from which an influence maximization algorithm computes the most influential sets of co-occurring genes.

Results: The most influential sets of co-occurring genes that we discovered include RARA - CRBP1, CASP3 - BCL2, BCL2 - CASP3 - CRBP1, RARA - CASP3 - CRBP1, FOXJ1 - RASSF3 - ESR1, FOXJ1 - RASSF1A - ESR1, and FOXJ1 - RASSF1A - TNFAIP8 - ESR1. Using TCGA data together with functional and pathway enrichment analysis, we show that the proposed approach works well in the context of gastrointestinal cancer.

Conclusions: Our pipeline, which uses text mining to identify objects and relationships, constructs a graph from them, and applies graph-based influence maximization to discover the most influential co-occurring genes, presents a viable direction for assisting knowledge discovery in clinical applications.

Keywords: Gastrointestinal cancer, Text mining, Bi-LSTM-CNN-CRF, Influence maximization, Co-occurrence network
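As a rough illustration of the last two stages summarized in the abstract (graph construction and influence maximization), the following minimal sketch builds a gene co-occurrence graph and greedily selects a seed set under an independent cascade model. It is not the authors' implementation: the function names, the placeholder gene labels, the use of co-occurrence frequency as an activation probability, and the choice of a Monte Carlo greedy strategy are all assumptions made for illustration, and the named entity recognition step is assumed to have already produced one gene list per document.

```python
import itertools
import random
from collections import defaultdict


def build_cooccurrence_graph(docs):
    """Build an undirected graph whose edges link genes found in the same document."""
    counts = defaultdict(int)
    for genes in docs:
        for a, b in itertools.combinations(sorted(set(genes)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), count in counts.items():
        # Assumption: the activation probability is the fraction of documents
        # in which the two genes co-occur.
        p = count / len(docs)
        graph[a][b] = p
        graph[b][a] = p
    return {gene: dict(nbrs) for gene, nbrs in graph.items()}


def simulate_spread(graph, seeds, rng):
    """Run one independent cascade and return the number of activated genes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nbr, p in graph[node].items():
                # Each newly active node gets one chance to activate each neighbor.
                if nbr not in active and rng.random() < p:
                    active.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, runs, rng):
    """Estimate the expected cascade size by averaging over simulated runs."""
    return sum(simulate_spread(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_seed_set(graph, k, runs=200, seed=0):
    """Greedily add the gene that gives the largest estimated spread, k times."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        candidates = [g for g in sorted(graph) if g not in chosen]
        best = max(
            candidates,
            key=lambda g: expected_spread(graph, chosen + [g], runs, rng),
        )
        chosen.append(best)
    return chosen


# Made-up toy input: each inner list stands for the genes recognized in one abstract.
toy_docs = [
    ["GENE_A", "GENE_B"],
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_C", "GENE_D"],
    ["GENE_A", "GENE_D"],
]
toy_graph = build_cooccurrence_graph(toy_docs)
toy_seeds = greedy_seed_set(toy_graph, k=2)
print(toy_seeds)
```

The greedy strategy is used here only because it is the simplest common baseline for influence maximization; the spread estimates are stochastic, so the selected genes can change with the number of runs and the random seed.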

* Correspondence: [email protected]
3 Department of EECS and BME, University of California, Irvine, USA
Full list of author information is available at the end of the article

Introduction

Gastrointestinal (GI) cancers are the most common human tumors encountered worldwide [1]. They include colorectal cancer, gastric cancer, pancreatic cancer, and cancers of the liver and the biliary tract. Although early-stage GI cancers are amenable to surgical resection with curative intent, the overall 5-year relapse rate remains high. The addition of neoadjuvant or adjuvant chemotherapy and radiation therapy only modestly improves overall long-term survival [2]. Approximately 25% of GI cancers are diagnosed at an advanced stage, and another 25 to 50% of patients develop metastases during the course of the disease [3]. GI cancers remain a leading cause of cancer death [4]. It is therefore imperative to identify potentially effective influential genes in order to increase the number of patients who qualify for curative treatment.

The growth in the number of biomedical articles and the creation of various biomolecule interaction databases make it possible to obtain diverse biological networks. These biological networks provide a wealth of raw materials for further