Feature selection algorithm based on dual correlation filters for cancer-associated somatic variants
METHODOLOGY ARTICLE
Open Access
Hyein Seo and Dong‑Ho Cho*
*Correspondence: [email protected] School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak‑ro, Yuseong‑gu, 34141 Daejeon, Republic of Korea
Abstract
Background: Since the development of sequencing technology, an enormous amount of genetic information has been generated, and human cancer analysis using this information is drawing attention. As the effects of variants on human cancer become known, it is important to find the cancer-associated variants among countless variants.
Results: We propose a new filter-based feature selection method for extracting cancer-associated somatic variants that takes correlations in the data into account. Variants associated with both the activation and the deactivation of cancer characteristics are analyzed using dual correlation filters. Multiobjective optimization is used to consider the two types of variants simultaneously without redundancy. To avoid high computational complexity, we calculate a correlation-based weight to select significant variants instead of directly searching for the optimal subset of variants. The proposed algorithm is applied to the identification of melanoma metastasis or breast cancer stage, and the classification results of the proposed method are compared with those of a conventional single correlation filter-based method.
Conclusions: We verified that the proposed dual correlation filter-based method can extract cancer-associated variants related to the characteristics of human cancer.
Keywords: Somatic variant, Cancer-associated variant, Feature selection, Correlation filter, Multiobjective optimization
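To make the idea of a dual correlation filter concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a binary variant matrix `X` (samples by variants) and a binary label `y`, scores each variant by its Pearson correlation with the label, and keeps the `k` most positively and the `k` most negatively correlated variants as two separate groups. The function names `pearson` and `dual_correlation_select` are hypothetical, and the multiobjective optimization and redundancy handling described in the abstract are deliberately left out.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no information about the label.
        return 0.0
    return cov / sqrt(var_x * var_y)


def dual_correlation_select(X, y, k):
    """Return (positive, negative, corr).

    positive: indices of up to k variants most positively correlated with y
    negative: indices of up to k variants most negatively correlated with y
    corr:     per-variant correlation, used here as an illustrative weight
    """
    columns = list(zip(*X))  # transpose: one tuple per variant
    corr = [pearson(col, y) for col in columns]
    ranked = sorted(range(len(corr)), key=lambda j: corr[j])  # ascending
    negative = [j for j in ranked[:k] if corr[j] < 0]
    positive = [j for j in ranked[::-1][:k] if corr[j] > 0]
    return positive, negative, corr


# Toy data (made up for illustration): 4 samples, 4 variants.
X = [
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 1],
]
y = [1, 1, 0, 0]
positive, negative, corr = dual_correlation_select(X, y, k=1)
print(positive, negative)
```

Keeping the two groups apart, rather than ranking by absolute correlation, mirrors the abstract's distinction between variants tied to activation and those tied to deactivation of a cancer characteristic.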
Background
The development of next-generation sequencing (NGS), which performs high-throughput parallel sequencing of short DNA fragments, has greatly facilitated the analysis of genetic information [1]. NGS has made an important contribution to cancer research, including the understanding of cancer initiation, progression, and treatment [2–4]. Single nucleotide variants (SNVs) and short insertions or deletions (InDels) are changes in genetic information of very small length that occur at very low frequencies. Variant calling algorithms for these variants, especially in the somatic cell, have been developed
© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your inte