An embedded gene selection method using knockoffs optimizing neural network

RESEARCH ARTICLE

Open Access

Juncheng Guo1,2,3, Min Jin4, Yuanyuan Chen4 and Jianxiao Liu1,4*

* Correspondence: liujianxiao321@163.com
1 Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
4 National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
Full list of author information is available at the end of the article

Abstract

Background: Gene selection refers to finding a small subset of discriminant genes from gene expression profiles. Effectively selecting the genes that affect specific phenotypic traits is an important research problem in biology. Neural networks fit nonlinear data well and can capture features automatically and flexibly. In this work, we propose an embedded gene selection method using a neural network. The important genes are obtained by calculating the weight coefficients after training is completed. To address the black-box problem of neural networks and make the training results interpretable, we use the idea of knockoffs to construct a knockoff feature gene for each original feature gene. This method not only makes the feature genes compete with each other, but also makes each feature gene compete with its knockoff feature gene. This approach helps to select the key genes that affect the decision-making of the neural network.

Results: We use maize carotenoids, tocopherol methyltransferase, raffinose family oligosaccharides and human breast cancer datasets for verification and analysis.

Conclusions: The experimental results demonstrate that the knockoffs optimizing neural network method achieves better detection performance than the other existing algorithms, especially when processing nonlinear gene expression and phenotype data.

Keywords: Gene mining, Neural network, Knockoffs, Nonlinear data, Maize
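The competition between a feature gene and its knockoff can be illustrated with a minimal sketch. This is not the authors' implementation: a least-squares linear model stands in for the neural network, and the knockoffs are built by shuffling each column, which is only a crude approximation of a proper knockoff construction. The function names, the synthetic data and the fixed threshold are all assumptions made for illustration.

```python
import numpy as np

def make_permutation_knockoffs(X, rng):
    """Crude knockoff stand-in (assumption): shuffle each column independently.

    This keeps each gene's marginal distribution but breaks its link to the
    phenotype. It is NOT an exact knockoff construction.
    """
    X_knock = np.empty_like(X)
    for j in range(X.shape[1]):
        X_knock[:, j] = rng.permutation(X[:, j])
    return X_knock

def knockoff_statistics(X, X_knock, y):
    """Fit one model on [X, X_knock] and return W_j = |w_j| - |w_knock_j|.

    A least-squares linear model is used here in place of the neural network.
    """
    p = X.shape[1]
    X_aug = np.hstack([X, X_knock])
    X_aug = X_aug - X_aug.mean(axis=0)  # center so no intercept is needed
    coef, *_ = np.linalg.lstsq(X_aug, y - y.mean(), rcond=None)
    return np.abs(coef[:p]) - np.abs(coef[p:])

def select_genes(W, threshold):
    """Keep genes whose weight clearly beats that of their own knockoff."""
    return np.flatnonzero(W > threshold)

# Synthetic data (hypothetical): only genes 0 and 1 drive the phenotype.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 10
X = rng.normal(size=(n_samples, n_genes))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=n_samples)

X_knock = make_permutation_knockoffs(X, rng)
W = knockoff_statistics(X, X_knock, y)
selected = select_genes(W, threshold=1.0)  # fixed threshold chosen for illustration
print("W:", np.round(W, 2))
print("selected gene indices:", selected)
```

In this synthetic setting the two causal genes should receive large positive statistics, while the remaining genes stay close to zero because they carry no more signal than their knockoffs. A nonlinear model would replace the least-squares fit, with the per-gene importance read from its trained weights.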

Introduction

In recent years, large amounts of biological data (such as genomes, transcriptomes, and phenotypes) have been generated with the maturity and rapid development of many high-throughput technologies. In this context, it is possible to mine gene loci for specific phenotypic traits (such as crop vitamin A content, agronomic traits, and human diseases) from genome-wide data. Genome-wide association studies (GWAS) and linkage analysis have become important approaches for gene location and fine allele discovery. At present, many quantitative trait loci controlling various phenotypic traits have been mapped by biologists using these methods. However, the linkage analysis method requires constructing a segregating population, has a longer cycle and low

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropria