On tower and checkerboard neural network architectures for gene expression inference


RESEARCH

Open Access

Vladimír Kunc* and Jiří Kléma

From 15th International Symposium on Bioinformatics Research and Applications (ISBRA ’19) Barcelona, Spain. 3-6 June 2019

Abstract

Background: One economical approach to gene expression profiling is the L1000 platform, which measures the expression of ∼1,000 landmark genes and uses a computational method to infer the expression of another ∼10,000 genes. One such inference method is D–GEX, which employs neural networks.

Results: We propose two novel D–GEX architectures that significantly improve the quality of the inference by increasing the capacity of a network without any increase in the number of trained parameters. The architectures partition the network into individual towers. Our best proposed architecture (a checkerboard architecture with a skip connection and five towers), together with minor changes in the training protocol, improves the average mean absolute error of the inference from 0.134 to 0.128.

Conclusions: Our proposed approach increases the gene expression inference accuracy without increasing the number of weights of the model, and thus without increasing the model's memory footprint, which limits its usage.

Keywords: Neural network, Tower architecture, Gene expression, Checkerboard architecture
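For orientation, the inference task and the error metric quoted in the abstract can be sketched on synthetic data. This is a minimal illustration, not the authors' code: the data are randomly generated, the dimensions are scaled down from ∼1,000 landmark and ∼10,000 target genes, and an ordinary least-squares fit stands in for the inference method (it is neither D–GEX nor one of the proposed architectures). All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, scaled-down sizes; the real task has ~1,000 landmark
# genes as inputs and ~10,000 target genes as outputs.
n_train, n_test = 400, 100
n_landmark, n_target = 20, 50

# Assumption for illustration only: targets are a noisy linear
# function of the landmark expression values.
true_w = rng.normal(size=(n_landmark, n_target))


def make_split(n_samples):
    """Draw synthetic landmark profiles and their target profiles."""
    x = rng.normal(size=(n_samples, n_landmark))
    y = x @ true_w + 0.1 * rng.normal(size=(n_samples, n_target))
    return x, y


def add_bias(x):
    """Append a constant column so the fit includes an intercept."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


x_train, y_train = make_split(n_train)
x_test, y_test = make_split(n_test)

# One least-squares fit predicts all target genes at once.
w, *_ = np.linalg.lstsq(add_bias(x_train), y_train, rcond=None)
y_pred = add_bias(x_test) @ w

# Mean absolute error per target gene, then averaged over all genes.
mae_per_gene = np.abs(y_pred - y_test).mean(axis=0)
average_mae = float(mae_per_gene.mean())
print(f"average MAE over {n_target} target genes: {average_mae:.3f}")
```

The value printed here depends entirely on the synthetic noise level and is not comparable to the 0.134 and 0.128 figures reported for real expression data.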

*Correspondence: [email protected] Department of Computer Science, Karlovo náměstí 13, 121 35 Prague, Czech Republic

Background

Determining gene expression is valuable for a wide range of medical and biological research (e.g., [1–5]); however, in spite of a significant price drop in the last decade, gene expression profiling is still expensive for large-scale experiments. One approach to lowering the costs and allowing larger-scale experiments is the LINCS program (http://www.lincsproject.org/), which developed the L1000 platform based on Luminex bead technology. The L1000 platform measures the expression profile of ∼1,000 carefully selected landmark genes and then reconstructs the full gene profile of ∼10,000 target genes [6], which is much cheaper than measuring the full expression profile directly.

The inference of the full gene expression profile from the expression of the landmark genes was originally based on linear regression and was then improved by a deep learning approach called D–GEX [7]. The original D–GEX is a pair of artificial neural networks (NNs) that, in contrast to linear regression, are able to reconstruct non-linear patterns. Inferring the full profile is a large-scale machine learning task that is computationally challenging, as the target dimension is much higher than the input dimension [7]. A novel model based on the original D–GEX

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format