Statistical Analysis of Functional Genes in Human PPI Networks
Abstract In this chapter, based on up-to-date data from various databases and the literature, two large-scale human protein interaction networks and six functional subnetworks are constructed. The six functional subnetworks consist of essential genes, viable genes, disease genes, conserved genes, housekeeping genes, and tissue-enriched genes, respectively. We illustrate that the human protein interaction networks and most of the subnetworks are sparse, small-world, scale-free, and disassortative, with hierarchical modular structures. The essential, disease, and housekeeping subnetworks are more densely connected than the others. Statistical analysis reveals that the lethal genes, the conserved genes, the housekeeping genes, and the tissue-enriched genes have hallmark topological features. Receiver operating characteristic curves indicate that the essential genes can be distinguished from the viable ones with an accuracy of almost 70%. Closeness, semi-local, and eigenvector centralities can distinguish the housekeeping genes from the tissue-enriched ones with an accuracy of around 82%. Furthermore, statistical analysis of disease genes reveals that some classes of disease genes have hallmark topological features, especially the cancer genes, the housekeeping disease genes, and the tissue-enriched disease genes. These findings facilitate the identification of some functional genes via their topological structures in protein interaction networks.
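The idea of separating two gene classes by a topological score can be illustrated with a minimal sketch. It is not the chapter's actual pipeline: the toy edge list, the gene labels, and the function names `degree_centrality` and `roc_auc` are all hypothetical, and degree centrality stands in for the closeness, semi-local, and eigenvector centralities named above. The area under the ROC curve is computed as the probability that a randomly chosen positive node outscores a randomly chosen negative one, with ties counted as one half.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to its degree divided by (n - 1), ignoring self-loops."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def roc_auc(scores, positives):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for node, s in scores.items() if node in positives]
    neg = [s for node, s in scores.items() if node not in positives]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy network and labels, for illustration only.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "E"), ("D", "F"), ("E", "G"),
]
toy_essential = {"A", "E"}  # assumed "essential" nodes; all others "viable"

scores = degree_centrality(toy_edges)
auc = roc_auc(scores, toy_essential)
print(f"AUC of degree centrality on the toy network: {auc:.2f}")
```

On this toy input the area under the curve is 0.75. A value of 0.5 would mean the score carries no information about class membership, and 1.0 would mean it separates the two classes perfectly.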
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020. J. Lü, P. Wang, Modeling and Analysis of Bio-molecular Networks, https://doi.org/10.1007/978-981-15-9144-0_8
8.1 Backgrounds
With the development of high-throughput technologies, such as the yeast two-hybrid (Y2H) and mass spectrometry techniques, various interactome resources for species ranging from model organisms to human have become available [1]. Protein interaction data are especially abundant. Many databases have been established to provide binary protein interaction data for various organisms, such as the online predicted human interaction database (OPHID) [2], the human protein reference database (HPRD) [3], the biological general repository for interaction datasets (BioGRID) [4], the Munich information center for protein sequences (MIPS) [5], the biomolecular interaction network database (BIND) [6], the database of interacting proteins (DIP) [7], the molecular interaction database (MINT) [8], and the protein interaction database (IntAct) [9]. Numerous studies of bio-molecular networks have focused on E. coli and the yeast S. cerevisiae [10–14], whose networks cover hundreds or thousands of nodes. It is estimated that the complete human protein interactome contains about 25,000 protein-coding genes and more than 375,000 interactions among them [15–17]. In 2005, based on the Y2H technology and literature cu