Biomolecular Clusters Identification in Linear Time Complexity for Biological Networks

Identification of biomolecular clusters from biological networks based on structures is a critical task because the existing algorithmic approaches require high computation and also not feasible for complex, large networks. Majority of these clustering te

  • PDF / 292,848 Bytes
  • 12 Pages / 439.37 x 666.142 pts Page_size
  • 99 Downloads / 236 Views

DOWNLOAD

REPORT


Abstract Identification of biomolecular clusters from biological networks based on structures is a critical task because the existing algorithmic approaches require high computation and also not feasible for complex, large networks. Majority of these clustering techniques are real-time such as Louvain model which is considered as the fastest algorithm, utilized modularity maximization process for clusters or communities identification. Here we explained a faster, accurate and efficient algorithmic approach for biomolecular clusters identification considering the low running time, as well as better cluster quality using network (graph) based traversal techniques. We also justified that this algorithm works on linear time complexity in order to generate firstly the initial cover and final cover after modularity maximization. Keywords Biomolecular clusters · Biological networks · Linear running time complexity · Network traversal · Threshold · Modularity maximization

S. Debnath Tata Consultancy Services Limited, Kolkata, India e-mail: [email protected] S. Rakshit · K. Sengupta (B) · D. Plewczynski (B) Laboratory of Functional and Structural Genomics, Centre of New Technologies, Warsaw, Poland e-mail: [email protected] D. Plewczynski e-mail: [email protected] S. Rakshit e-mail: [email protected] S. Rakshit School of Information, The University of Texas at Austin, Austin, TX, USA D. Plewczynski Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland K. Sengupta Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland © Springer Nature Singapore Pte Ltd. 2021 D. Bhattacharjee et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Advances in Intelligent Systems and Computing 1255, https://doi.org/10.1007/978-981-15-7834-2_57

611

612

S. Debnath et al.

1 Introduction Biological Networks can be visualized as three dimensional graphical structure and defined as N = (M, C), where M represents the set of all biomolecules and C represents the list of connections among them. Here, m = |M| and c = |C| are the number of molecules and connections, respectively. These networks have some implicit features such as molecular clusters which are groups of molecules with dense intraconnections and sparse interconnections. These clusters are connected via some common molecules, termed as Outline-Cluster Molecules which mark the boundaries among clusters. So finding such outline molecules help to reach new clusters with identifying the molecules inside those clusters, termed as Cluster Molecules. In this algorithm, firstly during the network traversal, all the molecules are classified on the basis of locating inside the clusters or outline of the clusters and then the outline molecules are assigned to appropriate clusters based on the location of most of their neighbors measuring the probability of belonging. In this process, the outline molecules of same clusters having higher Belonging Proba