DNA sequence classification based on MLP with PILAE algorithm

METHODOLOGIES AND APPLICATION

Mohammed A. B. Mahmoud1 · Ping Guo1,2

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
In bioinformatics, classifying unknown biological sequences is a key task and is fundamental for organizing, grouping, and studying organisms and their evolution. Biological sequences can be viewed as high-dimensional data points whose dimensionality is not fixed but equals the length of the sequence. Numerical encoding, such as one-hot encoding (OHE), plays an important role in the computational analysis of DNA sequences. However, OHE has two drawbacks: (1) it adds no information that could yield additional predictive variables, and (2) when a variable has many categories, OHE enlarges the feature space significantly. To overcome these drawbacks, this paper presents a computationally efficient framework for classifying the DNA sequences of living organisms in the image domain. The proposed approach relies on a multilayer perceptron (MLP) trained with the pseudoinverse learning autoencoder (PILAE) algorithm. PILAE training requires neither setting learning control parameters nor specifying the number of hidden layers. The PILAE classifier can achieve better performance than other deep neural network (DNN) approaches such as the VGG-16 and Xception models. Experimental results on different datasets demonstrate that the proposed approach achieves high prediction accuracy as well as considerably high computational efficiency.

Keywords DNA sequence · Feature extraction · Xception · Pseudoinverse learning (PIL)
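The abstract contrasts one-hot encoding with a network trained by pseudoinverse learning. The Python sketch below is illustrative only and is not the authors' PILAE implementation. It shows (1) how one-hot encoding a DNA sequence of length L yields a 4·L-dimensional vector, and (2) the core pseudoinverse-learning idea of obtaining layer weights in a single closed-form solve, with no learning rate and no training epochs. The function names (`one_hot_encode`, `pil_fit`, `predict`), the A/C/G/T channel order, the all-zero block for ambiguous bases, the ridge term `lam`, and the toy purine/pyrimidine labels are all assumptions made for this example. The full PILAE additionally stacks autoencoder layers and determines their number automatically, which this sketch omits.

```python
from typing import List

Matrix = List[List[float]]

# Assumed channel order; any fixed order works if it is used consistently.
NUCLEOTIDES = "ACGT"


def one_hot_encode(seq: str) -> List[float]:
    """Flatten a DNA sequence into a vector of length 4 * len(seq).

    Each base becomes a 4-element block (A, C, G, T order); symbols outside
    ACGT, such as 'N', become an all-zero block in this sketch.
    """
    vec: List[float] = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in NUCLEOTIDES)
    return vec


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b (a square, non-singular) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        p = m[i][i]
        m[i] = [v / p for v in m[i]]
        for r in range(n):
            if r != i and m[r][i] != 0.0:
                f = m[r][i]
                m[r] = [vr - f * vi for vr, vi in zip(m[r], m[i])]
    return [row[n:] for row in m]


def pil_fit(h: Matrix, y: Matrix, lam: float = 1e-3) -> Matrix:
    """Closed-form weights W = (H^T H + lam * I)^-1 H^T Y.

    As lam -> 0 this approaches the Moore-Penrose solution W = H^+ Y.
    There is no learning rate, no epoch count, and no gradient descent.
    """
    ht = transpose(h)
    gram = matmul(ht, h)
    for i in range(len(gram)):
        gram[i][i] += lam  # hypothetical ridge term keeps the system solvable
    return solve(gram, matmul(ht, y))


def predict(h: Matrix, w: Matrix) -> List[int]:
    """Return the index of the highest score in each row of H @ W."""
    return [max(range(len(s)), key=s.__getitem__) for s in matmul(h, w)]


if __name__ == "__main__":
    # Hypothetical toy task: class 0 = purine-only, class 1 = pyrimidine-only.
    train = ["AAAA", "CCCC", "GGGG", "TTTT"]
    labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    x = [one_hot_encode(s) for s in train]
    print(len(x[0]))  # 16 features for a 4-base sequence
    w = pil_fit(x, labels)
    print(predict([one_hot_encode("AGAG"), one_hot_encode("CTCT")], w))  # [0, 1]
```

The cost of the closed-form step is visible in `pil_fit`: the Gram matrix is (4·L) × (4·L), so the one-hot drawback the abstract mentions, a feature space that grows with sequence length, directly increases the size of the linear system to be solved.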

Communicated by V. Loia.

1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
2 School of Systems Science, Beijing Normal University, Beijing, China

1 Introduction

The practice of identifying diverse organisms, grouping them into categories, and finally naming them is called taxonomy. Both extinct and living organisms are classified into distinct groups of comparable organisms with specific scientific names (Padial et al. 2010). Biological classification uses several categories, which are hierarchical. Meanwhile, owing to advances in DNA sequencing technology and the availability of powerful computing hardware (e.g., GPUs) needed to process DNA sequences and extract useful information from them, DNA-based characterization rapidly became a commonly used technique (Hebert and Gregory 2005). With this shift toward DNA sequence analysis and classification, various genome analysis tools were developed for analyzing and processing DNA, RNA, and protein sequences, such as RepRNA (Liu et al. 2016), Pse-Analysis (Liu et al. 2017), and Pse-in-One (Liu et al. 2015). Liu et al. (2016) employed a random forest classifier by duplicating it three t