Analysis and Comparison of Genomes of HIV-1 and HIV-2 Using Apriori Algorithm, Decision Tree, and Support Vector Machine
1 Hankuk Academy of Foreign Studies, Yongin, Republic of Korea {yhr1501,tigerpaul09,mlee981001}@hafs.kr
2 Department of Science, Hankuk Academy of Foreign Studies, Yongin, Republic of Korea {heresphill,tsyoon}@hafs.hs.kr
Abstract. AIDS is caused by HIV, which can be divided into two strains: HIV-1 and HIV-2. Whereas HIV-1 is distributed around the world and is the major cause of global infections, HIV-2 is less infectious and transmissible and is therefore generally confined to West Africa. This research aims to account for that difference by analyzing genome sequences of HIV-1 and HIV-2 using three methods: the Apriori algorithm, a decision tree, and the Support Vector Machine (SVM). Apriori shows that lysine, arginine, and serine are the typical amino acids of HIV-1, while glycine, lysine, leucine, and arginine are typical of HIV-2. The decision tree identifies the amino acid positions that best distinguish the two viruses: pos5 in the 9-window, pos13 in the 13-window, and pos16 in the 19-window. SVM indicates that the two viruses appear similar but are in fact different. Together, the results provide a biologically verifiable background for making effective vaccines for HIV, especially for HIV-2.
Keywords: HIV-1 · HIV-2 · Amino acids · Bioinformatics · Data mining · Apriori algorithm · Decision tree · Support vector machine (SVM)
1 Introduction
Immunodeficiency is a state in which the immune system's ability to protect the body from infectious diseases is weakened or absent [1]. AIDS is acquired through exposure to human immunodeficiency virus (HIV) [2]. This research compares the properties of HIV-1 and HIV-2 in Africa in order to provide a biologically verifiable account of the difference in distribution between the two types [3, 4]. To that end, it analyzes the genomic DNA sequences of the two viruses with Apriori, a decision tree, and the Support Vector Machine (SVM). The objects of analysis were chosen according to two criteria. First, because both strains began their propagation in Africa, two strains from the same area should exhibit the differing properties that actually caused the current difference in distribution. Second, taking both strains from the same area reduces the unnecessary variability that arises from regional differences.
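The abstract reports results in terms of window positions (for example, pos5 in the 9-window). As a minimal sketch of what such a representation could look like, the following Python cuts an amino acid string into fixed-length windows and labels each residue by its position. The function names, the toy sequence, and the choice of overlapping windows are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def sliding_windows(sequence, size):
    """Return every contiguous window of `size` residues from `sequence`."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


def to_positional_features(window):
    """Label each residue by its 1-based position, e.g. {'pos1': 'M', ...}."""
    return {f"pos{i}": residue for i, residue in enumerate(window, start=1)}


def residue_frequencies(windows):
    """Count how often each amino acid letter occurs across all windows."""
    return Counter(residue for window in windows for residue in window)


# Made-up amino acid string for illustration only; this is not HIV data.
toy_sequence = "MGARASVLSGGKLD"

# Window size 9 is one of the sizes named in the abstract (9, 13, 19).
windows = sliding_windows(toy_sequence, 9)
features = [to_positional_features(w) for w in windows]
```

Each dictionary in `features` is one row of position-labeled residues, a form that rule-mining or classification tools can consume after suitable encoding. Whether the authors used overlapping windows in this way is not stated in this section.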
© Springer International Publishing Switzerland 2016 D.-S. Huang et al. (Eds.): ICIC 2016, Part I, LNCS 9771, pp. 392–398, 2016. DOI: 10.1007/978-3-319-42291-6_39
2 Materials and Methods
2.1 HIV
HIV-1 is the most common strain of HIV, accounting for 95% of global HIV infections. It originated from the common chimpanzee [5]. HIV-1 is more virulent and infectious because of its short incubation period. HIV-2, the other strain, is concentrated mostly in West African countries such as Senegal and Nigeria. It originated from the sooty mangabey [4]. It is less pathogenic because its incubation period is longer than that of HIV-1, which accounts for its lower transmissibility and slower progression to AIDS [6, 7].
2.2 Windo