A New Subcellular Localization Predictor for Human Proteins Considering the Correlation of Annotation Features and Prote


1 Key Laboratory of System Control and Information Processing, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Ministry of Education of China, Shanghai, China
{zhouhang2,hbshen}@sjtu.edu.cn
2 Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China

Abstract. Identifying a protein’s subcellular localization is important for understanding the protein’s function. Because experimental methods for determining subcellular localization are time-consuming, computational approaches are needed to handle the large number of proteins whose locations are unknown. Current predictors mostly rely on annotation-based features, but few of them take the correlation between these features into account. Moreover, most predictors can only handle single-location proteins, whereas many proteins reside in multiple locations, and such multi-location proteins play important roles in many biological processes. In this paper, we propose a novel prediction method that extracts features from prior biological knowledge while considering the correlation between annotation terms. The new method can also handle the multi-localization problem. We compared the proposed method with other predictors on four datasets, and the results show that our method outperforms them.

Keywords: Subcellular localization · Multi-label · Correlation · Gene Ontology
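The abstract does not specify how term correlation enters the feature vector, so the following is only a minimal sketch under stated assumptions, not the method proposed here. It assumes each protein is encoded as a binary vector over a fixed list of GO terms, estimates the Pearson (phi) correlation between terms from training proteins, and gives an unannotated term the strongest positive correlation it has with any term the protein does carry. The helper `multi_label`, also hypothetical, illustrates one common way to return several locations: keep every location whose score (assumed non-negative, e.g. a probability) reaches a fixed fraction of the top score.

```python
from math import sqrt


def term_correlation(annotations):
    """Pearson (phi) correlation between binary GO-term columns.

    annotations: list of equal-length 0/1 lists, one row per training protein.
    Returns a square matrix r where r[i][j] is the correlation of terms i and j.
    """
    n = len(annotations)
    m = len(annotations[0])
    cols = [[row[j] for row in annotations] for j in range(m)]
    means = [sum(c) / n for c in cols]
    r = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            cov = sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j]))
            var_i = sum((a - means[i]) ** 2 for a in cols[i])
            var_j = sum((b - means[j]) ** 2 for b in cols[j])
            # A term that is always present or always absent has zero variance;
            # its correlation with any other term is treated as 0.
            r[i][j] = cov / sqrt(var_i * var_j) if var_i > 0 and var_j > 0 else 0.0
    return r


def correlated_features(query_terms, r):
    """Hypothetical correlation-aware encoding of one protein's GO terms.

    An annotated term keeps the value 1.0; an unannotated term receives the
    strongest positive correlation it has with any annotated term (or 0.0).
    """
    m = len(r)
    feats = []
    for j in range(m):
        if query_terms[j]:
            feats.append(1.0)
        else:
            best = max((r[i][j] for i in range(m) if query_terms[i]), default=0.0)
            feats.append(max(best, 0.0))
    return feats


def multi_label(scores, ratio=0.5):
    """Hypothetical decision rule: return every location (sorted by name)
    whose non-negative score is at least `ratio` times the top score."""
    top = max(scores.values())
    return sorted(loc for loc, s in scores.items() if s >= ratio * top)
```

With this encoding, a protein annotated only with the first term of `[[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]` gets partial credit (about 0.577) for the second term, which co-occurs with the first in training, while the anti-correlated third term stays at 0.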

1 Introduction

Information about protein subcellular localization is crucial for understanding the molecular functions of proteins and the biological processes they take part in. Since identifying a protein’s cellular compartment by biological experiments is labor-intensive and time-consuming, in-silico tools for predicting locations are necessary for handling large-scale data sets of proteins with unknown locations. According to the SWISS-PROT knowledgebase [1] released in January 2012, of a total of 534242 proteins, only 66203 have defined subcellular localization annotations, while 247504 have uncertain location annotations. Machine learning-based computational tools, which automatically predict the locations of unannotated proteins by learning from the available subcellular location annotations, have been developed extensively over the last decade. Moreover, as protein sequences and various annotation data accumulate rapidly in public databases, more information becomes available to computational tools for making more precise predictions, especially for difficult cases such as locations with very few known examples or proteins with multiple locations.

© Springer Nature Singapore Pte Ltd. 2016. H. Zhou et al. In: T. Tan et al. (Eds.): CCPR 2016, Part II, CCIS 663, pp. 499–512, 2016. DOI: 10.1007/978-981-10-3005-5_41

Computational prediction methods mainly use two types of features: annotation-based and sequence-based. Sequence-based features include amino acid composition [2, 3], amino acid pairs [4, 5], pseudo-amino
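The first two sequence-based encodings named above can be sketched as follows, assuming their usual definitions: amino acid composition as the fraction of each of the 20 standard residues, and amino acid pair (dipeptide) composition as the normalized count of each of the 400 ordered pairs of adjacent residues. The variants used in references [2–5] may differ; the function names and the choice to skip non-standard letters are illustrative assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aa_composition(seq):
    """20-dim amino acid composition: fraction of each standard residue.

    Non-standard letters (e.g. X, U) are ignored in both numerator and denominator.
    """
    seq = seq.upper()
    n = sum(seq.count(a) for a in AMINO_ACIDS)  # number of standard residues
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dim amino acid pair composition over adjacent residues.

    Pairs containing a non-standard letter are skipped, and the counts are
    normalized by the number of valid pairs.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]
```

For example, `"ACCA"` yields 0.5 for both A and C in the composition vector and 1/3 each for the pairs AC, CC and CA. Concatenating the two vectors gives a 420-dimensional sequence descriptor.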
