Rank Aggregation for Candidate Gene Identification
Abstract Differences in molecular processes are reflected, among other things, in the gene expression levels of the cells involved. High-throughput methods such as microarrays and deep sequencing are increasingly used to obtain these expression profiles. Often, differences in gene expression across conditions, such as tumor vs. inflammation, are investigated. The top-scoring differential genes are considered candidates for further analysis. Measured differences may not be related to a biological process, as they can also be caused by variation in measurement or by other sources of noise. One way to reduce the influence of noise is to combine the available samples. Here, we analyze two types of combination methods, early and late aggregation, and compare these statistical and positional rank aggregation methods in a simulation study and in experiments on real microarray data.
A. Burkovski
Research Group Bioinformatics and Systems Biology, Institute of Neural Information Processing, Ulm University, 89069 Ulm, Germany
International Graduate School in Molecular Medicine, Ulm University, Ulm, Germany

L. Lausser, J.M. Kraus, H.A. Kestler
Research Group Bioinformatics and Systems Biology, Institute of Neural Information Processing, Ulm University, 89069 Ulm, Germany

M. Spiliopoulou et al. (eds.), Data Analysis, Machine Learning and Knowledge Discovery, Studies in Classification, Data Analysis, and Knowledge Organization, DOI 10.1007/978-3-319-01595-8_31, © Springer International Publishing Switzerland 2014

1 Introduction Molecular high-throughput technologies generate large amounts of data which are usually noisy. Often measurements are taken under slightly different conditions and produce values that in extreme cases may be contradictory and contain outliers.
One way of establishing more stable relationships between genes is to transform the data to an ordinal scale by ranking the expression values within each profile. High expression levels are thereby sorted to the top of the ranking. Common patterns can be revealed by combining these rankings via aggregation methods. These methods construct a consensus ranking that, in some defined sense, disagrees least with all input rankings. Here, we study the difference between two general combination procedures, namely (a) early and (b) late aggregation. In early aggregation, gene values are aggregated by methods such as the mean or median, and genes are then ranked by the aggregated value. In contrast, late aggregation builds a consensus ranking after each profile has been transformed to an ordinal scale individually. To what extent early and late aggregation approaches differ has not been reported so far. In this simulation study we observe that the quality and the results depend strongly on the underlying noise model of the data. If we assume that each sample is affected by slightly different technica
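The distinction between early and late aggregation can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy expression values, and the use of the mean rank (a Borda-style positional rule) as the late aggregation method are assumptions made only for illustration. Each profile is a list of expression values, one per gene, and rank 1 denotes the highest expression.

```python
from statistics import mean, median


def rank_descending(values):
    """Return the rank of each entry (1 = highest value).

    Ties are broken by position, which is an arbitrary choice for this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: (-values[i], i))
    ranks = [0] * len(values)
    for position, gene in enumerate(order, start=1):
        ranks[gene] = position
    return ranks


def early_aggregation(profiles, combine=mean):
    """Combine the values of each gene across profiles, then rank once."""
    n_genes = len(profiles[0])
    combined = [combine([p[g] for p in profiles]) for g in range(n_genes)]
    return rank_descending(combined)


def late_aggregation(profiles):
    """Rank every profile separately, then order genes by their mean rank.

    The mean rank is only one possible consensus rule (an assumption here).
    """
    per_profile = [rank_descending(p) for p in profiles]
    n_genes = len(profiles[0])
    mean_ranks = [mean([r[g] for r in per_profile]) for g in range(n_genes)]
    # A smaller mean rank is better, so negate before ranking in descending order.
    return rank_descending([-m for m in mean_ranks])


# Hypothetical toy data: three profiles over three genes.
# The third profile contains an extreme value for the second gene.
profiles = [
    [5.0, 3.0, 1.0],
    [6.0, 2.0, 1.5],
    [4.0, 100.0, 1.0],
]

print(early_aggregation(profiles, mean))    # [2, 1, 3]
print(early_aggregation(profiles, median))  # [1, 2, 3]
print(late_aggregation(profiles))           # [1, 2, 3]
```

In this toy case the single extreme value moves the second gene to the top under mean-based early aggregation, while median-based early aggregation and the rank-based late aggregation both keep the first gene on top. This is one constructed example and does not indicate which approach is preferable in general.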