MeDeCom: discovery and quantification of latent components of heterogeneous methylomes

METHOD

Open Access

Pavlo Lutsik1,4†, Martin Slawski2,3,5†, Gilles Gasparoni1, Nikita Vedeneev2, Matthias Hein2* and Jörn Walter1*

Abstract
It is important for large-scale epigenomic studies to determine and explore the nature of hidden confounding variation, most importantly cell composition. We developed MeDeCom as a novel reference-free computational framework that allows the decomposition of complex DNA methylomes into latent methylation components and their proportions in each sample. MeDeCom is based on constrained non-negative matrix factorization with a new biologically motivated regularization function. It accurately recovers cell-type-specific latent methylation components and their proportions. MeDeCom is a new unsupervised tool for the exploratory study of the major sources of methylation variation, which should lead to a deeper understanding and better biological interpretation.

Keywords: DNA methylation, DNA methylome, Cell heterogeneity, Deconvolution, Matrix factorization, Epigenetics
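The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the MeDeCom algorithm: it assumes a methylation matrix D (CpGs x samples, values in [0, 1]) that is approximated by a product T A, where T holds k latent methylation components with entries constrained to [0, 1] and each column of A holds non-negative proportions summing to one. The regularization function mentioned in the abstract is omitted, and the solver (alternating projected gradient steps) as well as the names `factorize`, `loss`, and `project_simplex_columns` are assumptions made for illustration only.

```python
import numpy as np


def project_simplex_columns(A):
    """Euclidean projection of each column of A onto the probability simplex."""
    k, n = A.shape
    U = -np.sort(-A, axis=0)                      # columns sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0
    idx = np.arange(1, k + 1)[:, None]
    cond = U - css / idx > 0
    # last row index (per column) at which the condition still holds
    rho = k - 1 - np.argmax(cond[::-1, :], axis=0)
    theta = css[rho, np.arange(n)] / (rho + 1)
    return np.maximum(A - theta, 0.0)


def loss(D, T, A):
    """Squared reconstruction error 0.5 * ||D - T A||_F^2."""
    return 0.5 * np.sum((D - T @ A) ** 2)


def factorize(D, k, n_iter=500, seed=0):
    """Alternating projected gradient for D ~ T A (illustrative, unregularized).

    T is kept in [0, 1]; columns of A are kept on the probability simplex.
    """
    rng = np.random.default_rng(seed)
    m, n = D.shape
    T = rng.uniform(0.0, 1.0, size=(m, k))
    A = project_simplex_columns(rng.uniform(size=(k, n)))
    for _ in range(n_iter):
        # Step on T with step size 1/L, L = largest eigenvalue of A A^T
        grad_T = (T @ A - D) @ A.T
        L_T = np.linalg.norm(A @ A.T, 2) + 1e-12
        T = np.clip(T - grad_T / L_T, 0.0, 1.0)
        # Step on A with step size 1/L, L = largest eigenvalue of T^T T
        grad_A = T.T @ (T @ A - D)
        L_A = np.linalg.norm(T.T @ T, 2) + 1e-12
        A = project_simplex_columns(A - grad_A / L_A)
    return T, A


if __name__ == "__main__":
    # Synthetic mixture of three binary "cell-type" profiles (made-up data)
    rng = np.random.default_rng(1)
    T_true = rng.integers(0, 2, size=(200, 3)).astype(float)
    A_true = project_simplex_columns(rng.uniform(size=(3, 12)))
    D = T_true @ A_true
    T_hat, A_hat = factorize(D, k=3)
    print("relative error:", np.linalg.norm(D - T_hat @ A_hat) / np.linalg.norm(D))
```

Because the problem is non-convex, a sketch like this may converge to different solutions for different seeds, and recovered components are identifiable only up to a permutation of their order.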

*Correspondence: [email protected]; [email protected]
†Equal contributors
1 Department of EpiGenetics, Saarland University, Campus A2.4, Saarbrücken 66123, Germany
2 Machine Learning Group, Saarland University, Campus E1.1, Saarbrücken 66123, Germany
Full list of author information is available at the end of the article

Background
DNA methylation is one of the most extensively studied epigenetic marks in the human genome. Methods of detection and quantification are relatively robust and methylation data can be obtained at single-base resolution. DNA methylation closely mirrors the functional state of a cell [1]. Each human cell type has a characteristic methylation profile (methylome) covering its roughly 27 million CpG dinucleotides [2, 3]. DNA methylomes undergo significant global and lineage-related changes during development [4] and form cell-type-specific patterns upon differentiation [3, 5, 6]. They also reflect the individual (genetic) constitution [7], are influenced by gender, are subject to environmental influences [8, 9], and change with age [10]. In aging cells and in diseased cells, they accumulate errors over time and with successive DNA replications [11, 12]. DNA methylation can, therefore, be used to infer the developmental origin, the cell-type specificity, and many other biological and sampling variables contributing to individual epigenetic profiles. Knowledge of these confounding effects and their consequences for methylome changes is of utmost importance for a biological interpretation of DNA-methylation changes in comparative studies.

For practical reasons, comparative epigenomic studies often use tissue samples or cells extracted from body fluids (mostly blood) [3, 13]. All these sources are composed of several major and minor cell types with variable composition [14]. Blood, for example, includes up to ten major and many more minor cell types. Cell type-attributed heterogeneity was shown to be a ma