Bayesian Matrix Factorization for Outlier Detection: An Application in Population Genetics


Nicolas Duforet-Frebourg and Michael G.B. Blum

Abstract

We present a new Bayesian hierarchical model based on matrix factorization for detecting outliers in high-dimensional data. Outliers are explicitly modeled using both a shift-in-mean and a variance inflation approach. The Bayesian framework provides intrinsic probabilities of being an outlier for each element in the sample. Posterior replicates of the parameters are simulated using an MCMC algorithm. In population genetics, where many genetic markers are typed in different populations, we show that this model can be used to detect genes targeted by Darwinian selection.

28.1 Introduction

Matrix factorization aims at decomposing a high-dimensional n × p data matrix into a product of two matrices of lower rank K, called the factor and loading matrices [4]. Matrix factorization provides a useful framework to model outliers in the lower-dimensional space generated by the low-rank approximation [3]. Detecting outliers in high-dimensional data sets is of interest in population genetics, where it is used to detect genes under selective pressure [1]. The proposed approach provides an intrinsic probability of being an outlier, so that we can estimate the false discovery rate (FDR) and q-values, two important quantities in whole-genome scans [6]. We provide an MCMC algorithm to sample replicates from the posterior distribution, and we show how the method can detect genes under selection in population genetics data.
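To make the link between outlier probabilities and FDR concrete, the following is a minimal Python sketch. It is not taken from the chapter: it assumes the common Bayesian convention that the expected FDR of a list of calls is the mean of (1 − posterior outlier probability) over that list, and that the q-value of an item is the expected FDR of the smallest list containing it. The function name `bayesian_q_values` and the example probabilities are hypothetical.

```python
def bayesian_q_values(outlier_probs):
    """Return one q-value per item from posterior outlier probabilities.

    Assumption (not from the source): the expected FDR of a list of calls
    is the mean of (1 - posterior probability) over that list, and the
    q-value of an item is the expected FDR of the smallest list, ranked by
    decreasing probability, that contains the item.
    """
    # Rank items from most to least likely outlier.
    order = sorted(range(len(outlier_probs)),
                   key=lambda i: outlier_probs[i], reverse=True)
    q_values = [0.0] * len(outlier_probs)
    running_error = 0.0
    for rank, i in enumerate(order, start=1):
        # Expected number of false discoveries among the top `rank` items.
        running_error += 1.0 - outlier_probs[i]
        q_values[i] = running_error / rank
    return q_values


# Hypothetical posterior outlier probabilities for five markers.
probs = [0.99, 0.10, 0.95, 0.50, 0.80]
q = bayesian_q_values(probs)

# Markers retained when the expected FDR is controlled at 10%.
selected = [i for i, qv in enumerate(q) if qv <= 0.10]
```

Under this convention the q-values are non-decreasing along the ranking, so thresholding them yields a nested family of candidate lists.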

N. Duforet-Frebourg • M.G.B. Blum, Laboratoire TIMC-IMAG UMR 5525, Centre National de la Recherche Scientifique, Université Joseph Fourier, Grenoble, France

E. Lanzarone and F. Ieva (eds.), The Contribution of Young Researchers to Bayesian Statistics, Springer Proceedings in Mathematics & Statistics 63, DOI 10.1007/978-3-319-02084-6_28, © Springer International Publishing Switzerland 2014


28.2 Bayesian Matrix Factorization for Outlier Detection

28.2.1 Model

The probabilistic model of matrix factorization (also known as the factor model or probabilistic PCA model) for an n × p data matrix $Y$ relies on a product between a factor matrix $F$ and a loading matrix $\Lambda$:

$$Y = F\Lambda + \epsilon, \tag{1}$$

where $F$ is an $n \times K$ matrix, $\Lambda$ is a $K \times p$ matrix, and $\epsilon$ is an $n \times p$ residual matrix in which each row $\epsilon_i \sim \mathcal{N}(0_p, \sigma^2 I_p)$. Here, we choose a Gaussian prior for $\Lambda$:

$$p(\Lambda \mid \sigma_\Lambda^2) = \prod_{j=1}^{p} \mathcal{N}(\Lambda_j;\, 0_K,\, \sigma_\Lambda^2 I_K). \tag{2}$$
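As an illustration of Eqs. (1) and (2), the following NumPy sketch simulates a data matrix from the generative model. The dimensions, the standard deviations, and the standard normal draw for $F$ are hypothetical placeholders chosen for the example; they are not values or choices from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n individuals, p markers, K latent factors.
n, p, K = 50, 200, 3
# Hypothetical standard deviations for the residuals and the loadings.
sigma, sigma_lambda = 0.5, 1.0

# Factor matrix F (n x K). Standard normal draws are a placeholder only;
# the model assigns F its own prior.
F = rng.normal(size=(n, K))

# Eq. (2): each column Lambda_j ~ N(0_K, sigma_lambda^2 I_K).
Lam = rng.normal(scale=sigma_lambda, size=(K, p))

# Residual matrix: each row eps_i ~ N(0_p, sigma^2 I_p).
eps = rng.normal(scale=sigma, size=(n, p))

# Eq. (1): low-rank signal plus Gaussian noise.
Y = F @ Lam + eps
```

The product `F @ Lam` has rank at most K, which is what makes the approximation low-rank; the residual term accounts for the remaining variation in the n × p matrix.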

To specify the prior of $F$, we explicitly model outliers using the shift-in-mean approach [5] for one of the $K$ factors of the low-rank approximation:

$$p(F \mid A, Z, \Sigma_F) = \prod_{i=1}^{n} \mathcal{N}\left(F_i;\, 0_K + A_i^{(Z_i)},\, \Sigma_F\right), \tag{3}$$

where $\Sigma_F$ is a diagonal matrix with values $\sigma_{F_k}^2$. We specify improper priors for the variances, $p(\sigma_\Lambda^2) \propto 1/\sigma_\Lambda^2$ and $p(\sigma_{F_k}^2) \propto 1/\sigma_{F_k}^2$. The shift vectors $A_i$ are zero-valued vectors with a nonzero component at index $Z_i$. For $i = 1, \dots, n$, $Z_i$ is an integer between 0 and $K$, indicating that the $i$th line is eit
