Online Variational Learning of Dirichlet Process Mixtures of Scaled Dirichlet Distributions
Narges Manouchehri1 · Hieu Nguyen1 · Pantea Koochemeshkian2 · Nizar Bouguila1 · Wentao Fan3
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract

Data clustering, as an unsupervised method, has attracted considerable attention, and a large class of tasks can be formulated as clustering problems. Mixture models, a branch of clustering methods, have been used in various fields of research such as computer vision and pattern recognition. To apply these models, several problems must be addressed: finding a distribution that properly fits the data, determining model complexity, and estimating the model parameters. In this paper, we apply the scaled Dirichlet distribution to tackle the first challenge and propose a novel online variational method to mitigate the other two issues simultaneously. The effectiveness of the proposed work is evaluated on four challenging real applications, namely text and image spam categorization, diabetes detection, and hepatitis detection.

Keywords Infinite mixture models · Dirichlet process mixtures of scaled Dirichlet distributions · Online variational learning · Spam categorization · Diabetes · Hepatitis
Narges Manouchehri
[email protected]

Hieu Nguyen
[email protected]

Pantea Koochemeshkian
[email protected]

Nizar Bouguila
[email protected]

Wentao Fan
[email protected]

1 Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8

2 Department of Electrical and Computer Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8

3 Department of Computer Science and Technology, Huaqiao University, Xiamen, China

1 Introduction

Considerable growth in technologies has resulted in the generation of various types of digital data, such as text, images, and video, which provides opportunities to extract valuable information and meaningful patterns. Thus, finding an efficient model to describe data has become an interesting research topic (Kaufman and Rousseeuw 2009). Among machine learning approaches (Bishop 2006), data clustering has received much attention (Jain et al. 1999). Finite mixture models, among the most widely used clustering methods, have shown significant flexibility in describing data across several domains and applications (McLachlan and Peel 2004). When applying these powerful tools, choosing the distribution that best fits the data is important. Gaussian mixture models (GMM) have been extensively adopted in various real-world applications; however, the Gaussian assumption cannot be taken for granted in general. In recent years, other distributions such as the Dirichlet (Bouguila and Ziou 2004), generalized Dirichlet (Bouguila and Ziou 2007), inverted Dirichlet (Bdiri and Bouguila 2012), and Beta-Liouville (Bouguila 2012; Fan and Bouguila 2013) have been used as flexible alternatives. Another challenging issue when applying mixture models is selecting the mixture complexity. In other words, determinati
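As context for the model-selection challenge described above, the following is a minimal sketch (not from the paper) of the classical finite-mixture approach it contrasts with: fitting Gaussian mixture models of varying complexity and choosing the number of components by BIC, using scikit-learn. The synthetic data and parameter choices are illustrative assumptions; the paper's own method instead sidesteps this explicit model-selection step via a Dirichlet process prior.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Illustrative synthetic data: two well-separated Gaussian clusters in 2D.
X = np.concatenate([
    rng.normal(-3.0, 1.0, size=(200, 2)),
    rng.normal(3.0, 1.0, size=(200, 2)),
])

# Classical model selection: fit a GMM for each candidate number of
# components K and keep the one with the lowest BIC score.
bic_scores = []
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic_scores.append(gmm.bic(X))

best_k = int(np.argmin(bic_scores)) + 1
print(best_k)
```

For this clearly separated data, BIC recovers two components. The drawback the paper addresses is visible here: every candidate K requires a full refit, and the whole sweep must be rerun if new data arrives, whereas an online Dirichlet process mixture adapts its complexity as data streams in.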