Online Variational Learning of Dirichlet Process Mixtures of Scaled Dirichlet Distributions
Narges Manouchehri1 · Hieu Nguyen1 · Pantea Koochemeshkian2 · Nizar Bouguila1 · Wentao Fan3
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract

Data clustering, as an unsupervised method, has attracted considerable attention, and a large class of tasks can be formulated as clustering problems. Mixture models, a branch of clustering methods, have been used in various fields of research such as computer vision and pattern recognition. To apply these models, several problems must be addressed: finding a distribution that properly fits the data, determining model complexity, and estimating the model parameters. In this paper, we apply the scaled Dirichlet distribution to tackle the first challenge and propose a novel online variational method to mitigate the other two issues simultaneously. The effectiveness of the proposed work is evaluated on four challenging real applications, namely text and image spam categorization, diabetes detection, and hepatitis detection.

Keywords Infinite mixture models · Dirichlet process mixtures of scaled Dirichlet distributions · Online variational learning · Spam categorization · Diabetes · Hepatitis
Narges Manouchehri
[email protected]

Hieu Nguyen
[email protected]

Pantea Koochemeshkian
[email protected]

Nizar Bouguila
[email protected]

Wentao Fan
[email protected]

1 Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8

2 Department of Electrical and Computer Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8

3 Department of Computer Science and Technology, Huaqiao University, Xiamen, China

1 Introduction

Considerable growth in technologies has resulted in the generation of various types of digital data, such as text, images, and video, which provides opportunities to extract valuable information and meaningful patterns. Thus, finding an efficient model to describe data has become an interesting research topic (Kaufman and Rousseeuw 2009). Among machine learning approaches (Bishop 2006), data clustering has received much attention (Jain et al. 1999). Finite mixture models, among the most widely used clustering methods, have shown significant flexibility in describing data across several domains and applications (McLachlan and Peel 2004). When applying these powerful tools, choosing the distribution that best fits the data is important. Gaussian mixture models (GMM) have been extensively adopted in various real-world applications; however, the Gaussian assumption cannot be taken for granted in general. In recent years, other distributions such as the Dirichlet (Bouguila and Ziou 2004), generalized Dirichlet (Bouguila and Ziou 2007), inverted Dirichlet (Bdiri and Bouguila 2012), and Beta-Liouville (Bouguila 2012; Fan and Bouguila 2013) have been used as flexible alternatives. Another challenging issue when applying mixture models is selecting the mixture complexity. In other words, determinati
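As context for the model-selection challenge described above, the following is a minimal sketch (not from the paper) of the classical finite-mixture approach it contrasts with: fitting Gaussian mixture models of varying complexity and choosing the number of components by BIC, using scikit-learn. The synthetic data and parameter choices are illustrative assumptions; the paper's own method instead sidesteps this explicit model-selection step via a Dirichlet process prior.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Illustrative synthetic data: two well-separated Gaussian clusters in 2D.
X = np.concatenate([
    rng.normal(-3.0, 1.0, size=(200, 2)),
    rng.normal(3.0, 1.0, size=(200, 2)),
])

# Classical model selection: fit a GMM for each candidate number of
# components K and keep the one with the lowest BIC score.
bic_scores = []
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic_scores.append(gmm.bic(X))

best_k = int(np.argmin(bic_scores)) + 1
print(best_k)
```

For this clearly separated data, BIC recovers two components. The drawback the paper addresses is visible here: every candidate K requires a full refit, and the whole sweep must be rerun if new data arrives, whereas an online Dirichlet process mixture adapts its complexity as data streams in.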