GLDA: Parallel Gibbs Sampling for Latent Dirichlet Allocation on GPU
1 College of Computer and Control Engineering, Nankai University, Tianjin 300071, China
[email protected]
2 State Key Lab. of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
3 Laboratory of Parallel Software and Computational Science, State Key Laboratory of Computing Science, Institute of Software, Chinese Academy of Sciences, Beijing, China
Abstract. With the development of the general computing ability of GPUs, more and more algorithms are being run on the GPU to enjoy much higher speed. In this paper, we propose an approach that uniformly accelerates Gibbs sampling for the LDA (Latent Dirichlet Allocation) algorithm on the GPU: it distributes the data evenly across the GPU cores, which avoids idle waiting and improves GPU utilization. We use three text mining datasets to test the algorithm. Experiments show that our parallel methods can achieve about a 30x speedup over sequential training methods with similar prediction precision. Furthermore, the idea of uniformly partitioning the data on the GPU can also be applied to other machine learning algorithms.
Keywords: CUDA · Machine learning · Parallel LDA · Topic model · Data partition

1 Introduction
With the development of social networks, a huge amount of text is produced every day. Text mining algorithms can extract and analyze useful information from large collections of texts. Among them, the LDA (Latent Dirichlet Allocation) algorithm based on Gibbs sampling [3] is a mature topic clustering algorithm. Gibbs sampling is a Markov-chain Monte Carlo method used to perform inference. In this paper, we simply refer to LDA based on Gibbs sampling as the LDA algorithm. The LDA algorithm can accurately extract text themes and latent semantics [12,18], and it has been widely used in micro-blog recommendation, news search, semantic analysis, etc. However, due to the increasing amount of data on the Internet, running it on a CPU is usually time-consuming. Thus, how to accelerate the LDA algorithm efficiently has become a hot topic.
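For reference, a standard form of the collapsed Gibbs sampling update for LDA (the notation below is ours and not necessarily that of this paper) resamples the topic of each token i in document m from

$$
P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \frac{n_{k,-i}^{(w_i)} + \beta}{n_{k,-i}^{(\cdot)} + V\beta}\,\left(n_{m,-i}^{(k)} + \alpha\right),
$$

where $n_{k,-i}^{(w_i)}$ is the number of times word $w_i$ is assigned to topic $k$, $n_{k,-i}^{(\cdot)}$ is the total number of tokens assigned to topic $k$, $n_{m,-i}^{(k)}$ is the number of tokens in document $m$ assigned to topic $k$ (all counts excluding token $i$), $V$ is the vocabulary size, and $\alpha$ and $\beta$ are the Dirichlet hyperparameters.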
LDA does not take the order of words and documents into account, so it can be parallelized on multiple platforms. There are two common approaches: parallelizing the LDA algorithm on a distributed platform, or on a shared-memory multi-core platform. In the first scenario, as the number of nodes increases, the communication cost also increases [22], which hurts performance; moreover, the amount of work on a single node in a distributed cluster is still large. The node-communication and computation problems of the loosely coupled approach are avoided on a tightly coupled shared-memory platform. However, the traditional data partition by documents leads to severe load imbalance across cores, so all the cores have to carry out data synchronization after each iteration (all have to wait for the slowest core).
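To make the load-balancing idea concrete, the following is a minimal CUDA sketch (not the paper's implementation; the names and data layout are our own assumptions): the corpus is flattened into one token array and every thread processes an equally sized, contiguous slice of tokens, so a few very long documents cannot overload a single core the way document-based partitioning can.

#include <cuda_runtime.h>
#include <cstdio>

// One flat token stream: word_of[i] and doc_of[i] describe token i,
// topic_of[i] holds its current topic assignment.
__global__ void sample_uniform_slices(const int *word_of, const int *doc_of,
                                      int *topic_of, int n_tokens, int n_topics) {
    int tid       = blockIdx.x * blockDim.x + threadIdx.x;
    int n_threads = gridDim.x * blockDim.x;
    // Equally sized, contiguous slice of tokens for this thread,
    // independent of how the tokens are grouped into documents.
    int chunk = (n_tokens + n_threads - 1) / n_threads;
    int begin = tid * chunk;
    int end   = (begin + chunk < n_tokens) ? begin + chunk : n_tokens;
    for (int i = begin; i < end; ++i) {
        // Placeholder for the per-token Gibbs update: read word_of[i] and
        // doc_of[i], then resample topic_of[i] from the conditional
        // distribution over n_topics topics.
        topic_of[i] = (topic_of[i] + word_of[i] + doc_of[i]) % n_topics;
    }
}

int main() {
    const int n_tokens = 1 << 20, n_topics = 16;
    int *word_of, *doc_of, *topic_of;
    cudaMallocManaged(&word_of,  n_tokens * sizeof(int));
    cudaMallocManaged(&doc_of,   n_tokens * sizeof(int));
    cudaMallocManaged(&topic_of, n_tokens * sizeof(int));
    for (int i = 0; i < n_tokens; ++i) {          // toy corpus stand-in
        word_of[i] = i % 5000; doc_of[i] = i / 300; topic_of[i] = i % n_topics;
    }
    sample_uniform_slices<<<64, 256>>>(word_of, doc_of, topic_of, n_tokens, n_topics);
    cudaDeviceSynchronize();
    printf("first topic after one sweep: %d\n", topic_of[0]);
    cudaFree(word_of); cudaFree(doc_of); cudaFree(topic_of);
    return 0;
}

In a real sampler the dummy update would be replaced by the collapsed Gibbs conditional shown above, and the shared topic-word and document-topic count tables would require synchronized (e.g. atomic) updates.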