Topics Classification of Arabic Text in Quran by Using Matlab
Abstract. The book of God (the Quran) is referenced by more than 1.6 billion Muslims around the world. Extracting information from the Quran is of great benefit to both specialists and non-specialists in religion. The language of the Quran is Arabic, and leading text mining software such as Matlab and R does not support the Arabic language. This paper therefore proposes a technical method for using the Matlab text analytics toolbox on Arabic text. The aim is to find approaches for analysing the Arabic text of the Quran and to provide statistical information that might be helpful to people in this research area. Different text mining operations are then applied, such as word clouds, word embedding, clustering, topic modelling and classification. The paper also classifies verses by topic using LDA, SVM and a neural network.
Keywords: Arabic natural language processing · Mathematical modelling · Quran & text mining · Matlab
A. El Mouatasim and J. Oudaani
© Springer Nature Switzerland AG 2019. Y. Farhaoui and L. Moussaid (Eds.): ICBDSDE 2018, SBD 53, pp. 333–341, 2019. https://doi.org/10.1007/978-3-030-12048-1_34
1 Introduction
The book of God (Allah), the Quran, is written in Arabic and is referenced by more than 1.6 billion Muslims around the world. Arabic has many distinctive features that make it well suited to conveying much meaning in few words, along with subtleties, emphasis and powerful imagery, through speech alone. If Allah were to convey a message to mankind, it would be through a language that is easy to learn and has the highest form of expressiveness.
Arabic is a language based on a system of ‘roots’. In English, we often refer to the ‘root’ of a word to mean its origin. The Arabic root, or masdar (مصدر), refers to the core meaning of a word. This core can usually be identified by its root consonants [1]. Using a derivation system of roots and patterns, nouns and verbs are derived in an almost mathematical way, leaving little room for confusion as to the intended meaning of a word. The ideal model of this derivation is the Quran, where these derivations can be seen in play throughout the text.
Few research studies have considered the Arabic text of the Quran: mathematically based studies [2], linear regression models [3] and text mining using R [4]. To the best of our knowledge, no research study has analysed the Arabic text of the Quran using the Matlab text analytics toolbox (word cloud, word embedding, topic modelling and classification) the way it is done in this paper. We also use the topics found by the LDA method to classify the verses of the Quran.
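The root-and-pattern derivation described in the introduction can be illustrated with a small sketch. It is written in Python rather than Matlab and is not part of the authors' method. The function name `apply_pattern`, the digit placeholders and the Latin transliteration are assumptions made only for this example; the patterns shown are a few commonly cited ones for the root k-t-b.

```python
# Illustrative sketch only: derive word forms by slotting root consonants
# into patterns. Words are transliterated into Latin letters, and the digits
# 1, 2, 3 are hypothetical placeholders for the three root consonants.

def apply_pattern(root: str, pattern: str) -> str:
    """Fill the numbered slots of a pattern with a three-consonant root."""
    if len(root) != 3:
        raise ValueError("this sketch only handles three-consonant roots")
    slots = {"1": root[0], "2": root[1], "3": root[2]}
    # Characters that are not slot digits (vowels, prefixes) are kept as-is.
    return "".join(slots.get(ch, ch) for ch in pattern)


ROOT = "ktb"  # the root associated with writing

# Pattern -> rough English gloss of the form derived from k-t-b.
PATTERNS = {
    "1a2a3a": "he wrote",
    "1aa2i3": "writer",
    "ma12uu3": "written",
    "1i2aa3": "book",
    "ma12a3": "office, desk",
}

derived = {apply_pattern(ROOT, p): gloss for p, gloss in PATTERNS.items()}

if __name__ == "__main__":
    for word, gloss in derived.items():
        print(f"{word}: {gloss}")
```

Every derived form keeps the consonants k, t and b in the same order, which is why the core meaning can usually be identified from the root consonants alone.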
2 Quranic Arabic Text Mining
2.1 Preparing Quran for Analysis
The Quran has 78246 words. These words are grouped into 6236 verses (آية), and sets of verses are grouped into 114 chapters (سورة). The text of the Quran was downloaded from the Tanzil project website [5], which is an authentic, verified source of the Quran text. The downloaded file includes the whole text of the Quran without diacrit
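As a rough illustration of how the chapter, verse and word counts above could be checked, the following Python sketch (not the authors' Matlab code) parses verse lines and tallies basic statistics. It assumes a `chapter|verse|text` line layout and whitespace tokenization; the source does not state which file format was downloaded, so both are assumptions. The names `parse_verses` and `corpus_stats` are hypothetical, and the three sample verses stand in for the full file.

```python
from collections import Counter

# Three well-known verses used as a stand-in for the full downloaded file.
# Assumed layout: chapter number | verse number | verse text.
SAMPLE = """\
1|1|بسم الله الرحمن الرحيم
1|2|الحمد لله رب العالمين
112|1|قل هو الله أحد
"""


def parse_verses(text):
    """Return a list of (chapter, verse, words) tuples from pipe-delimited lines."""
    verses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        chapter, verse, body = line.split("|", 2)
        # Assumption: words are separated by whitespace.
        verses.append((int(chapter), int(verse), body.split()))
    return verses


def corpus_stats(verses):
    """Count distinct chapters, verses and word tokens."""
    return {
        "chapters": len({chapter for chapter, _, _ in verses}),
        "verses": len(verses),
        "words": sum(len(words) for _, _, words in verses),
    }


verses = parse_verses(SAMPLE)
stats = corpus_stats(verses)
word_freq = Counter(word for _, _, words in verses for word in words)

if __name__ == "__main__":
    print(stats)
    print(word_freq.most_common(3))
```

Run on a complete file in this layout, the same functions would give totals that can be compared with the figures quoted above, although the word total depends on how tokens are split.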