Parallel Dictionary Compression Using Grid Technologies

Abstract. This paper introduces a novel algorithm that approaches dictionary compression without prior knowledge of the grammatical rules. Any type of language, except incorporating languages, can be processed effectively by this solution. The algorithm cuts words derived from the same stem into base word, prefix and suffix groups, from which a hierarchical dictionary is constructed that allows spell checking, possible stem determination, and efficient distributed parallel pattern matching. By eliminating the severe redundancy in the simple tree representation of the words, the compression ratio can be significantly better than that of conventional techniques.
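The abstract does not spell out the underlying data structures, so the following is only a minimal Python sketch of the general idea of a base word / prefix / suffix hierarchy, not the authors' algorithm. It assumes that each word family is supplied together with its stem (the paper instead extracts this without grammatical knowledge) and that the stem occurs verbatim in every derived word. Suffixes are grouped per prefix, and identical suffix groups are stored once and shared between stems. The class `HierarchicalDictionary` and its methods `add_family`, `stems_of` and `contains` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class HierarchicalDictionary:
    """Toy stem -> prefix -> shared suffix-group dictionary (hypothetical)."""

    def __init__(self):
        self._group_ids = {}  # frozenset of suffixes -> shared group id
        self._groups = []     # group id -> frozenset of suffixes
        self._entries = {}    # stem -> {prefix: suffix group id}

    def _intern(self, suffixes):
        # Identical suffix groups are stored once and referenced by id,
        # so stems taking the same endings share one group.
        key = frozenset(suffixes)
        if key not in self._group_ids:
            self._group_ids[key] = len(self._groups)
            self._groups.append(key)
        return self._group_ids[key]

    def add_family(self, stem, words):
        # Assumption: the stem occurs verbatim in every derived word; its
        # first occurrence splits the word into prefix + stem + suffix.
        by_prefix = defaultdict(set)
        for word in words:
            i = word.find(stem)
            if i < 0:
                raise ValueError(f"{word!r} does not contain stem {stem!r}")
            by_prefix[word[:i]].add(word[i + len(stem):])
        self._entries[stem] = {p: self._intern(s) for p, s in by_prefix.items()}

    def stems_of(self, word):
        # Possible stem determination: every stored stem that, combined with
        # a stored prefix and suffix, reproduces the word exactly.
        found = set()
        for i in range(len(word)):
            for j in range(i + 1, len(word) + 1):
                prefixes = self._entries.get(word[i:j])
                if prefixes is None or word[:i] not in prefixes:
                    continue
                if word[j:] in self._groups[prefixes[word[:i]]]:
                    found.add(word[i:j])
        return found

    def contains(self, word):
        # Spell check: a word is accepted if at least one stem generates it.
        return bool(self.stems_of(word))

    @property
    def group_count(self):
        return len(self._groups)


d = HierarchicalDictionary()
d.add_family("play", ["play", "plays", "played", "playing",
                      "replay", "replays", "replayed", "replaying"])
d.add_family("walk", ["walk", "walks", "walked", "walking"])
print(d.contains("replayed"))  # True
print(d.contains("rewalked"))  # False
print(d.stems_of("walking"))   # {'walk'}
print(d.group_count)           # 1: both stems share one suffix group
```

Grouping suffixes per prefix keeps lookups exact (the sketch never accepts a prefix and suffix combination that was absent from the input), while sharing identical suffix groups is where, in this toy model, the savings over a plain tree of whole words come from. The substring scan in `stems_of` is quadratic in word length and is kept only for readability.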

1 Introduction

Nowadays, with the spread of various embedded systems, the need for efficient, transparent dictionary compression is becoming more pressing. Input methods are evolving from unfamiliar formal commands toward natural human language. This shift is driven mainly by the tremendous growth in the amount of digitally exchanged information. This information consists mainly of three types: audio, video and text. Because of strong public demand, the problems of audio and video compression have in most cases been extensively analyzed and partially solved by industry. Since the demand for natural language support in electronic equipment is also increasing, it is indispensable to develop an effective method to compress and store languages. Describing languages poses several difficulties [2,3]. First, every language has its own peculiarities: the structure of languages is diverse, and grammars vary considerably as well. Second, the words derived from a stem cannot be determined from the grammatical rules and the grammatical category of the stem alone; the meaning has to be taken into account too [4]. This renders a generative algorithm based on grammatical rules nearly useless. Third, the uncompressed dictionary is extremely large (5–40 GB), yet it would be desirable to use the dictionary in environments where resources are limited. This means that the dictionary has to be compact enough to fit into the device and has to be accessible through low-cost methods, since the available computational resources are limited too. These contradictory requirements have to be met simultaneously in order to create a viable and widely usable system.

I. Lirkov, S. Margenov, and J. Waśniewski (Eds.): LSSC 2007, LNCS 4818, pp. 492–499, 2008.
© Springer-Verlag Berlin Heidelberg 2008

The main aim of this paper is to solve the problems of currently used dictionary compression methods and to provide a distributed multi-language dictionary which, as a first step, facilitates word-level storage but can be extended to support sentence-level rules. The algorithm is able to extract grammatical or meaning-based rules from the input. This eliminates the dictionary's direct dependency on generative grammatical rules. These extracted rules do not
