A dictionary-based text compression technique using quaternary code



ORIGINAL ARTICLE

Ahsan Habib¹ · M. Jahirul Islam¹ · Mohammad Shahidur Rahman¹

Received: 24 March 2019 / Accepted: 28 August 2019 © Springer Nature Switzerland AG 2019

Abstract

Improving encoding and decoding time in compression techniques is in great demand among modern users. Bit-level compression techniques require more time to encode or decode, because every single bit of a binary code must be processed. In this research, we develop a dictionary-based compression technique in which a quaternary tree is used instead of a binary tree to construct Huffman codes. First, we explore the mathematical properties of the quaternary tree structure for the construction of Huffman codes; we study the terminology of the new tree structure thoroughly and prove the results. Second, after a statistical analysis of the English language, we design a variable-length dictionary based on quaternary codes. Third, we develop the encoding and decoding algorithms for the proposed technique and compare its performance with existing popular techniques. The proposed technique outperforms the existing techniques in decompression speed, while its space requirement increases only insignificantly.

Keywords Algorithm analysis and design · Data compression · Graph theory · Huffman coding · Source coding · Tree data structures

✉ Ahsan Habib
[email protected]

M. Jahirul Islam
[email protected]

Mohammad Shahidur Rahman
[email protected]

1 Department of Computer Science and Engineering, Shahjalal University of Science and Technology, Sylhet, Bangladesh

1 Introduction

Data compression is an important research area, not only for saving space but also for reducing query time. Data compression is popular for accumulating data and for reducing download, upload, and transfer time [1]. Compression and decompression speeds are very important parameters of a data compression technique: it is tedious when compressing or decompressing a file takes a long time. Contemporary research has revealed that some algorithms achieve a higher compression ratio by sacrificing processing speed, whereas others achieve higher speed by sacrificing space. In many cases, authors mainly consider decompression speed, although compression speed is also an important performance parameter. Huffman coding is the most popular and widely used coding system, where compression is achieved by assigning shorter codes to symbols with higher frequencies. This technique assigns a unique codeword to each symbol [2]. Huffman-based techniques are used in many compression algorithms such as Zip, PKZip, BZip2, and PNG. Multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by Huffman coding. Almost all communication with and from the internet is at some point Huffman encoded. Almost all Huffman-based algorithms attempt to improve decoding speed; in most cases, they do not achieve very good compression speed. The existing Huffman-ba
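The quaternary-tree idea summarized in the abstract replaces the binary Huffman merge (two lowest-frequency subtrees at a time) with a four-way merge, so each tree level contributes one quaternary digit, i.e., two bits. The following is a minimal sketch of that general idea only, not the authors' exact dictionary-based algorithm; the symbol frequencies are illustrative.

```python
import heapq
from itertools import count

def quaternary_huffman(freqs):
    """Build a 4-ary Huffman code: each tree level contributes one
    quaternary digit (two bits), so codewords have fewer digits than
    binary Huffman codewords, at a small cost in total bits."""
    tick = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tick), {sym: ""}) for sym, f in freqs.items()]
    # Pad with zero-weight dummies so every merge takes exactly 4 nodes:
    # a full 4-ary tree needs (n - 1) % 3 == 0 leaves.
    while len(heap) > 1 and (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tick), {}))
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, weight = {}, 0
        for digit in "0123":  # take the four lightest subtrees
            f, _, codes = heapq.heappop(heap)
            weight += f
            for sym, suffix in codes.items():
                merged[sym] = digit + suffix
        heapq.heappush(heap, (weight, next(tick), merged))
    return heap[0][2]

codes = quaternary_huffman({"e": 12, "t": 9, "a": 8, "o": 7, "n": 6})
# The most frequent symbols receive single-digit (two-bit) codes,
# and the resulting code is prefix-free.
```

Because a decoder consumes one two-bit digit per tree level instead of one bit, a quaternary tree of the same symbol set is roughly half as deep as its binary counterpart, which is the source of the decoding-speed advantage the paper targets.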