A dictionary-based text compression technique using quaternary code



ORIGINAL ARTICLE

Ahsan Habib¹ · M. Jahirul Islam¹ · Mohammad Shahidur Rahman¹

Received: 24 March 2019 / Accepted: 28 August 2019 © Springer Nature Switzerland AG 2019

Abstract

Improving encoding and decoding time in compression techniques is in great demand among modern users. Bit-level compression techniques require more time to encode or decode, because every single bit of a binary code must be processed. In this research, we develop a dictionary-based compression technique in which a quaternary tree is used instead of a binary tree to construct Huffman codes. First, we explore the mathematical properties of the quaternary tree structure for the construction of Huffman codes; we study the terminology of the new tree structure thoroughly and prove the results. Second, after a statistical analysis of the English language, we design a variable-length dictionary based on quaternary codes. Third, we develop the encoding and decoding algorithms for the proposed technique and compare its performance with existing popular techniques. The proposed technique outperforms the existing techniques in decompression speed, while its space requirement increases only insignificantly.

Keywords Algorithm analysis and design · Data compression · Graph theory · Huffman coding · Source coding · Tree data structures

✉ Ahsan Habib
[email protected]

M. Jahirul Islam
[email protected]

Mohammad Shahidur Rahman
[email protected]

1 Department of Computer Science and Engineering, Shahjalal University of Science and Technology, Sylhet, Bangladesh

1 Introduction

Data compression is an important research area, not only for saving space but also for reducing query time. Data compression is popular for accumulating data and for reducing download, upload, and transfer time [1]. Compression and decompression speeds are very important parameters of a data compression technique: it is tedious when compressing or decompressing a file takes a long time. Contemporary research has revealed that some algorithms achieve a higher compression ratio by sacrificing processing speed, whereas others achieve higher speed by sacrificing space. In many cases, authors mainly consider decompression speed, although compression speed is also an important performance parameter. Huffman coding is the most popular and widely used coding system, where compression is achieved by assigning shorter codes to symbols with higher frequencies. This technique assigns a unique codeword to each symbol [2]. Huffman-based techniques are used in many compression algorithms such as Zip, PKZip, BZip2, and PNG. Multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by Huffman coding. Almost all communication with and from the internet is at some point Huffman encoded. Almost all Huffman-based algorithms attempt to improve decoding speed; in most cases, they do not achieve very good compression speed. The existing Huffman-ba
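The quaternary-tree idea summarized in the abstract replaces the binary Huffman merge (two lowest-frequency subtrees at a time) with a four-way merge, so each tree level contributes one quaternary digit, i.e., two bits. The following is a minimal sketch of that general idea only, not the authors' exact dictionary-based algorithm; the symbol frequencies are illustrative.

```python
import heapq
from itertools import count

def quaternary_huffman(freqs):
    """Build a 4-ary Huffman code: each tree level contributes one
    quaternary digit (two bits), so codewords have fewer digits than
    binary Huffman codewords, at a small cost in total bits."""
    tick = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tick), {sym: ""}) for sym, f in freqs.items()]
    # Pad with zero-weight dummies so every merge takes exactly 4 nodes:
    # a full 4-ary tree needs (n - 1) % 3 == 0 leaves.
    while len(heap) > 1 and (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tick), {}))
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, weight = {}, 0
        for digit in "0123":  # take the four lightest subtrees
            f, _, codes = heapq.heappop(heap)
            weight += f
            for sym, suffix in codes.items():
                merged[sym] = digit + suffix
        heapq.heappush(heap, (weight, next(tick), merged))
    return heap[0][2]

codes = quaternary_huffman({"e": 12, "t": 9, "a": 8, "o": 7, "n": 6})
# The most frequent symbols receive single-digit (two-bit) codes,
# and the resulting code is prefix-free.
```

Because a decoder consumes one two-bit digit per tree level instead of one bit, a quaternary tree of the same symbol set is roughly half as deep as its binary counterpart, which is the source of the decoding-speed advantage the paper targets.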