A Highly Efficient XML Compression Scheme for the Web
University of Wroclaw, Institute of Computer Science, Joliot-Curie 15, 50–383 Wroclaw, Poland, [email protected]
Institute of Information Technology in Management, Szczecin University, Mickiewicza 64, 71–101 Szczecin, Poland
Technical University of Łódź, Computer Engineering Department, Politechniki 11, 90–924 Łódź, Poland
Abstract. Contemporary XML documents can be tens of megabytes long, and reducing their size, thus allowing them to be transferred faster, offers a significant advantage to their users. In this paper, we describe a new XML compression scheme which outperforms the previous state-of-the-art algorithm, SCMPPM, by over 9% on average in compression ratio, while having the practical feature of streamlined decompression and being almost twice as fast in decompression. Applying the scheme can significantly reduce transmission time and bandwidth usage for XML documents published on the Web. The proposed scheme is based on a semi-dynamic dictionary of the most frequent words in the document (both in the annotation and in the contents), automatic detection and compact encoding of numbers and specific patterns (such as dates or IP addresses), and a backend PPM coding variant tailored to handle long matching sequences efficiently. Moreover, we show that the compression ratio can be improved by an additional 9% at the price of a significant slow-down.
Keywords: XML compression, semi-structural data compression, text transform, prediction by partial matching.
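To make two of the ingredients named in the abstract more concrete, the following is a minimal Python sketch of a per-document (semi-dynamic) word dictionary and of a compact encoding for one specific pattern, an IPv4 address. It is an illustration under stated assumptions, not the authors' implementation: the function names (`build_dictionary`, `substitute_words`, `restore`, `pack_ipv4`, `unpack_ipv4`), the `min_count` and `max_words` thresholds, and the token representation are all hypothetical, and the PPM backend is omitted entirely.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters; the paper's definition may differ.
WORD = re.compile(r"[A-Za-z]+")

def build_dictionary(text, min_count=2, max_words=256):
    """First pass: collect the most frequent words of this one document."""
    counts = Counter(WORD.findall(text))
    return [w for w, c in counts.most_common(max_words) if c >= min_count]

def substitute_words(text, dictionary):
    """Second pass: replace dictionary words with their integer index.

    Returns a list of tokens: int for a dictionary word, str for literal text.
    """
    index = {w: i for i, w in enumerate(dictionary)}
    tokens, pos = [], 0
    for m in WORD.finditer(text):
        if m.start() > pos:
            tokens.append(text[pos:m.start()])  # literal text between words
        word = m.group()
        tokens.append(index.get(word, word))    # index if known, else literal
        pos = m.end()
    if pos < len(text):
        tokens.append(text[pos:])
    return tokens

def restore(tokens, dictionary):
    """Inverse of substitute_words."""
    return "".join(dictionary[t] if isinstance(t, int) else t for t in tokens)

def pack_ipv4(s):
    """Encode a dotted IPv4 address in 4 bytes, or return None if it does not
    qualify. Leading zeros are rejected so that decoding is lossless."""
    parts = s.split(".")
    if len(parts) != 4:
        return None
    for p in parts:
        if not (p.isascii() and p.isdigit()):
            return None
        if (len(p) > 1 and p[0] == "0") or int(p) > 255:
            return None
    return bytes(int(p) for p in parts)

def unpack_ipv4(b):
    """Inverse of pack_ipv4."""
    return ".".join(str(x) for x in b)
```

In this sketch the dictionary is built in a first pass over the document and would have to be stored alongside the token stream so that a decoder can invert the substitution; rejecting any pattern that would not round-trip exactly (such as `192.168.00.1`) keeps the transform lossless.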
1 Introduction
Year after year, XML consolidates its hold as a standard for the interchange of structured information. With the recent international standardization of the OpenDocument format and the introduction of the Open XML format in Microsoft Office 2007, this trend can only accelerate. As information technology usage becomes more and more Web-centric, a considerable number of XML documents are exchanged through the Web every day. Contemporary XML documents can be tens of megabytes long. From the standpoint of Internet providers, they represent only a small share of traffic, which is dominated by video sharing websites and peer-to-peer file exchange networks. Nevertheless, reducing the size of XML documents, and thus allowing them to be transferred faster, offers a significant advantage to their users.
V. Geffert et al. (Eds.): SOFSEM 2008, LNCS 4910, pp. 766–777, 2008. © Springer-Verlag Berlin Heidelberg 2008
There are several XML compression algorithms available, renowned for their high compression ratios. Although these ratios are high in absolute terms, XML data can be squeezed further with more advanced techniques. In this paper, we describe a new XML compression scheme which sets a new state of the art in XML compression. The scheme attains impressive compression ratios with reasonable compression and decompression times and supports streamed decompression, which makes it especially suitable for web applications. The outline of this paper is as follows. Section 2 presents earlier achievements in XML compression. Section