Interaction Between Data Parallel Compilation and Data Transfer and Storage Cost Minimization for Multimedia Applications


IMEC, Kapeldreef 75, B-3001 Leuven, Belgium
Professor at the Katholieke Universiteit Leuven
IBM T.J. Watson Research Center, Yorktown Heights, NY

Abstract. Real-time multi-media applications need large processing power and yet require a low-power implementation in an embedded programmable parallel processor context. Our main contribution in this context is the proposal of a formalized DTSE (data transfer and storage exploration) methodology, which significantly reduces system bus load and thereby improves overall system performance and lowers power consumption. We demonstrate the complementarity of this methodology by coupling DTSE with a state-of-the-art performance-optimizing and parallelizing compiler. Experiments on two real-life video and image processing applications show that this combined approach heavily reduces memory accesses and bus loading, and hence power, and also significantly reduces the total execution time. Decomposing the detailed parallelization and DTSE issues into two separate stages is important to obtain the benefits of both stages without exploding the complexity of solving all the issues simultaneously.

1 Introduction and Related Work

P. Amestoy et al. (Eds.): Euro-Par’99, LNCS 1685, pp. 668–676, 1999. © Springer-Verlag Berlin Heidelberg 1999

Until recently, parallel machines were used mainly, if not exclusively, in scientific communities. Lately, the rapid growth of real-time multi-media applications has brought new challenges in terms of the required processing (computing) power and power requirements. For this type of application, especially video, graphics and image processing, the processing power of traditional uniprocessors is no longer sufficient. This has led to the introduction of small- and medium-scale parallelism in this field too, but mostly oriented towards single-chip systems for cost reasons. Today, many weakly parallel video and multi-media processors are emerging (see [14] and its references), increasing the importance of parallelization techniques. Applications on these processors are still parallelized manually, which can be tedious and error-prone. This paper presents evidence that parallelizing compilers can be used effectively to deal with this problem, if they are combined with other techniques. Indeed, the cost functions to be used in these new emerging application fields are no longer purely performance based. Power is also a crucial factor, and has to be optimized for a given throughput. Real-time multi-media processing (RMP) applications are usually memory intensive, and a significant part of the power consumption is due to the data transfers, i.e., in the memory hierarchy [16]. In a parallel processor context, most of the research effort in the community so far addresses the problem of parallelization and processor partitioning [2]. Existing approaches do not sufficiently take into account the background storage and transfer related cost. A first approach for