Efficient Load-Balancing and Communication Overlap in Parallel Shear-Warp Algorithm on a Cluster of PCs

In the medical field, volume rendering provides good quality 3D visualizations but is still not interactive enough for a day-to-day practice. The most efficient sequential algorithm is the Shear-Warp algorithm. It renders up to 10 images per second for a

  • PDF / 308,025 Bytes
  • 8 Pages / 431 x 666 pts Page_size
  • 7 Downloads / 186 Views

DOWNLOAD

REPORT


PC, ENS-Lyon, F-69364 Lyon cedex 07, France LIP, ENS-Lyon, F-69364 Lyon cedex 07, France [fchaussu,desprez,mloi]@ens-lyon.fr

2

Abstract. In the medical field, volume rendering provides good quality 3D visualizations but is still not interactive enough for a day-to-day practice. The most efficient sequential algorithm is the Shear-Warp algorithm. It renders up to 10 images per second for a small dataset. The goal of this paper is to present an efficient parallel implementation of the Shear-Warp algorithm for a distributed memory architecture, a cluster of PCs connected with a high speed network.

1

Introduction and Motivations

Real-time rendering is an important goal in visualization applications. Most of these applications require the generation of a sequence of images for different orientations of the volume. Consequently, real-time rendering should enable a continuous visualization of the volume as its orientation changes. Moreover, higher and higher resolution datasets combined with the high computational cost of direct volume rendering makes it difficult, if not impossible, for sequential implementations to deliver the required level of performance. Therefore, such applications have been parallelized not to trade off image quality for speed. Lacroute and Levoy [LL94] developed the Shear-Warp algorithm that exploits coherence in the volume and image space. This algorithm is currently acknowledged to be the fastest sequential volume rendering algorithm. The goal of this paper is to present an efficient parallel implementation of the Shear-Warp algorithm for a distributed memory architecture, a cluster of PCs connected with a high-speed network and using a light weight and fast communication layer. This new parallel implementation is load balanced and overlaps communication with computation using MPI asynchronous communications. This paper is organized as follows: in the next section, we describe and analyze the Shear-Warp algorithm. The third section exhibits the main problems associated with the parallel formulation. It focuses on architecture, task partition and communications patterns. In the fourth section, we propose a new dynamic load-balancing algorithm for the Shear-Warp algorithm tuned to interactivity. In order to improve the scalability of the algorithm, we discuss, in the fifth section, the possibility of implementing communication overlap in this algorithm. P. Amestoy et al. (Eds.): Euro-Par’99, LNCS 1685, pp. 570–577, 1999. c Springer-Verlag Berlin Heidelberg 1999

Efficient Load-Balancing and Communication Overlap

571

The last section, gives our experimental results in terms of load-balancing and scalability.

2

The Shear-Warp Algorithm

Volume rendering [Kau96] is the process of creating a 2D image directly from 3D volumetric data so that no information contained within the data is lost during the rendering process. For example, in computed tomography scanned data, useful informations are not only contained on the surfaces but also within the data. Therefore, it must have a volumetric repre