Localization of Data Transfer in Processor Arrays


Abstract. In this paper we present an approach to localize the data transfer in processor arrays. Our aim is to select the channels between processors of the processor array that perform the data transfers. Channels can vary in bandwidth and communication delay and can be bidirectional. Our objective is to minimize the implementation cost of the channels while satisfying the data dependencies. The presented approach also applies to the problem of localizing data dependencies for a given interconnection topology. Formulating our method as an integer linear program allows it to be used for automatic parallelization.

1 Introduction

Processor arrays (PA) are well suited to implementing time-consuming signal processing algorithms with real-time requirements. Technological progress allows even complex processor arrays to be implemented in silicon as well as in FPGAs. Automatic tools are required to explore the degrees of freedom in the design of processor arrays.

Processor arrays are characterized by a significant number of processors which communicate via interconnections in a small neighborhood. Data transfers caused by the original algorithm have to be organized using local interconnections between processors. This paper covers the design of a cost-minimal interconnection network and the organization of the data transfers using these interconnections. A solution to the problem of organizing the data transfers for a given interconnection network is also presented.

The design of processor arrays is well studied (e.g. [2, 7, 8, 10, 13]) and has become more realistic through the inclusion of resource constraints [3, 5, 12]. Up to now, however, only little work has been done on the organization of data transfer. Fortes and Moldovan [6] as well as Lee and Kedem [9] discuss the need for a decomposition of global interconnections into a set of local interconnections, but do not consider access conflicts to channels. Chou and Kung [1] present an approach to organize the communications in a partitioned processor array, but do not give a solution for the decomposition problem.

In this paper, we present an approach to localize the data transfer in processor arrays. Channels with different bandwidth and latency can be selected to implement the interconnections between processors. The data transfers, which are given as displacement vectors, are decomposed into a set of channels. The decomposition of the displacement vectors as well as the order in which the channels are used is determined by an optimization problem. The objective of the optimization problem is to minimize the cost associated with an implementation of the channels in silicon.

Dirk Fimmel and Renate Merker, in P. Amestoy et al. (Eds.): Euro-Par’99, LNCS 1685, pp. 401–408, 1999. © Springer-Verlag Berlin Heidelberg 1999

The paper is organized as follows. Basics of the design of processor arrays are given in section 2. In section 3 a model of channels between processors is introduced. The communication problem which describes the organization of the data transfers is