Simulation and Visualization on the Grid Parallelldatorcentrum Kungl
It is now 30 years since the network for digital communication, the ARPA-net, first came into operation. Since the first experiments with sending electronic mail and performing file transfers, the development of networks has been truly remarkable. Today's
- PDF / 32,823,954 Bytes
- 317 Pages / 439.37 x 666.142 pts Page_size
- 39 Downloads / 181 Views
PARALLELLDATORCENTRUM SEVENTIl A NNUAL CONFERENCE
13
Springer-Verlag Berlin Heidelberg GmbH
Bjorn Engquist Lennart Iohnsson Michael Hammill Faith Short (Eds.)
Simulation and Visualization on the Grid Parallelldatorcentrum Kungl Tekniska Hogskolan Seventh Annual Conference Stockholm, Sweden December 1999 Proceedings
With 118 Pigures, 25 in Colour
Springer
Editors Bjor n Engqu ist
Michael Hammill Faith Short Parallelldatorceutr um Kungl Tekniska H ăgskolan tOO 44 Stockholm, Sweden e-mail: engqulstâ' pdckth.se mlkeâ'pdc.kth.se [email protected]
Lennart Johnsson Department of Computer Science University of Houston 4800Ca1houn Road Houston, TX77204-3475, USA e-mail: [email protected]
Cataloging· jn·PubliCl lion Dala l pplied for Oi. CeulllClIeS'b liolhek . CIP·Einn' ,IUllfna l"n. Sirnulaliotl I nd viaua lizlbOn 011 1'" gnd : Pa rallt lldalOreantrum. """'In an""aI COIII'flnet. Stoel A(O,k) call dlasYpb(ks,a(O,O,bk*ml),nb,j,j+ns-l,ipiv,l) solve L*X = A(j:j+nb-l,k:k+ks-l) call dllnu4(nb,ks,a(O,O,iti),nb,a(O,O,ibi),nb) bi=bj do i=j+nb,m-l,nb bi=bi+l ms=nb if(bi.eq.ml-1)ms=m2 iai=bi+ml*bj (O,O)-th element of block iai is A(i,j) ici=bi+ml*bk ! (O,O)-th element of block ici is A(i,k) update A(i:i+ms-l,k:k+ks-l) = A(i:i+ms-l,k:k+ks-l) - A(i:i+ms-l,j:j+jb-l)*A(j:j+nb-l,k:k+ks-l) call dab4(ms,ks,nb,aCO,O,iai),nb,a(O,O,ibi),nb, a(O,O,ici),nb) enddo enddo bi=O do i=O,bj-l back pivot A(j:j+m-l,i:i+nb-l) ! a(O,O,bi) -> A(O,i*nb) call dlasYpbCnb,a(O,O,bi),nb,j,j+ns-l,ipiv,l) bi=bi+ml enddo next block j col bj=bj+l enddo
&
51 54 55
nl=Cn+nb-t)/nb m2=m+nb-m1*nb n2=n+nb-nl*nb bj=O do j=O,n-l,nb
New Data Structures for Matrices Lead to a Variety of Algorithms 62 63
51
return end
In the blocked version, the outer loop is on bj = 0, n1-1 and for each bj, one factors a block column L *U = P * A(j : m -1, j : n - 1) by calling kernel routine RGETF3. Then columns j + nb to n - 1 are processed in three steps. Let k : k + ks - 1 be the generic block column. First, there is a forward pivot step. Next, there is a DTRSM computation done by kernel routine DLLNU4. Finally, there is a DGEMM update, which is done by a series of calls to kernel routine DAB4. After these three steps, there is a back pivot step. There are three kernel routines in the block equivalent (see Subroutine 1). They are a factor a panel of size M xN, where N :::; N B kernel called RGETF3, a DTRSM kernel called DLLNU4, and a DGEMM kernel called DAB4. We mention that factor part has the same function as LAPACK routine DGETF2. However, it is a level-3 routine, done recursively, as the prefix R and suffix 3 indicates. Note that the vanilla routine, actually the LINPACK routine DGEFA, does not have a back pivot step. So the blocked version does extra work that could be avoided. DGEFB might be packaged as a subroutine in standard Fortran. Following LAPACK, we suggest using the input format ofDGETRF(M, N, A, LOA, IPIV, INFO). The new routine would have a nearly identical calling sequence: DBGTRF (M, N, NS, A, LDA, IPIV, INFO). The new parameter is NS ~ n * LDA. If N