Title: Mathematics at DEC
Moderator: RUSURE::EDP
Created: Mon Feb 03 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2083
Total number of notes: 14613
306.0. "SIAM Trip Report" by TOOLS::STAN () Thu Jun 13 1985 22:17
From: GAUSS::ANDERSON 13-JUN-1985 14:57
To: HARE::RABINOWITZ
Subj: SIAM Conference on Applied Linear Algebra - trip report
+---+---+---+---+---+---+---+
| | | | | | | |
| d | i | g | i | t | a | l | I N T E R O F F I C E M E M O
| | | | | | | |
+---+---+---+---+---+---+---+
From: Ned Anderson
To: @SCI
Date: May 29, 1985
Dept: Eastern Research Lab
DTN: 225-5969
Loc: HLO2-3/N04
Enet: GAUSS:ANDERSON
Subject: Report on the Second SIAM Conference on Applied
Linear Algebra, April 29 - May 2, 1985, Raleigh, NC.
This was a major conference, with 8 invited presentations,
several minisymposia, and close to 100 shorter (20 min.)
presentations in parallel session format. I attended the first
three days, which included several interesting invited (60 min.)
presentations. The shorter talks were in the areas of applied
linear algebra, numerical linear algebra, and parallel algorithms.
There were several sessions, formal and informal, devoted to
discussion of extending the BLAS (basic linear algebra
subroutines) of LINPACK. A number of purveyors of various
mathematical software packages for "the PC" were also on hand.
Invited presentations attended:
* Matrix Polynomials, by Peter Lancaster, discussed spectral
theory and various canonical forms for matrix polynomials.
* Matrix Methods in the Numerical Solution of Partial
Differential Equations, by Seymour Parter, discussed the
classical notions of stability and convergence, and the
importance of some eigenvalue estimation techniques for
studying these properties. He focused on some problems of
current interest in difference and multigrid methods for
partial differential equations.
* Numerical and Computational Methods in Linear Algebra, by
James Wilkinson, discussed ill-conditioning, defective
matrices and eigenvalue perturbations.
* Parallel Algorithms in Linear Algebra, by Robert Voigt (ICASE,
NASA Langley) pointed out that relatively few parallel
algorithms have been developed for MIMD machines, and
discussed recent work in this area. He claimed that little
progress has been made in fully analyzing parallel computation
from the point of view of both computational complexity and
communication complexity.
1. Early work mentioned included the paper on "chaotic
relaxation" by Chazan and Miranker (1969) and Stone's
paper (1973) on tridiagonal systems (a parallel method
which is, however, numerically unstable).
2. The notion of a consistent algorithm was introduced,
meaning a parallel algorithm whose scalar analogue has the
same complexity as the best known scalar (as opposed to
vector or parallel) algorithm. He gave an example of a
"cyclic elimination" method for tridiagonal systems which
was NOT consistent, since its complexity was O(n*log2(n)),
but which performed very well on actual parallel machines.
The method is described in a survey paper by Heller (1978)
and was implemented by Kapur and Browne on TRAC, and by
Gannon and Panetta on PRINGLE (both 1984).
3. Another basic technique which leads naturally to parallel
algorithms is the notion of "partitioning for
parallelism", or "substructuring", in problems where the
linear system to be solved has a block structure which can
be exploited. Voigt gave an example of a block diagonal
system with nonzero off-diagonal blocks in only the last
column and last row. The basic idea was to factor the
diagonal blocks in parallel.
4. Finally, asynchronous computation, in which the order of
computations is not guaranteed to be the same as in some
equivalent scalar algorithm, was mentioned. A recent
thesis at Duke (Joe Saltz) investigated this approach for
the time-dependent Crank-Nicolson equations. Some
problems with algorithms of this type include difficulty
with debugging, and the inability to reproduce results
(i.e. the exact same sequence of computations). Voigt
mentioned that recent work at ICASE and LANL simulating
the "chaotic iteration" approach of Chazan and Miranker
showed the method to be competitive with SOR.
Computational results for parallel algorithms seemed to be
limited to the Denelcor HEP and a few experimental machines
(mentioned previously), and simulation studies. There
appeared to be considerable interest in the Intel hypercube
but no one reported any actual computational results on this
machine. There was, in general, universal disappointment at
the rapidity with which the "parallel speedup" of the HEP
peaked out due to communication bottlenecks on this machine.
Argonne and Oak Ridge were among the facilities which had had
actual computational experience on HEP's.
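To make the "cyclic elimination" idea in item 2 above concrete, here is a small sketch of odd-even cyclic reduction for a tridiagonal system. This is my own illustrative Python reconstruction (the function name and the recursive formulation are mine, not taken from Heller's survey or the TRAC/PRINGLE codes, which were real parallel implementations):

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic (odd-even) reduction.

    a, b, c are the sub-, main-, and super-diagonals (length n; a[0]
    and c[-1] are unused), d is the right-hand side.  Requires
    n = 2**k - 1 so the reduction halves cleanly at every level.
    Every equation at a given level is independent of the others,
    which is what makes the method attractive on parallel hardware,
    even though it does O(n*log2(n)) arithmetic versus O(n) for
    ordinary Gaussian elimination -- i.e. it is NOT "consistent".
    """
    n = len(d)
    if n == 1:
        return [d[0] / b[0]]
    # Eliminate the even-positioned unknowns (0-based 0, 2, 4, ...)
    # to get a half-size tridiagonal system in the odd-positioned ones.
    a2, b2, c2, d2 = [], [], [], []
    for i in range(1, n, 2):
        al = -a[i] / b[i - 1]          # multiplier for the row above
        ga = -c[i] / b[i + 1]          # multiplier for the row below
        a2.append(al * a[i - 1])
        b2.append(b[i] + al * c[i - 1] + ga * a[i + 1])
        c2.append(ga * c[i + 1])
        d2.append(d[i] + al * d[i - 1] + ga * d[i + 1])
    x_half = cyclic_reduction(a2, b2, c2, d2)
    # Back-substitute for the eliminated unknowns; again, every
    # equation at this level could be done in parallel.
    x = [0.0] * n
    for j, i in enumerate(range(1, n, 2)):
        x[i] = x_half[j]
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
    return x
```

Each level halves the number of unknowns but touches all remaining equations, which is where the extra log2(n) factor in the operation count comes from.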
The flavor of a few of the more informative short
presentations is summarized below:
1. Heath (Oak Ridge) examined six possible ways of partitioning
the calculation of the Cholesky factorization of a matrix by
computing rowwise, columnwise or by submatrices, and
allocating various numbers of processes on a HEP. On a
200x200 example, the HEP peaked out at about 8 processes and
thereafter lost ground due to communication overhead. In any
case "column Cholesky" (computing the factorization column by
column) was deemed to be best. Such a conclusion is obviously
language-dependent, although no mention of a language
(presumably Fortran) was made.
2. Geist (Oak Ridge) considered wavefront factorization of
matrices, and a dataflow approach based on this was coded for
the hypercube by Heath and Geist. They observed some
oscillations in the fine grain parallelism, but did not go
into details. They have a data flow "simulator" written in
Fortran 77, and are willing to make the source code available.
3. Bramble, Schatz, and Pasciak (Cornell) applied the idea of
preconditioning to subdomains, assigning a processor to
each subdomain of a 2-dimensional problem.
4. Yedavalli (Stevens Institute) presented some work in control
theory involving interval matrices and solving Lyapunov
equations using interval arithmetic.
5. White (N.C.State) discussed the use of parallel nonlinear
Gauss-Seidel algorithms, based on domain decomposition.
6. Saad (Yale) analyzed various communication architectures
(bus, ring, grid) for doing Gaussian elimination. He analyzed
the communication cost based on the assumptions that the
elements of the matrix were distributed equally among the
processors, and that no element resided in more than one
processor. With the model of Gaussian elimination he used
(for i:=1 to n-1 do begin move data; do arithmetic; end;), he
obtained a lower bound of 0.5*N**2 * (time to transfer a word
on the bus) on the communication complexity, independent of
the number of processors, for the broadcast (bus) topology,
and about N**(5/3) for the grid topology. The ring was
unsatisfactory in the sense that communication latency
quickly dominates as the number of processors increases.
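For item 1, "column Cholesky" simply means finishing the factor one column at a time. Here is a serial sketch of that ordering (an illustrative reconstruction in Python; Heath's actual HEP code was presumably Fortran, and this is not it):

```python
import math

def column_cholesky(A):
    """Compute the Cholesky factor L (A = L * L^T) column by column.

    In the column ordering Heath found best on the HEP, once columns
    0..j-1 are finished, every below-diagonal entry of column j can
    be computed independently, so the inner i-loop is the natural
    unit of parallel work.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the contributions of all finished columns.
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(s)
        # Entries below the diagonal; these are mutually independent.
        for i in range(j + 1, n):
            L[i][j] = (A[i][j]
                       - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

Rowwise and submatrix partitionings reorder the same arithmetic; the six variants Heath compared differ only in which of these loops is handed out to the parallel processes.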
The SIAM Activity Group on Linear Algebra (335 members) met
to consider future conferences, symposia, etc. Dongarra (ANL) and
Plemmons (N.C.State) suggested that an Arpanet or CSNET bulletin
board might be the best means of maintaining communications and
updating information on minisymposia, etc.
The meeting on the extension of the BLAS (basic linear
algebra subroutines) was concerned with a preliminary description
of planned extensions to LINPACK, including naming conventions,
calls, etc. (Copies of Dongarra's full proposal may be obtained
from Janice Wilkins DTN 225-5941, ERLANG::WILKINS). It was
understood that this would leave room for individual implementors
to optimize the routines for particular architectures. This is
recognized as crucial to high performance for many routines. A
speaker from NAG cited an example where a Cholesky decomposition
routine ran at 5 Mflops on a Cray using the original Fortran, 20
Mflops when vectorized, and 40 Mflops when partially unrolled and
vectorized.
Kahan (UCB) made the point that the routines as envisioned make no
provision for accumulating some results (e.g. inner products) in
"extended" precision and returning such values. Kahan therefore
considers this an unreasonable predisposition toward existing
architectures (i.e. non-IEEE?). I think Kevin Harris would point
out that this is really a language issue.
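A software stand-in for what Kahan is asking of the hardware is compensated (Kahan) summation, which carries the dropped low-order bits along in a second word. This sketch of a compensated inner product is my own illustration, not part of the BLAS proposal:

```python
def dot_compensated(x, y):
    """Inner product accumulated with Kahan (compensated) summation.

    The compensation term c recovers low-order bits that a plain
    running sum would discard, giving an effect similar to
    accumulating in extended precision -- the capability Kahan
    argued the extended BLAS should expose.
    """
    s = 0.0
    c = 0.0                     # running compensation for lost bits
    for xi, yi in zip(x, y):
        p = xi * yi - c         # fold the previous error into the addend
        t = s + p
        c = (t - s) - p         # the part of p that s + p lost
        s = t
    return s
```

With hardware extended-precision accumulation the same effect comes for free in the multiply-add pipeline; done in software, it roughly doubles the cost per term, which is presumably why implementors targeting existing machines resist mandating it.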
I mentioned to Dongarra that a number of naming-convention and
subroutine-calling-convention problems would disappear if Ada and
its operator-overloading facilities were used. I was really
asking whether a large government facility like Argonne was
"moving" to Ada in the near future. It appears that at ANL, the
introduction of DEC's Ada compiler is too recent to have produced
any noticeable movement in this direction.
Finally, Lewis (NAG) discussed his preliminary work on defining
sparse BLAS. He mentioned that some processors (the Cyber 205,
Hitachi machines, and the newer Cray X-MPs) have gather/scatter
hardware to facilitate some sparse matrix operations. He said
that it was impossible for standard Fortran to take advantage of
such hardware and still be portable, because the compiler would
have to know that a particular set of indices contained no
duplicate values.
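Lewis's duplicate-index point can be made concrete with a small sketch (Python for illustration only; his argument concerned standard Fortran and vector hardware, and all names here are mine):

```python
def gather(values, indices):
    """Gather: pack selected entries of a dense vector (e.g. the
    nonzeros of one sparse row) into a contiguous buffer."""
    return [values[i] for i in indices]

def scatter_add(values, indices, updates):
    """Scatter with accumulation, one element at a time.

    The sequential loop is correct even when `indices` contains
    duplicates, because each update sees the previous one.
    """
    for i, u in zip(indices, updates):
        values[i] += u

def scatter_add_gather_writeback(values, indices, updates):
    """The unsafe vectorized pattern: gather the old values, add,
    then write everything back.  With a duplicated index, later
    writes overwrite earlier ones instead of accumulating -- which
    is why a compiler cannot emit hardware scatter for such a loop
    unless it knows the index list is duplicate-free.
    """
    new = [values[i] + u for i, u in zip(indices, updates)]
    for i, v in zip(indices, new):
        values[i] = v
```

The two scatter versions agree exactly when the indices are distinct and silently disagree when they are not, which is the portability problem Lewis described: standard Fortran gives the compiler no way to assert distinctness.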
In a recent related article in SIGNUM, Lewis recommends that the
availability of tuned implementations of the BLAS be publicized,
perhaps via publication in ACM TOMS or some other venue.
Performance studies made by Dongarra and others indicate that this
is a worthwhile task on nearly every processor for which it has
been done.
Posted: Thu 13-Jun-1985 14:06 Hudson, MA
To: @SCI.DIS