
Conference rusure::math

Title: Mathematics at DEC
Moderator: RUSURE::EDP
Created: Mon Feb 03 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2083
Total number of notes: 14613

306.0. "SIAM Trip Report" by TOOLS::STAN () Thu Jun 13 1985 22:17

From:	GAUSS::ANDERSON     13-JUN-1985 14:57
To:	HARE::RABINOWITZ
Subj:	SIAM Conference on Applied Linear Algebra - trip report




    +---+---+---+---+---+---+---+
    |   |   |   |   |   |   |   |
    | d | i | g | i | t | a | l |    I N T E R O F F I C E  M E M O
    |   |   |   |   |   |   |   |
    +---+---+---+---+---+---+---+
                                        From: Ned Anderson
    To: @SCI                            Date: May 29, 1985
                                        Dept: Eastern Research Lab
                                        DTN:  225-5969
                                        Loc:  HLO2-3/N04
                                        Enet: GAUSS::ANDERSON


         Subject:  Report on the Second  SIAM  Conference  on  Applied
    Linear Algebra, April 29 - May 2, 1985, Raleigh, NC.

         This was a major conference, with  8  invited  presentations,
    several   minisymposia,   and  close  to  100  shorter  (20  min.)
    presentations in parallel session format.  I  attended  the  first
    three  days,  which included several interesting invited (60 min.)
    presentations.  The shorter talks were in  the  areas  of  applied
    linear algebra, numerical linear algebra, and parallel algorithms.
    There were several  sessions,  formal  and  informal,  devoted  to
    discussion   of   extending   the   BLAS   (basic  linear  algebra
    subroutines)  of  LINPACK.   A  number  of  purveyors  of  various
    mathematical software packages for "the PC" were also on hand.

         Invited presentations attended:

     *  Matrix Polynomials, by Peter Lancaster, discussed spectral
        theory and various canonical forms for matrix polynomials.

     *  Matrix Methods in the Numerical Solution of Partial
        Differential  Equations,  by  Seymour  Parter,  discussed  the
        classical  notions  of  stability  and  convergence,  and  the
        importance   of  some  eigenvalue  estimation  techniques  for
        studying these properties.  He focused  on  some  problems  of
        current  interest  in  difference  and  multigrid  methods for
        partial differential equations.

     *  Numerical and Computational  Methods  in  Linear  Algebra,  by
        James   Wilkinson,   discussed   ill-conditioning,   defective
        matrices and eigenvalue perturbations.

     *  Parallel Algorithms in Linear Algebra, by Robert Voigt (ICASE,
        NASA Langley), pointed out that relatively few parallel
        algorithms  have  been  developed  for  MIMD   machines,   and
        discussed  recent  work  in this area.  He claimed that little
        progress has been made in fully analyzing parallel computation
        from  the  point  of view of both computational complexity and
        communication complexity.

        1.  Early  work  mentioned  included  the  paper  on  "chaotic
            relaxation"  by  Chazan  and  Miranker  (1969) and Stone's
            paper (1973) on tridiagonal systems (a parallel method
            which is, however, numerically unstable).

        2.  The notion  of  a  consistent  algorithm  was  introduced,
            meaning a parallel algorithm whose scalar analogue had the
            same complexity as the best known scalar  (as  opposed  to
            vector  or  parallel)  algorithm.  He gave an example of a
            "cyclic elimination" method for tridiagonal systems  which
            was NOT consistent, since its complexity was O(n*log2(n)),
            but which performed very well on actual parallel machines.
            The method is described in a survey paper by Heller (1978)
            and was implemented by Kapur and Browne on TRAC, and by
            Gannon and Panetta on PRINGLE (both 1984); a sketch of the
            method appears after this section.

        3.  Another basic technique which leads naturally to  parallel
            algorithms    is   the   notion   of   "partitioning   for
            parallelism", or "subtructuring"  in  problems  where  the
            linear system to be solved has a block structure which can
            be exploited.  Voigt gave an example of a  block  diagonal
            system  with  nonzero off-diagonal blocks in only the last
            column and last row.  The basic idea  was  to  factor  the
            diagonal blocks in parallel.

        4.  Finally, asynchronous computation, in which the order of
            computations is not guaranteed to be the same as some
            equivalent scalar  algorithm,  was  mentioned.   A  recent
            thesis  at Duke (Joe Saltz) investigated this approach for
            the time-dependent Crank-Nicolson equations.  Some
            problems  with  algorithms of this type include difficulty
            with debugging, and the  inability  to  reproduce  results
            (i.e.   the  exact  same sequence of computations).  Voigt
            mentioned that recent work at ICASE  and  LANL  simulating
            the  "chaotic  iteration"  approach of Chazan and Miranker
            showed the method to be competitive with SOR.

        Computational results for parallel  algorithms  seemed  to  be
        limited  to  the  Denelcor HEP and a few experimental machines
        (mentioned  previously),  and   simulation   studies.    There
        appeared  to  be  considerable interest in the Intel hypercube
        but no one reported any actual computational results  on  this
        machine.  There was widespread disappointment at the rapidity
        with which the "parallel speedup" of the HEP
        peaked  out  due to communication bottlenecks on this machine.
        Argonne and Oak Ridge were among the facilities which had  had
        actual computational experience on HEP's.
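        To make the "cyclic elimination" idea of item 2 concrete,
        here is a minimal serial sketch in Python/NumPy (not from the
        talk; the function name and array layout are mine).  At each
        step every equation is combined with its neighbors at stride
        1, 2, 4, ..., so all n updates within a step are independent
        and could run in parallel; the price is O(n*log2(n)) total
        work versus O(n) for ordinary elimination, which is exactly
        why the method is not "consistent".

            import numpy as np

            def cyclic_elimination(a, b, c, d):
                # Solve the tridiagonal system with subdiagonal a
                # (a[0] = 0), diagonal b, superdiagonal c (c[-1] = 0),
                # and right-hand side d.  Assumes b stays nonzero,
                # e.g. for diagonally dominant systems.
                n = len(b)
                a, b, c, d = (np.asarray(v, float).copy()
                              for v in (a, b, c, d))
                s = 1
                while s < n:
                    # multipliers that cancel x[i-s] and x[i+s] from
                    # equation i, using equations i-s and i+s
                    alpha = np.zeros(n)
                    gamma = np.zeros(n)
                    alpha[s:] = -a[s:] / b[:-s]
                    gamma[:-s] = -c[:-s] / b[s:]
                    a2 = np.zeros(n)
                    c2 = np.zeros(n)
                    b2 = b.copy()
                    d2 = d.copy()
                    a2[s:] = alpha[s:] * a[:-s]    # coupling moves to 2s
                    c2[:-s] = gamma[:-s] * c[s:]
                    b2[s:] += alpha[s:] * c[:-s]
                    b2[:-s] += gamma[:-s] * a[s:]
                    d2[s:] += alpha[s:] * d[:-s]
                    d2[:-s] += gamma[:-s] * d[s:]
                    a, b, c, d = a2, b2, c2, d2
                    s *= 2
                return d / b    # each equation is now b[i]*x[i] = d[i]

            # quick check on a diagonally dominant example
            n = 8
            rng = np.random.default_rng(0)
            a = rng.random(n); c = rng.random(n); a[0] = c[-1] = 0.0
            b = 2.0 + a + c
            x = rng.random(n)
            d = b*x + a*np.r_[0.0, x[:-1]] + c*np.r_[x[1:], 0.0]
            assert np.allclose(cyclic_elimination(a, b, c, d), x)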



         The  flavor  of  a  few  of  the   more   informative   short
    presentations is summarized below:

    1.  Heath (Oak Ridge) examined six possible ways  of  partitioning
        the  calculation  of the Cholesky factorization of a matrix by
        computing  rowwise,  columnwise   or   by   submatrices,   and
        allocating  various  numbers  of  processes  on  a  HEP.  On a
        200x200 example, the HEP peaked out at about 8 processes and
        thereafter lost ground due to communication overhead.  In any
        case "column Cholesky" (computing the factorization column  by
        column) was deemed to be best.  Such a conclusion is obviously
        language-dependent,  although  no  mention   of   a   language
        (presumably Fortran) was made.

    2.  Geist  (Oak  Ridge)  considered  wavefront  factorization   of
        matrices,  and a dataflow approach based on this was coded for
        the  hypercube  by  Heath  and  Geist.   They  observed   some
        oscillations in the fine-grain parallelism, but did not go
        into details.  They have a data flow  "simulator"  written  in
        Fortran 77, and are willing to make the source code available.

    3.  Bramble, Schatz, and Pasciak (Cornell)  applied  the  idea  of
        preconditioning to subdomains, assigning a processor to each
        subdomain of 2-dimensional problems.

    4.  Yedavalli (Stevens Institute) presented some work  in  control
        theory   involving  interval  matrices  and  solving  Lyapunov
        equations using interval arithmetic.

    5.  White (N.C.State) discussed  the  use  of  parallel  nonlinear
        Gauss-Seidel algorithms, based on domain decomposition.

    6.  Saad  (Yale)  analyzed  various  communication   architectures
        (bus, ring, grid) for doing Gaussian elimination.  He analyzed
        the communication cost  based  on  the  assumptions  that  the
        elements  of  the  matrix  were  distributed equally among the
        processors, and that no  element  resided  in  more  than  one
        processor.  With the model of Gaussian elimination he used,
        (for i:=1 to n-1 do begin move data; do arithmetic;  end;)  he
        obtained a limit of 0.5*N**2 * (time to transfer a word on the
        bus) as a lower bound on the complexity,  independent  of  the
        number of processors, for the broadcast (bus) topology and
        about  N**(5/3)  for  the  grid  topology.    The   ring   was
        unsatisfactory  in  the  sense  that the communication latency
        dominates quickly as the number of processors increases.
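
    As a companion to item 1, here is a minimal serial sketch of
    "column Cholesky" in Python/NumPy (my own illustration, not
    Heath's code).  The body of the k loop is the natural unit of
    work to hand to separate processes: each earlier column k
    contributes an independent update to column j.

        import numpy as np

        def column_cholesky(A):
            # Left-looking ("column") Cholesky: build the lower
            # triangular L with A = L*L^T one column at a time.
            A = np.asarray(A, float)
            n = A.shape[0]
            L = np.zeros((n, n))
            for j in range(n):
                col = A[j:, j].copy()
                # apply the updates from all earlier columns; these
                # are the independent tasks in a parallel version
                for k in range(j):
                    col -= L[j, k] * L[j:, k]
                L[j:, j] = col / np.sqrt(col[0])  # scale the column
            return L

        # check on a random symmetric positive definite matrix
        rng = np.random.default_rng(1)
        M = rng.random((6, 6))
        A = M @ M.T + 6*np.eye(6)
        L = column_cholesky(A)
        assert np.allclose(L @ L.T, A)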


         The SIAM Activity Group on Linear Algebra (335  members)  met
    to consider future conferences, symposia, etc.  Dongarra (ANL) and
    Plemmons (N.C.State) suggested that an Arpanet or  CSNET  bulletin
    board  might  be  the best means of maintaining communications and
    updating information on minisymposia, etc.

         The meeting on  the  extension  of  the  BLAS  (basic  linear
    algebra subroutines) was concerned with a preliminary description
    of planned extensions to LINPACK,  including  naming  conventions,
    calls,  etc.   (Copies of Dongarra's full proposal may be obtained
    from  Janice  Wilkins  DTN  225-5941,  ERLANG::WILKINS).   It  was
    understood  that this would leave room for individual implementors
    to optimize the routines for particular  architectures.   This  is
    recognized  as  crucial  to high performance for many routines.  A
    speaker from NAG cited an example where a  Cholesky  decomposition
    routine ran at 5 Mflops on a Cray using the original Fortran, 20
    Mflops when vectorized, and 40 Mflops when partially unrolled and
    vectorized.
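
    To illustrate what "partially unrolled" means, here is the
    transformation applied to a generic AXPY-style update (a Python
    sketch of my own; on the Cray the same rewrite is done in
    Fortran, where it cuts loop overhead and gives the vectorizer
    more independent operations per trip):

        def axpy(alpha, x, y):
            # rolled form: one update per loop trip
            for i in range(len(x)):
                y[i] += alpha * x[i]

        def axpy_unrolled4(alpha, x, y):
            # unrolled by 4: four updates per trip, plus a short
            # cleanup loop when len(x) is not a multiple of 4
            n = len(x)
            i = 0
            while i + 4 <= n:
                y[i]     += alpha * x[i]
                y[i + 1] += alpha * x[i + 1]
                y[i + 2] += alpha * x[i + 2]
                y[i + 3] += alpha * x[i + 3]
                i += 4
            while i < n:
                y[i] += alpha * x[i]
                i += 1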

    Kahan (UCB) made the point that the routines as envisioned made no
    provision  for accumulating some results (e.g.  inner products) in
    "extended" precision, and returning such  values)  As  such  Kahan
    considers  this  an  unreasonable  predisposition  toward existing
    architectures (i.e.  non IEEE?).  I think Kevin Harris would point
    out that this is really a language issue.
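
    For readers unfamiliar with the issue: an ordinary inner-product
    loop accumulates rounding error that a wider ("extended")
    accumulator would absorb.  Where no wide accumulator is
    available, compensated (Kahan) summation recovers much of the
    effect in software; a minimal sketch (my own, not part of the
    BLAS proposal):

        def dot_compensated(x, y):
            # inner product with Kahan summation: c carries the
            # low-order bits an ordinary running sum would drop
            s = 0.0
            c = 0.0
            for xi, yi in zip(x, y):
                term = xi * yi - c
                t = s + term
                c = (t - s) - term
                s = t
            return s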

    I mentioned to Dongarra that a number of naming convention and
    subroutine calling convention problems would disappear if Ada and
    its operator overloading facilities were used.  I was really
    asking whether a large government facility like Argonne was
    "moving" to Ada in the near future.  It appears that at ANL, the
    introduction of DEC's Ada compiler is too recent to have produced
    any noticeable movement in this direction.
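
    The overloading point can be shown in any language with the
    facility; here is a toy Python sketch (the class and names are
    mine).  The element type travels with the object, so a single
    spelling of the operation replaces the S/D prefixes (SDOT, DDOT,
    ...) that the BLAS encode in subroutine names:

        import numpy as np

        class Vec:
            def __init__(self, data, dtype=np.float64):
                self.data = np.asarray(data, dtype=dtype)
            def __mul__(self, other):
                # one "*" serves every precision; the analogue of
                # Ada operator overloading
                return float(self.data @ other.data)

        x = Vec([1, 2, 3], np.float32)
        y = Vec([4, 5, 6], np.float32)
        print(x * y)    # 32.0; the same source works for float64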

    Finally, Lewis (NAG) discussed his preliminary  work  on  defining
    sparse BLAS.  He mentioned that some processors (the Cyber 205,
    Hitachi, and the newer Cray X-MPs) have special gather/scatter
    hardware  to  facilitate  some  sparse matrix operations.  He said
    that it was impossible for standard Fortran to take  advantage  of
    such  hardware  and  still be portable, because the compiler would
    have to know  that  a  particular  set  of  indices  contained  no
    duplicate values.
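
    The duplicate-index hazard is easy to demonstrate.  In the NumPy
    sketch below (my own illustration), a vectorized scatter-add
    silently loses one of two updates aimed at the same location,
    which is precisely why a compiler cannot use scatter hardware
    unless it knows the index set is duplicate-free:

        import numpy as np

        vals = np.array([2.0, 3.0, 4.0])   # sparse coefficients
        idx  = np.array([0, 2, 2])         # note the repeated index 2

        x = np.array([10.0, 20.0, 30.0])
        gathered = x[idx]                  # gather is always safe

        y = np.zeros(3)
        y[idx] += vals                     # buffered scatter: the two
                                           # updates to y[2] collide,
                                           # leaving y = [2, 0, 4]
        z = np.zeros(3)
        np.add.at(z, idx, vals)            # element-by-element
                                           # scatter-add: z = [2, 0, 7]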

    In a recent, related article in SIGNUM, Lewis recommends that  the
    availability  of  tuned implementations of the BLAS be publicized,
    perhaps via publication in TOMS or some other ACM publication.
    Performance studies made by Dongarra and others indicate that this
    is a worthwhile task on nearly every processor for  which  it  has
    been done.

Posted:	Thu 13-Jun-1985 14:06 Hudson, MA
To:	@SCI.DIS