
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

1483.0. "Statistical Data Anal in the Computer Age?" by SOLVIT::DESMARAIS () Thu Aug 22 1991 12:16

    HAS ANYONE READ THESE ARTICLES AND WOULD CARE TO COMMENT?
    
    
 Title:     Statistical Data Analysis in the Computer Age
 Author(s): Efron, Bradley; Tibshirani, Robert; Stanford Univ; Univ. of Toronto
 Journal:   Science
            v. 253, n. 5018   July 26, 1991   pp. 390-395
 Abstract:  644            JA
 Subjects:
            DATA ANALYSIS
            INFORMATION TECHNOLOGY
            MATHEMATICS
            STATISTICAL ANALYSIS

            "Most of our familiar statistical methods, such as hypothesis
            testing, linear regression, analysis of variance, and maximum
            likelihood estimation, were designed to be implemented on
            mechanical calculators.  Modern electronic computation has
            encouraged a host of new statistical methods that require fewer
            distributional assumptions than their predecessors and can be
            applied to more complicated statistical estimators.  These
            methods allow the scientist to explore and describe data and
            draw valid statistical inferences without the usual concerns for
            mathematical tractability.  This is possible because traditional
            methods of mathematical analysis are replaced by specially
            constructed computer algorithms.  Mathematics has not
            disappeared from statistical theory.  It is the main method for
            deciding which algorithms are correct and efficient tools for
            automating statistical inference.  Some promising developments in
            computer-intensive statistical methodology are described in this
            article."


 Title:     New-wave Mathematics
 Author(s): Bown, William
 Journal:   New scientist (1971)
            v. 131, n. 1780   August 1991   pp. 33-37
 Abstract:  655            JA
 Subjects:
            COMPUTATIONAL TECHNIQUES
            FRACTALS
            MATHEMATICS

            "A new generation of mathematicians is rebelling against the
            ancient tradition of theorem and proof.  New-wave mathematicians
            prefer to experiment with free thinking on a computer.  But
            traditionalists fear that they may be about to lose something
            special."  By giving mathematicians the ability to do billions
            of complicated calculations on their own desks, the computer has
            spawned a whole new way of doing mathematics known as
            experimental maths.  Instead of deducing proofs step by step,  
            these experimental mathematicians gain knowledge in the same
            inductive way as most other scientists. While scientists design
            experiments on parts of the real world, the new mathematicians
            experiment by looking for patterns in abstract worlds existing
            only inside a computer.  Deep down, all mathematics follows the
            example of Euclid, being founded on a few axioms -- the basic
            rules that define the area of study.  Now, Euclid's proofs, if
            they come into it at all, will probably be someone else's job.
            Some mathematicians are just too busy experimenting.
1483.1. "a preliminary question" by CSSE::NEILSEN (Wally Neilsen-Steinhardt) Thu Aug 22 1991 16:46
.0>    HAS ANYONE READ THESE ARTICLES AND WOULD CARE TO COMMENT?

Is that a logical 'and' in the sentence above?  Do I have to read the article
before I comment?
1483.2. "Disappointing." by CADSYS::COOPER (Topher Cooper) Thu Aug 22 1991 18:32
    I have read the first article.  I have the second sitting next to my
    bed waiting to be read.  I'll get back to you on that one.

    I found the first article rather disappointing.  The area of "computer
    intensive statistical methods" is an important one.  The article
    unfortunately pushed rather too strongly one of the most glamorous but
    least useful of those techniques -- the bootstrap which was invented by
    Efron.  Someday the bootstrap (and its relative, the "jackknife") may be a
    broadly useful technique, but right now nobody knows when its results
    can be considered valid and when they can't except for some special
    cases.  If you get an answer via the bootstrap you are generally open
    to the criticism that the assumptions that it is based on may not apply
    (since no one knows when they apply and when they don't) and that your
    results therefore don't mean anything.
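
    [Aside, purely for illustration: a minimal sketch in Python of the
    basic bootstrap idea -- resample the observed data with replacement
    many times, recompute the statistic on each resample, and take the
    spread of those replicates as an estimate of its standard error.
    The data and the choice of statistic below are invented; this is a
    sketch of the general device, not of Efron's specific procedures.]

        import random

        def bootstrap_se(data, statistic, n_boot=1000, seed=0):
            """Bootstrap estimate of the standard error of statistic(data)."""
            rng = random.Random(seed)
            n = len(data)
            replicates = []
            for _ in range(n_boot):
                # draw n observations from the data, with replacement
                resample = [data[rng.randrange(n)] for _ in range(n)]
                replicates.append(statistic(resample))
            mean = sum(replicates) / n_boot
            var = sum((r - mean) ** 2 for r in replicates) / (n_boot - 1)
            return var ** 0.5

        def median(xs):
            s = sorted(xs)
            m = len(s) // 2
            return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

        # made-up sample; in practice this would be the observed data
        sample = [2.1, 3.7, 1.9, 4.4, 2.8, 3.1, 5.0, 2.5, 3.9, 4.1]
        print("bootstrap SE of the median:", bootstrap_se(sample, median))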

    There was a rather downplayed reference to the problem but it was
    essentially dismissed.  ("Theoretical work on properties of the
    bootstrap is proceeding at a vigorous pace.  We have emphasized
    standard errors here, but the main theoretical thrust has been toward
    confidence intervals.  Getting dependable confidence intervals from
    bootstrap calculations is challenging, in theory and in practice, but
    progress on both fronts has been considerable").  In fact, the primary
    problem that needs to be addressed is not confined to confidence
    intervals: it applies to any application of bootstrap techniques.

    The bootstrap assumes that the sample taken from the population is
    "representative" of that population in a very general way.  Clearly
    this is true when the sample is large enough, but just what is large
    enough?  If the population distribution is simple and well-behaved,
    (e.g., normal) the answer is probably that a fairly small sample is
    good enough.  But what if the distribution is in some sense very lumpy
    (even fractal) in character and the statistic you are bootstrapping is
    sensitive to that lumpiness?  You obviously need enough samples so that
    the "lumps" are reflected in the sample distribution.  How do you
    decide what is a good enough sample without making the kind of
    prior assumptions about the population that the bootstrap is supposed
    to allow you to avoid?  Generally the amount of analysis required to
    justify the bootstrap in any particular case is greater than the
    amount needed to apply other techniques (both conventional and computer
    intensive).

    The article then goes on to discuss "nonparametric regression" which
    is just a fancy name for data smoothing.  The technique is virtually
    useless for formal statistical inference, and needs to be used very
    carefully for informal statistical inference.  It is highly useful
    for showing underlying structure of data, but may create the structure
    it shows.  It is therefore handy for descriptive purposes (data
    reduction), for exploratory data analysis and for presentation, but
    Efron and Tibshirani don't make clear its limitations.  (Reading this
    section did, however, make me think of a computer intensive method
    which could more legitimately be called "nonparametric regression".
    I wonder if anyone else has thought of it?  I'm going to have to give
    it some more thought to see if it is worth pursuing).
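
    [Aside: a toy scatterplot smoother, to make "data smoothing" more
    concrete.  It is a simple running mean over the k nearest x-values
    -- not loess, which fits locally weighted regressions -- but it
    shows the general idea: a summarizing curve with no global
    parametric form.  The function name and usage are invented for
    illustration.]

        def running_mean_smooth(x, y, k=5):
            """At each x[i], average the y-values of the k points whose
            x-values are closest to x[i]."""
            smoothed = []
            for xi in x:
                nearest = sorted(range(len(x)), key=lambda j: abs(x[j] - xi))[:k]
                smoothed.append(sum(y[j] for j in nearest) / k)
            return smoothed

        # usage: xs and ys are equal-length lists of numbers
        # fitted = running_mean_smooth(xs, ys, k=7)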

    The next technique discussed was "generalized additive models".  It is
    not a technique I'm familiar with.  It seemed interesting, but there
    really was not enough information to make much of a judgement.  There
    was a reference to a book (co-authored by Tibshirani) on the subject
    which I may look up, but the account in the article was much too brief
    to be useful.  If I wasn't interested in new statistical techniques for
    their own sakes, I would probably not bother to check the reference.
    There was not enough information to allow a reader to decide whether
    the technique even might solve a problem that she was facing as a
    scientist or engineer.
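
    [Aside: as the term is generally used, a generalized additive model
    replaces the linear terms of a regression with smooth functions,
    y ~ alpha + f1(x1) + f2(x2) + ..., each estimated by a scatterplot
    smoother inside a "backfitting" loop.  The bare-bones sketch below,
    built on a crude nearest-neighbour smoother, is meant only to show
    the shape of that loop; it is not the method of the book referred
    to in the article.]

        def knn_smooth(x, r, k=5):
            """At each point, mean of r over the k points nearest in x."""
            fit = []
            for xi in x:
                nearest = sorted(range(len(x)), key=lambda j: abs(x[j] - xi))[:k]
                fit.append(sum(r[j] for j in nearest) / k)
            return fit

        def backfit_additive(x1, x2, y, k=5, n_iter=20):
            """Fit y ~ alpha + f1(x1) + f2(x2) by backfitting."""
            n = len(y)
            alpha = sum(y) / n
            f1 = [0.0] * n
            f2 = [0.0] * n
            for _ in range(n_iter):
                # smooth the partial residuals against each predictor in
                # turn, re-centring each fit to keep alpha identifiable
                f1 = knn_smooth(x1, [y[i] - alpha - f2[i] for i in range(n)], k)
                m1 = sum(f1) / n
                f1 = [v - m1 for v in f1]
                f2 = knn_smooth(x2, [y[i] - alpha - f1[i] for i in range(n)], k)
                m2 = sum(f2) / n
                f2 = [v - m2 for v in f2]
            return alpha, f1, f2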

    They then discuss, in somewhat more detail, a kind of clustering
    algorithm called CART (Classification and Regression Trees).  The
    technique was interesting but obviously suffers (as all known
    clustering algorithms suffer) from making rather strong assumptions.
    The article was rather unclear about when (or if) this technique
    is more useful than any other classification/clustering technique
    (almost all of which might be called "computer intensive").
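
    [Aside: a toy version of the recursive-partitioning idea behind
    CART, for a single numeric predictor -- repeatedly choose the
    split point that most reduces the within-node squared error, and
    stop at a minimum node size.  Real CART handles many predictors
    and categorical responses and prunes the tree by cross-validation;
    none of that appears in this sketch.]

        def sse(ys):
            """Sum of squared deviations from the mean."""
            m = sum(ys) / len(ys)
            return sum((v - m) ** 2 for v in ys)

        def grow_tree(x, y, min_size=5):
            """Nested dicts: leaves hold a mean, internal nodes a split point."""
            if len(y) <= min_size:
                return {"predict": sum(y) / len(y)}
            best = None
            for t in sorted(set(x)):
                left = [y[i] for i in range(len(y)) if x[i] <= t]
                right = [y[i] for i in range(len(y)) if x[i] > t]
                if left and right:
                    cost = sse(left) + sse(right)
                    if best is None or cost < best[0]:
                        best = (cost, t)
            if best is None:                    # no usable split: make a leaf
                return {"predict": sum(y) / len(y)}
            t = best[1]
            lx = [(a, b) for a, b in zip(x, y) if a <= t]
            rx = [(a, b) for a, b in zip(x, y) if a > t]
            return {"split": t,
                    "left":  grow_tree([a for a, _ in lx], [b for _, b in lx], min_size),
                    "right": grow_tree([a for a, _ in rx], [b for _, b in rx], min_size)}

        def predict(tree, xi):
            while "split" in tree:
                tree = tree["left"] if xi <= tree["split"] else tree["right"]
            return tree["predict"]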

    It seems to me that anyone reading this who knew statistics but who
    didn't know much about these "unconventional" techniques would be left
    saying "interesting but so what?"

    As I said, disappointing.

				    Topher
1483.3. "More reaction" by CORREO::BELDIN_R (Pull us together, not apart) Thu Aug 22 1991 19:43
    I agree with .2 on the Jackknife and Bootstrap.
    
    "generalized additive models" is a chapter in experimental design 
    which shows that linear models are more robust than the normality
    assumptions usually used to develop the theory would suggest.
    
    Efron was very active 25 years ago in the same areas.  I won't bother
    to include my speculative opinions of what that implies.
    
... >(Reading this
    >section did, however, make me think of a computer intensive method
    >which could more legitimately be called "nonparametric regression".
    ...
    
    Let me think about this one.
    
    Dick
1483.4. "More on non-parametric regression." by CADSYS::COOPER (Topher Cooper) Fri Aug 23 1991 18:19
RE: .3 (Dick)
>    >(Reading this
>    >section did, however, make me think of a computer intensive method
>    >which could more legitimately be called "nonparametric regression".
>    ...
>    
>    Let me think about this one.

    I should clarify a bit.

    Traditional regression is a bunch of different techniques which are
    used for a number of different purposes.

    One of those purposes is to fit a "summarizing" curve to a set of
    numeric data.  There are many non-regression -- indeed, non-
    "statistical" -- techniques for this, so I don't really feel that this
    is essentially "regression". This is what "loess," the technique in the
    article, does.

    Another use of regression -- in some ways the primary one -- is to
    estimate the values of the parameters of a stochastic numeric model.
    Any technique which did this, I think, could legitimately be called an
    extension to regression.  It's a bit hard to imagine how any technique
    for estimating parameters could be considered "non-parametric", though. 
    Some genetic algorithm techniques which have been used (which "breed"
    arbitrary formulae to fit data) come close, but hidden inside there is
    a "parametric" model implicit in the fitness criteria.

    That leaves, of the major uses of regression that I can think of, only
    using regression to answer the question as to whether or not there
    exists a relationship between numeric variables.  The article didn't
    directly suggest any way to do this "non-parametrically", but it did
    lead me to ask the question as to whether such a method existed.  Since
    asking the right question is often the main part of answering it, the
    raw outline of a possible procedure occurred to me.  Chances are
    someone else has already thought of it, and it still needs a lot of
    fleshing out, but it's fun to think about such things.
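
    [Aside: one generic computer-intensive way to ask "is there any
    relationship between x and y?" without distributional assumptions
    is a permutation test -- offered here only as a common textbook
    device, not as the procedure hinted at above.  The statistic used
    is the absolute sample correlation; any measure of association,
    e.g. the variance explained by a scatterplot smoother, could be
    substituted.]

        import random

        def correlation(x, y):
            """Ordinary sample (Pearson) correlation coefficient."""
            n = len(x)
            mx, my = sum(x) / n, sum(y) / n
            sxy = sum((x[i] - mx) * (y[i] - my) for i in range(n))
            sxx = sum((v - mx) ** 2 for v in x)
            syy = sum((v - my) ** 2 for v in y)
            return sxy / (sxx * syy) ** 0.5

        def permutation_p_value(x, y, n_perm=2000, seed=0):
            """Fraction of random shuffles of y whose |correlation| with x
            is at least the observed value -- an approximate p-value for
            the hypothesis of no relationship."""
            rng = random.Random(seed)
            observed = abs(correlation(x, y))
            y_shuffled = list(y)
            count = 0
            for _ in range(n_perm):
                rng.shuffle(y_shuffled)
                if abs(correlation(x, y_shuffled)) >= observed:
                    count += 1
            return count / n_perm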

					Topher