| I have read the first article. I have the second sitting next to my
bed waiting to be read. I'll get back to you on that one.
I found the first article rather disappointing. The area of "computer
intensive statistical methods" is an important one. The article
unfortunately pushed rather too strongly one of the most glamorous but
least useful of those techniques -- the bootstrap, which was invented by
Efron. Someday the bootstrap (and its relative, the "jackknife") may be a
broadly useful technique, but right now nobody knows when its results
can be considered valid and when they can't except for some special
cases. If you get an answer via the bootstrap you are generally open
to the criticism that the assumptions that it is based on may not apply
(since no one knows when they apply and when they don't) and that your
results therefore don't mean anything.
There was a rather downplayed reference to the problem but it was
essentially dismissed. ("Theoretical work on properties of the
bootstrap is proceeding at a vigorous pace. We have emphasized
standard errors here, but the main theoretical thrust has been toward
confidence intervals. Getting dependable confidence intervals from
bootstrap calculations is challenging, in theory and in practice, but
progress on both fronts has been considerable"). In fact, the primary
problem that needs to be addressed with bootstrap techniques applies
to any application of them, not just to confidence intervals.
The bootstrap assumes that the sample taken from the population is
"representative" of that population in a very general way. Clearly
this is true when the sample is large enough, but just what is large
enough? If the population distribution is simple and well-behaved
(e.g., normal), the answer is probably that a fairly small sample is
good enough. But what if the distribution is in some sense very lumpy
(even fractal) in character and the statistic you are bootstrapping is
sensitive to that lumpiness? You obviously need enough samples so that
the "lumps" are reflected in the sample distribution. How do you
decide what is a good enough sample without making the kind of
prior assumptions about the population that the bootstrap is supposed
to allow you to avoid? Generally the amount of analysis required to
justify the bootstrap in any particular case is greater than the
amount needed to apply other techniques (both conventional and computer
intensive).
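To make the point concrete, here is a minimal sketch (in Python, on
made-up data) of the basic bootstrap recipe for a standard error:
resample the observed data with replacement, recompute the statistic
each time, and take the spread of the recomputed values. Every step
leans on the resampled data standing in for fresh draws from the
population -- which is exactly the representativeness assumption in
question.

    import random

    def bootstrap_se(sample, statistic, n_resamples=1000, seed=0):
        """Estimate the standard error of `statistic` by resampling
        `sample` with replacement.  The whole exercise rests on the
        assumption that the sample mirrors the population."""
        rng = random.Random(seed)
        n = len(sample)
        values = []
        for _ in range(n_resamples):
            resample = [sample[rng.randrange(n)] for _ in range(n)]
            values.append(statistic(resample))
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        return var ** 0.5

    # Made-up data: bootstrap standard error of the median.
    data = [2.1, 3.7, 1.9, 4.4, 2.8, 3.3, 5.0, 2.6]
    median = lambda xs: sorted(xs)[len(xs) // 2]
    print(bootstrap_se(data, median))

Nothing in that loop can tell you whether eight observations were
enough to capture the "lumps" in the population.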
The article then goes on to discuss "nonparametric regression" which
is just a fancy name for data smoothing. The technique is virtually
useless for formal statistical inference, and needs to be used very
carefully for informal statistical inference. It is highly useful
for showing the underlying structure of data, but it can also create
the structure it shows. It is therefore handy for descriptive purposes (data
reduction), for exploratory data analysis and for presentation, but
Efron and Tibshirani don't make clear its limitations. (Reading this
section did, however, make me think of a computer intensive method
which could more legitimately be called "nonparametric regression".
I wonder if anyone else has thought of it? I'm going to have to give
it some more thought to see if it is worth pursuing).
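To show what kind of animal a "smoother" is, here is a minimal sketch
of a plain kernel smoother (not loess itself, and on made-up data):
every smoothed value is just a weighted average of its neighbors, and
the picture you get depends heavily on a bandwidth you choose yourself.

    import math

    def kernel_smooth(x, y, bandwidth):
        """Nadaraya-Watson style smoother: at each point, return a
        Gaussian-weighted average of nearby y values.  Too small a
        bandwidth chases noise, too large a one flattens real
        structure -- either way the "structure" shown is partly an
        artifact of that choice."""
        smoothed = []
        for xi in x:
            weights = [math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj in x]
            total = sum(weights)
            smoothed.append(sum(w * yj for w, yj in zip(weights, y)) / total)
        return smoothed

    # Made-up data: a noisy sine curve.
    xs = [i / 10.0 for i in range(50)]
    ys = [math.sin(xi) + 0.3 * ((-1) ** i) for i, xi in enumerate(xs)]
    print(kernel_smooth(xs, ys, bandwidth=0.5)[:5])

There is no probability model anywhere in that calculation, which is
why it is so hard to use for formal inference.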
The next technique discussed was "generalized additive models". It is
not a technique I'm familiar with. It seemed interesting, but there
really was not enough information to make much of a judgement. There
was a reference to a book (co-authored by Tibshirani) on the subject
which I may look up, but the account in the article was much too brief
to be useful. If I wasn't interested in new statistical techniques for
their own sakes, I would probably not bother to check the reference.
There was not enough information to allow a reader to decide whether
the technique might even solve a problem that she was facing as a
scientist or engineer.
They then discuss, in somewhat more detail, a kind of clustering
algorithm called CART (Classification and Regression Trees). The
technique was interesting but obviously suffers (as all known
clustering algorithms suffer) from making rather strong assumptions.
The article was rather unclear about when (or if) this technique
is more useful than any other classification/clustering technique
(almost all of which might be called "computer intensive").
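As a rough gloss of what the name means (mine, not the article's):
CART repeatedly splits the cases on whichever single variable and
cut-point best separates the classes, building a binary tree. A
minimal sketch of one such split, on made-up data, using the usual
Gini impurity criterion:

    def gini(labels):
        """Gini impurity of a set of class labels."""
        n = len(labels)
        if n == 0:
            return 0.0
        counts = {}
        for c in labels:
            counts[c] = counts.get(c, 0) + 1
        return 1.0 - sum((k / n) ** 2 for k in counts.values())

    def best_split(rows, labels):
        """Find the single variable and threshold whose split gives
        the lowest weighted Gini impurity.  CART grows a tree by
        applying this recursively to each subset, then pruning."""
        n = len(rows)
        best = (None, None, float("inf"))
        for j in range(len(rows[0])):
            for t in sorted(set(r[j] for r in rows)):
                left = [labels[i] for i in range(n) if rows[i][j] <= t]
                right = [labels[i] for i in range(n) if rows[i][j] > t]
                score = (len(left) * gini(left) + len(right) * gini(right)) / n
                if score < best[2]:
                    best = (j, t, score)
        return best

    # Made-up data: two measurements per case, two classes.
    rows = [(1.0, 2.1), (1.2, 1.9), (3.5, 0.4), (3.8, 0.6), (1.1, 2.3), (3.6, 0.5)]
    labels = ["a", "a", "b", "b", "a", "b"]
    print(best_split(rows, labels))  # splits cleanly on the first variable

The strong assumptions show up right there: one variable at a time,
axis-parallel cuts, and a particular impurity criterion.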
It seems to me that anyone reading this who knew statistics but who
didn't know much about these "unconventional" techniques would be left
saying "interesting but so what?"
As I said, disappointing.
Topher
|
| I agree with .2 on the Jackknife and Bootstrap.
"generalized additive models" is a chapter in experimental design
which shows that linear models are more robust than the normality
assumptions usually used to develop the theory would suggest.
Efron was very active 25 years ago in the same areas. I won't bother
to include my speculative opinions of what that implies.
... >(Reading this
>section did, however, make me think of a computer intensive method
>which could more legitimately be called "nonparametric regression".
...
Let me think about this one.
Dick
|
| RE: .3 (Dick)
> >(Reading this
> >section did, however, make me think of a computer intensive method
> >which could more legitimately be called "nonparametric regression".
> ...
>
> Let me think about this one.
I should clarify a bit.
Traditional regression is a bunch of different techniques which are
used for a number of different purposes.
One of those purposes is to fit a "summarizing" curve to a set of
numeric data. There are many non-regression -- indeed, non-
"statistical" -- techniques for this, so I don't really feel that this
is essentially "regression". This is what "loess," the technique in the
article, does.
Another use of regression -- in some ways the primary one -- is to
estimate the values of the parameters of a stochastic numeric model.
Any technique which did this, I think, could legitimately be called an
extension to regression. It's a bit hard to imagine how any technique
for estimating parameters could be considered "non-parametric", though.
Some genetic algorithm techniques which have been used (which "breed"
arbitrary formulae to fit data) come close, but hidden inside there is
a "parametric" model implicit in the fitness criteria.
That leaves, of the major uses of regression that I can think of, only
using regression to answer the question of whether or not a
relationship exists between numeric variables. The article didn't
directly suggest any way to do this "non-parametrically", but it did
lead me to ask the question as to whether such a method existed. Since
asking the right question is often the main part of answering it, the
rough outline of a possible procedure occurred to me. Chances are
someone else has already thought of it, and it still needs a lot of
fleshing out, but it's fun to think about such things.
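(To give a concrete flavor of what a computer-intensive,
distribution-free test of "is there a relationship at all?" can look
like -- this is a standard permutation test, not the procedure I was
alluding to above -- here is a minimal sketch on made-up data: shuffle
one variable many times and see how often the re-paired data look as
associated as the real pairing.)

    import random

    def permutation_test(x, y, n_perm=2000, seed=0):
        """Compare the observed correlation of (x, y) to correlations
        obtained after randomly shuffling y.  The p-value is the
        fraction of shuffles that match or beat the observed value."""
        def corr(a, b):
            n = len(a)
            ma, mb = sum(a) / n, sum(b) / n
            cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
            va = sum((ai - ma) ** 2 for ai in a)
            vb = sum((bi - mb) ** 2 for bi in b)
            return cov / (va * vb) ** 0.5
        rng = random.Random(seed)
        observed = abs(corr(x, y))
        y_shuffled = list(y)
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(y_shuffled)
            if abs(corr(x, y_shuffled)) >= observed:
                hits += 1
        return observed, hits / n_perm

    # Made-up data with a real (noisy) relationship.
    xs = [float(i) for i in range(20)]
    ys = [2.0 * xi + random.Random(i).uniform(-5, 5) for i, xi in enumerate(xs)]
    print(permutation_test(xs, ys))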
Topher
|