
Conference rusure::math

Title: Mathematics at DEC
Moderator: RUSURE::EDP
Created: Mon Feb 03 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2083
Total number of notes: 14613

1145.0. "Chi-Square X^2 calculation (statistics)" by MILKWY::JANZEN (cf. ANT::CIRCUITS,ANT::UWAVES) Mon Oct 30 1989 18:36

    Hi.
    I don't understand my reference books for engineers, and I sold back my
    statistics book.
    How do you calculate the Chi Squared (X^2) test of correlation, and how do
    you choose whether to use .01 or .05 to make a binary decision about
    correlation?  I have picked up that chi-square is a unitless test of
    correlation commonly performed in all branches of science to find
    correlation between factors, leading to studies of mechanisms 
    of causality in a system.  I guess it might be the quotient of the sums
    of the errors divided by the variance.
    I have two measurement systems.  I want to quantify how well their
    respective measurements mutually correlate.
    I make n measurements on each system of the same thing.
    One sequence is x(n), and the other from the other measurement system
    is y(n).
    
    How do I calculate the unitless correlation? 
    Thanks
    Tom (who will hit the books tonight and try again to decode them)
1145.1  AITG::DERAMO "like a candle in the wind"  Tue Oct 31 1989 02:03  (40 lines)
        I thought that the Chi Squared test was for "goodness of
        fit" as opposed to correlation.  You suspect an
        experiment can have one of k results, with probabilities
        p[1] through p[k].  You perform N independent trials, and
        observe each of the k results Obs[i] times, 1 <= i <= k.
        The expected value of the number of times result i
        occurred is Exp[i] = N * p[i].  The Chi Squared statistic
        for this is the sum over 1 <= i <= k of
        (Obs[i] - Exp[i])^2 / Exp[i].
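
        A minimal sketch of that computation in Python (using numpy and
        scipy); the probabilities and counts below are made-up
        illustration data, not taken from this note:

            import numpy as np
            from scipy import stats

            p = np.array([0.5, 0.3, 0.2])     # hypothesized probabilities p[i]
            obs = np.array([260, 140, 100])   # observed counts from N = 500 trials
            exp = obs.sum() * p               # expected counts Exp[i] = N * p[i]

            chi2_stat = ((obs - exp) ** 2 / exp).sum()
            dof = len(p) - 1                  # k - 1 degrees of freedom
            p_value = stats.chi2.sf(chi2_stat, dof)

            # scipy does the same thing in one call:
            stat, pval = stats.chisquare(obs, f_exp=exp)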
        
        As N -> oo, the value computed for this statistic will
        approach a distribution known as Chi Squared with k-1
        degrees of freedom (the distribution is that of the sum
        of k-1 independent normally distributed random values) if
        the assumptions were correct.  If the true probabilities
        weren't the p[i], or the trials weren't independent, then
        the computed statistic tends to differ more and more from
        this as N increases.
        
        You can use this to show that two things are "correlated"
        by assuming they are independent and doing a Chi Squared
        test that "fails", i.e., is so out of range that you feel
        safe in concluding the two factors are not independent.
        
        The above described a one-dimensional test.  You can set
        up a two-dimensional test as follows.  One dimension is
        one of k1 classes with probabilities p[i].  The other is
        one of k2 classes with probabilities q[i].  Now "cell i,j"
        for 1 <= i <= k1, 1 <= j <= k2, has observed count
        Obs[i,j] and expected count N * p[i] * q[j] after N
        independent trials.  Again compute the sum over all of
        the cells of (Obs - Exp)^2 / Exp.  In this case, if the
        assumptions are correct, then as N -> oo the computed
        statistic has a Chi Squared distribution with (k1 - 1)*(k2 - 1)
        degrees of freedom.  [One of the "assumptions" is that
        the class of the first factor is independent of the class
        of the second factor.  If the test shows they are not
        independent then in some sense they are "correlated".]
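
        A sketch of the two-dimensional (contingency-table) version in
        Python.  Note that scipy's routine estimates the marginal
        probabilities p[i] and q[j] from the observed table itself rather
        than taking them as known, which is the usual practice and is
        what gives the (k1 - 1)*(k2 - 1) degrees of freedom; the counts
        here are invented:

            import numpy as np
            from scipy import stats

            obs = np.array([[30, 10, 20],
                            [20, 25, 15]])    # Obs[i, j] for a 2 x 3 table

            chi2_stat, p_value, dof, expected = stats.chi2_contingency(obs)
            # dof comes out as (2 - 1) * (3 - 1) = 2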
        
        Dan
1145.2  "I happen to have this problem ..."  VMSDEV::HALLYB "The Smart Money was on Goliath"  Wed May 22 1991 16:39  (5 lines)
    OK, I have 80 "bins" that should uniformly share 281 samples.
    I calculate the chi-square value as 57.5
    Tables only go to 30 or so, but it looks like I have 79 degrees O'freedom.
    
    What next?
1145.3  GUESS::DERAMO "Be excellent to each other."  Wed May 22 1991 16:59  (34 lines)
	re .1
        
>>        As N -> oo, the value computed for this statistic will
>>        approach a distribution known as Chi Squared with k-1
>>        degrees of freedom (the distribution is that of the sum
>>        of k-1 independent normally distributed random values) if
>>        the assumptions were correct.
        
        Eeeek!  Make that: the distribution is that of the sum of
        the squares of k-1 independent standard normal random
        variables.
        
        One thing I left out of .1 is that it is suggested each
        bin have a minimum expected value of at least 5.  If you
        don't have enough independent trials for that, then
        combine bins or get more independent trials.
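
        One simple way to act on that rule of thumb, sketched in Python
        (pooling adjacent bins is an arbitrary choice -- any grouping
        decided from the expected counts alone, without peeking at the
        observed counts, will do):

            def pool_bins(observed, expected, minimum=5.0):
                """Merge adjacent bins until each pooled bin has an expected
                count of at least `minimum`.  Assumes the total expected
                count is itself at least `minimum`."""
                pooled_obs, pooled_exp = [], []
                acc_obs = acc_exp = 0.0
                for o, e in zip(observed, expected):
                    acc_obs += o
                    acc_exp += e
                    if acc_exp >= minimum:
                        pooled_obs.append(acc_obs)
                        pooled_exp.append(acc_exp)
                        acc_obs = acc_exp = 0.0
                if acc_exp > 0:           # fold any leftover into the last bin
                    pooled_obs[-1] += acc_obs
                    pooled_exp[-1] += acc_exp
                return pooled_obs, pooled_exp

        For the case in .2 (80 bins, 281 samples, an expected count of
        about 3.5 per bin) this pools pairs of adjacent bins, giving 40
        bins with an expected count of about 7 each.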
        
	re .2
        
>>    OK, I have 80 "bins" that should uniformly share 281 samples.

        Ummm, gee, you should combine bins or use more samples. :-)
        
>>    Tables only go to 30 or so, but it looks like I have 79 degrees O'freedom.
        
        I thought the tables usually gave a formula for computing
        the levels at higher degrees of freedom.  If so, then
        plug in to the formula.  If not, then a Chi squared with
        f degrees of freedom has mean f and variance 2f
        [variance, not standard deviation] and for large f can
        itself be approximated by a normal distribution with
        those parameters.  I don't know if 79 is large enough.
        
        Dan
1145.4  "formula in Knuth v2, 2ed."  TOOK::CBRADLEY "Chuck Bradley"  Thu May 23 1991 12:42  (7 lines)
        
>>    Tables only go to 30 or so, but it looks like I have 79 degrees O'freedom.

I had a similar problem last year.  The tables I consulted did not have
a formula for large v (degrees of freedom).  I finally found one in Knuth v2,
in the section on testing random number generators.  Beware, the formula in
the 1st ed. is wrong.  Get the latest edition.
1145.5  "Formulas."  CADSYS::COOPER "Topher Cooper"  Thu May 23 1991 18:21  (22 lines)
    The usual approximation given in tables of chi-square for df>30 is
    that

	    sqrt(2*X) - sqrt(2*df - 1)

    is approximately normally distributed.  This will give you about two
    decimal places of accuracy for df>30 and p<.995.  A better
    approximation (especially if you are going to put it in code) is that:

	    ( (X/df)^(1/3) - (1 - 2/(9*df)) ) / sqrt(2/(9*df))

    is also approximately normally distributed.  This will give you about
    4 places of accuracy under the same conditions (yes, I looked it up).

    The straight approximation of (X-df)/df is not terribly useful.  With
    79 degrees of freedom, the .99 alpha point (i.e., the correct answer
    should be .99) comes out as .65 -- a tad off.
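
    For what it's worth, here is a small Python sketch comparing the two
    approximations above with an exact tail probability, using the
    chi-square of 57.5 on 79 degrees of freedom from .2 (the second
    formula is the Wilson-Hilferty cube-root approximation):

        import math
        from scipy import stats

        X, df = 57.5, 79

        # Approximation 1: sqrt(2X) - sqrt(2*df - 1) is roughly standard normal.
        z1 = math.sqrt(2 * X) - math.sqrt(2 * df - 1)

        # Approximation 2: cube-root (Wilson-Hilferty) transform.
        z2 = ((X / df) ** (1.0 / 3.0) - (1 - 2.0 / (9 * df))) / math.sqrt(2.0 / (9 * df))

        # Upper-tail probabilities P(chi-square >= X).
        approx1 = stats.norm.sf(z1)
        approx2 = stats.norm.sf(z2)
        exact = stats.chi2.sf(X, df)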

					Topher
1145.6  "from some other tables"  CSSE::NEILSEN "Wally Neilsen-Steinhardt"  Tue May 28 1991 15:17  (12 lines)
.2>    What next?

Find some new tables.  Winkler and Hays, oft cited by me, includes a table that 
goes up to 100, in steps of 10.  

For n=80, 57.2 corresponds to a fractile of 0.025.
	  60.4            "                 0.05


As .3 says, you have too few samples to satisfy the usual rule of thumb of
5 predicted samples per bin.

1145.7  "X^2 for uniform distribution problem"  VMSDEV::HALLYB "Fish have no concept of fire"  Mon Nov 25 1991 16:40  (18 lines)
    Let me go back to one of the questions in .0 -- how do you "choose"
    whether to use .01 or .05?  If the answer is "subjectively", might
    there be some sort of reference work of past case studies where tests
    are described and the authors say "We picked a confidence level of 95%
    because ..." ?
    
    Is it correct to say the choice must be made before the result is known?
    In some sense isn't it "cheating" to look up the confidence level before
    deciding what level to use, thereby allowing one to choose the largest
    successful level?
    
    If so, can we carry this one step further as follows:  given 5 "bins"
    and an expectation of 80 per bin, I observe one bin comes in at 45.
    Without reference to the tables I'm already influenced by that datum
    and am tempted to use, say, 99.9% as my confidence level.  Is that
    "cheating"?
    
      John
1145.8  COOKIE::PBERGH "Peter Bergh, DTN 523-3007"  Mon Nov 25 1991 18:23  (19 lines)
      <<< Note 1145.7 by VMSDEV::HALLYB "Fish have no concept of fire" >>>
                   -< X^2 for uniform distribution problem >-

>>    Let me go back to one of the questions in .0 -- how do you "choose"
>>    whether to use .01 or .05?

The confidence level is an estimate of the probability of being wrong if we
reject the null hypothesis (i.e., in the case of X^2, "the same distribution
as ...").  Thus, you don't choose; your data "tell" you.

>>    Is it correct to say the choice must be made before the result is known?

No; the data tell you what the confidence level is.

When using the results of the test, however, you have to decide what risk of
being wrong you're willing to accept.  Thus, if some important decision hinges
on the outcome of the test, you'll want a high level of confidence (i.e., a low
probability of being wrong).  What risk you're willing to accept is outside the
realm of statistics.
1145.9  "confidence and significance."  CADSYS::COOPER "Topher Cooper"  Mon Nov 25 1991 19:46  (63 lines)
RE: .7, .8

    There is some terminological confusion here -- not surprising,
    because in informal discussion terms which are technically quite
    distinct are used interchangeably.

    "Confidence" levels refer to "confidence intervals" which is part of
    parameter estimation rather than to hypothesis testing.

    When one performs a statistical test one calculates a "p-value", which
    is sometimes referred to as the "significance" of the test applied to
    the data.  Informally one might even refer to it as the "significance
    level" of the test applied to the data.

    Some contemporary statisticians recommend that you stop there -- that
    the end point of the statistical procedure is a p-value, and what
    follows is interpretation.  This is referred to as "significance
    testing".  There is some justice to this, but also a sort of
    indefiniteness to the results.

    The more traditional approach is "hypothesis testing".  One selects in
    advance -- before even seeing the data -- a particular significance
    criterion, or alpha, or sometimes significance level.  If the p-value
    is less than that criterion then the results are declared "significant"
    which means, more or less, "we can consider them as real rather than
    as a chance fluctuation".  Frequently, a result will be reported with
    the criterion expressed explicitly, in a phrase similar to "the results
    are significant at the .05 level".  Sometimes, in practice more than
    one criterion will be used, such as:

		p > .1          non-significant
		.1 >= p > .05   suggestive
		.05 >= p > .01  significant
		.01 >= p        highly significant

    Labels other than "significant" are sometimes intended only informally.
    E.g., a suggestive test is formally insignificant, but there is an
    indication to potential replicators that perhaps there is an effect
    which just requires a somewhat larger sample size.  "Significant" and
    "highly significant" both mean "significant", but "highly significant"
    is less likely to turn out to just be a weird coincidence upon
    replication.

    The (formal) significance criterion is part of the test.  In theory one
    should no more select it after one has performed the test or even seen
    the data than one should select any other aspect of the statistical
    test on the basis of the results.  Anything else is indeed "cheating"
    (in some contexts the quotes are not needed).

    In practice, in the social sciences, the way the criterion is selected
    is: it's .05 unless there is a perception of some risk -- real or
    intellectual -- from a false rejection of the null hypothesis, in which
    case the .01 level is used.  For example, in parapsychology, until
    relatively recently, the standard criterion used was .01 and
    occasionally higher (i.e., smaller).  It was realized, however, that
    since even the staunchest (knowledgeable) critics accepted that the
    anomaly was statistically significant (the argument was and is whether
    the statistical anomaly was/is attributable to "conventional" causes)
    this was probably counterproductive.  Useful data for resolving the
    questions about what is going on was being thrown out.  Now the .05
    level is generally used in parapsychological research.

					Topher
1145.10  "other complications"  PULPO::BELDIN_R "Pull us together, not apart"  Tue Nov 26 1991 14:25  (28 lines)
re all

Just to expand on Topher's comments, in particular in reference to the
Chi-squared tests.

I use the plural because the probability distribution which gives this kind
of test its name can arise in many, many different kinds of experiments
which are logically unrelated.  One can use the Chi-squared distribution for
many tests, just as you can use dice to play monopoly, backgammon, or to
shoot craps.

It is rarely clear from the title "Chi-squared test" what the writer means.
Some applications are very gross approximations, others have some level of
robustness, and others can be shown to follow the Chi-squared distribution
exactly given some assumptions.

As suggested by others, some statisticians demand that you have chosen the
test procedure completely before collecting any data.  This may include
everything right down to the text of the summary with a zero-one variable
used to decide which text you include and triggered by the (ultimate) result
of your observations.

In summary,  there are no unanimous doctrines in statistics.  We are all
free to make our own mistakes.  :-)

Dick


1145.11  "Only one "real" chi-square test."  CADSYS::COOPER "Topher Cooper"  Tue Nov 26 1991 15:19  (27 lines)
RE: .10 (Dick)

    You have a point, but I don't entirely agree.  There are a number of
    quite distinct things which may be referred to as "a chi-square test",
    because they make use of the chi-square distribution.  For example,
    a couple of weeks ago I was comparing a single sample variance to
    an expected value using "a chi-square test".

    But there is only one chi-square test which is generally called that
    without qualifier or warning -- the chi-square frequency goodness-of-fit
    test.  It is extremely powerful, flexible, and, in its general form,
    at times tricky to apply correctly, but it is essentially a single
    test.  It is used when you have a model of a situation which makes a
    prediction about the frequency of independent events, and which you
    wish to compare to some observed frequencies.  One common use of this
    is with models which assume the independence of two variables, and
    so the "chi-square test of independence" is sometimes treated as a
    separate test -- but it is, basically the same test.  The trick to
    applying it in atypical situations is being sure that the frequencies
    *are* independent (i.e., that a chance fluctuation in one frequency
    cell does not produce a similar or opposite fluctuation in another
    except as accounted for in your model) and that you have the correct
    number of degrees of freedom (i.e., that you have properly incorporated
    into the test the degree to which the model forces the data to
    conform).

					Topher
1145.12  "further discussion of choosing a level of significance"  CSSE::NEILSEN "Wally Neilsen-Steinhardt"  Tue Nov 26 1991 15:48  (37 lines)
I agree with Topher in .9 and disagree with Peter in .8.

But there is a little more that can be said.

.7>    Let me go back to one of the questions in .0 -- how do you "choose"
>    whether to use .01 or .05?  If the answer is "subjectively", might
>    there be some sort of reference work of past case studies where tests
>    are described and the authors say "We picked a confidence level of 95%
>    because ..." ?

You would probably have to look pretty far to find this kind of statement.
Mostly the level of confidence is chosen by a social convention: all the
researchers in a field use 0.01 or 0.05 or whatever.  I have occasionally
seen statistical papers which argued that the customary choice (for a 
particular test in a particular field) is wrong, for one reason or another, 
and a different choice should be made.  I have never followed the discussion 
to see if researchers in the field began using the new recommended level.

It is possible in principle to use a branch of decision theory to start from
statements like

	accepting a false positive will cost me $x
	rejecting a true positive will cost me $y
	the cost of each sample is $z
	the probability of the null hypothesis before testing is w%

and calculate the p-value at which you should reject the null hypothesis.  As
a bonus, you also calculate the sample size you need.  In practice, this is 
seldom done, because the $ numbers and a priori probability are usually
difficult to estimate, and if you are going to guess, why not just guess
at % significance?  Also, if you calculate a number which is within your 
social convention, then you have wasted your time.  If you calculate a
different number, then you either forget about it or plan to spend all your
time defending it.
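
A toy Python sketch of that recipe for the special case of two *simple*
hypotheses (the full machinery, which also chooses the sample size from the
cost per sample, is more involved); the costs, the prior, and the coin-flip
data model below are all hypothetical:

    from scipy import stats

    cost_false_alarm = 100.0    # "$x": declaring an effect when there is none
    cost_missed_effect = 30.0   # "$y": declaring no effect when there is one
    prior_h0 = 0.8              # "w%": prior probability of the null hypothesis

    # Toy data model: n coin flips; H0 says p = 0.5, H1 says p = 0.6.
    n, heads = 200, 117
    like_h0 = stats.binom.pmf(heads, n, 0.5)
    like_h1 = stats.binom.pmf(heads, n, 0.6)

    # Posterior probabilities of the two hypotheses.
    post_h0 = prior_h0 * like_h0 / (prior_h0 * like_h0 + (1 - prior_h0) * like_h1)
    post_h1 = 1.0 - post_h0

    # Reject the null exactly when the expected loss of rejecting is smaller.
    loss_if_reject = post_h0 * cost_false_alarm
    loss_if_accept = post_h1 * cost_missed_effect
    decision = "reject H0" if loss_if_reject < loss_if_accept else "retain H0"

For two simple hypotheses this decision rule amounts to a cutoff on the
likelihood ratio, which in turn corresponds to a p-value at which you would
reject -- the number Wally describes calculating above.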

However, this decision theory is usually behind discussions of what is the 
best level of significance for a given test in a given field.
1145.13  "Fun with statistics"  VMSDEV::HALLYB "Fish have no concept of fire"  Tue Nov 26 1991 19:02  (12 lines)
    Thanks for some very enlightening and helpful comments.
    
    Let me ask further about a comment Wally made:
    
>	the cost of each sample is $z
    
    If one is looking at historical data, this question is difficult to
    address.  Basically the cost is zero but the supply is limited.
    I presume this only affects the decision-theoretic methodology for
    arriving at the best p-value, not the validity of the test itself.
    
      John
1145.14  "Form of function."  CADSYS::COOPER "Topher Cooper"  Tue Nov 26 1991 19:46  (32 lines)
RE: .13 (John)

    I would say that you shouldn't get too hung up in the details of a
    decision theoretic justification.  A frequent criticism of decision
    theory as a model of actual decision making, or as a universally
    applicable method for reaching real decisions, is that it requires
    assignment of "hard" utility numbers to attempt to capture "soft"
    values.  Some Bayesian decision theory attempts to solve the problem by
    allowing the utility of a decision to have a distribution weighted
    by (possibly subjective) probability.  So instead of saying that
    "the cost of each sample is $z", one says, essentially that "the
    probability that each sample will be $z0 is p0, the probability that
    each sample will be $z1 is p1, etc."

    As a militant Bayesian, I'm sure Wally would rather not attempt to
    justify non-Bayesian decision theory (which has been shown to be
    inferior to Bayesian decision theory on very broad criteria).  His
    basic point, however, is accurate.  Justifications for deviations
    from customary alpha-levels take the *form* of decision theory (or
    cost/benefit analysis), whether or not they deal with the nitty
    gritty details sufficiently to be called decision theoretic
    justifications.

    Your particular point is easily dealt with, however.  For historic
    data, the cost of each of the first N samples is very low (perhaps, for
    convenience, $0), while the cost of each additional sample is very
    high (perhaps, for convenience, infinite).  Same applies to other
    situations which predetermine the sample size (e.g., there are only
    50 states, even if you have not yet collected the relevant information
    about each of them).

				    Topher
1145.15  "decisions and decision theories"  CSSE::NEILSEN "Wally Neilsen-Steinhardt"  Wed Nov 27 1991 14:11  (19 lines)
Topher gives the right answer to the question in .13: assume zero cost for the
data you have and infinite cost for the data you cannot get.  This determines
the sample size, so all you have left is to set the level of significance.

.14>    As a militant Bayesian, I'm sure Wally would rather not attempt to

I don't recognize myself in this description.  I personally prefer the Bayesian
interpretation of probability, because it fits the way I usually use it.
I don't think it is the only interpretation, or the best for everyone.

>    justify non-Bayesian decision theory (which has been shown to be

I know several decision theories, and a lot of less formal approaches to
making decisions.  A few are never the best choice, but most have some set
of decision problems for which they are the best approach.  The Bayesian
approach happens to be the best answer to the question "How could I rigorously
design a statistical test and set a level of significance, assuming I could
get all the relevant information?"  It is not the best answer to the question
"What level of significance should I use here?"
1145.16  "In defense of Bayesian Decision Theory."  CADSYS::COOPER "Topher Cooper"  Wed Nov 27 1991 18:41  (21 lines)
RE: .15 (Wally)

    Gee, I must have made you defensive -- never thought I'd see the day
    where I would be supporting Bayesian statistics when Wally wasn't.
    :-) (very much so).

    Given that prior estimates -- of utilities (costs/benefits) and
    likelihoods -- can be said to be at least approximately "rational"
    (basically, non-self-contradictory), Bayesian decision procedures are
    optimal over the long haul -- in the sense of making best use of
    whatever accuracy is in those prior rational guesses to arrive at the
    most positive result.  This even applies to selecting the best
    significance criterion for non-Bayesian hypothesis testing -- unless
    you wish to assume that the correlation between prior estimates of
    outcome and utility and the actual probabilities of outcome and
    utility is negative.

    It may not, of course, be the most practical when you factor in the
    cost of the Bayesian computations.

					Topher
1145.17  "Different languages for different folks"  CORREO::BELDIN_R "Pull us together, not apart"  Tue Dec 03 1991 11:24  (11 lines)
re              <<< Note 1145.11 by CADSYS::COOPER "Topher Cooper" >>>
                     -< Only one "real" chi-square test. >-

I'll agree to that terminology for all those who can handle the abstraction
to a linear model with constraints.  Unfortunately, the average consumer of
chi-squared statistics has not been taught to express his models that way,
but as separate models.  As one reads the textbooks produced for social
and behavioral scientists, the level of abstraction is much lower than the
General Linear Model.  

Dick