
Conference rusure::math

Title: Mathematics at DEC
Moderator: RUSURE::EDP
Created: Mon Feb 03 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2083
Total number of notes: 14613

1145.0. "Chi-Square X^2 calculation (statistics)" by MILKWY::JANZEN (cf. ANT::CIRCUITS,ANT::UWAVES) Mon Oct 30 1989 18:36

    Hi.
    I don't understand my reference books for engineers, and I sold back my
    statistics book.
    How do you calculate the Chi Squared (X^2) test of correlation, and how do
    you choose whether to use .01 or .05 to make a binary decision about
    correlation?  I have picked up that chi-square is a unitless test of
    correlation commonly performed in all branches of science to find
    correlation between factors, leading to studies of mechanisms 
    of causality in a system.  I guess it might be the quotient of the sums
    of the errors divided by the variance.
    I have two measurement systems.  I want to quantify how well their
    respective measurements mutually correlate.
    I make n measurements on each system of the same thing.
    One sequence is x(n), and the other from the other measurement system
    is y(n).
    
    How do I calculate the unitless correlation? 
    Thanks
    Tom (who will hit the books tonight and try again to decode them)
1145.1  AITG::DERAMO "like a candle in the wind"  Tue Oct 31 1989 02:03  (40 lines)
        I thought that the Chi Squared test was for "goodness of
        fit" as opposed to correlation.  You suspect an
        experiment can have one of k results, with probabilities
        p[1] through p[k].  You perform N independent trials, and
        observe each of the k results Obs[i] times, 1 <= i <= k.
        The expected value of the number of times result i
        occurred is Exp[i] = N * p[i].  The Chi Squared statistic
        for this is the sum over 1 <= i <= k of
        (Obs[i] - Exp[i])^2 / Exp[i].
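
        A minimal sketch of that computation in Python (using numpy and
        scipy); the probabilities and counts below are made-up
        illustration data, not taken from this note:

            import numpy as np
            from scipy import stats

            p = np.array([0.5, 0.3, 0.2])     # hypothesized probabilities p[i]
            obs = np.array([260, 140, 100])   # observed counts from N = 500 trials
            exp = obs.sum() * p               # expected counts Exp[i] = N * p[i]

            chi2_stat = ((obs - exp) ** 2 / exp).sum()
            dof = len(p) - 1                  # k - 1 degrees of freedom
            p_value = stats.chi2.sf(chi2_stat, dof)

            # scipy does the same thing in one call:
            stat, pval = stats.chisquare(obs, f_exp=exp)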
        
        As N -> oo, the value computed for this statistic will
        approach a distribution known as Chi Squared with k-1
        degrees of freedom (the distribution is that of the sum
        of k-1 independent normally distributed random values) if
        the assumptions were correct.  If the true probabilities
        weren't the p[i], or the trials weren't independent, then
        the computed statistic tends to differ more and more from
        this as N increases.
        
        You can use this to show that two things are "correlated"
        by assuming they are independent and doing a Chi Squared
        test that "fails", i.e., is so out of range that you feel
        safe in concluding the two factors are not independent.
        
        The above described a one-dimensional test.  You can set
        up a two-dimensional test as follows.  One dimension is
        one of k1 classes with probabilities p[i].  The other is
        one of k2 classes with probabilities q[i].  Now "cell i,j"
        for 1 <= i <= k1, 1 <= j <= k2, has observed count
        Obs[i,j] and expected count N * p[i] * q[j] after N
        independent trials.  Again compute the sum over all of
        the cells of (Obs - Exp)^2 / Exp.  In this case, if the
        assumptions are correct, then as N -> oo the computed
        statistic has a Chi Squared distribution with (k1 - 1)*(k2 - 1)
        degrees of freedom.  [One of the "assumptions" is that
        the class of the first factor is independent of the class
        of the second factor.  If the test shows they are not
        independent then in some sense they are "correlated".]
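
        A sketch of the two-dimensional (contingency-table) version in
        Python.  Note that scipy's routine estimates the marginal
        probabilities p[i] and q[j] from the observed table itself rather
        than taking them as known, which is the usual practice and is
        what gives the (k1 - 1)*(k2 - 1) degrees of freedom; the counts
        here are invented:

            import numpy as np
            from scipy import stats

            obs = np.array([[30, 10, 20],
                            [20, 25, 15]])    # Obs[i, j] for a 2 x 3 table

            chi2_stat, p_value, dof, expected = stats.chi2_contingency(obs)
            # dof comes out as (2 - 1) * (3 - 1) = 2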
        
        Dan
1145.2  "I happen to have this problem ..."  VMSDEV::HALLYB "The Smart Money was on Goliath"  Wed May 22 1991 16:39  (5 lines)
    OK, I have 80 "bins" that should uniformly share 281 samples.
    I calculate the chi-square value as 57.5
    Tables only go to 30 or so, but it looks like I have 79 degrees O'freedom.
    
    What next?
1145.3  GUESS::DERAMO "Be excellent to each other."  Wed May 22 1991 16:59  (34 lines)
	re .1
        
>>        As N -> oo, the value computed for this statistic will
>>        approach a distribution known as Chi Squared with k-1
>>        degrees of freedom (the distribution is that of the sum
>>        of k-1 independent normally distributed random values) if
>>        the assumptions were correct.
        
        Eeeek!  Make that: the distribution is that of the sum of
        the squares of k-1 independent standard normal random
        variables.
        
        One thing I left out of .1 is that it is suggested each
        bin have a minimum expected value of at least 5.  If you
        don't have enough independent trials for that, then
        combine bins or get more independent trials.
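
        One simple way to act on that rule of thumb, sketched in Python
        (pooling adjacent bins is an arbitrary choice -- any grouping
        decided from the expected counts alone, without peeking at the
        observed counts, will do):

            def pool_bins(observed, expected, minimum=5.0):
                """Merge adjacent bins until each pooled bin has an expected
                count of at least `minimum`.  Assumes the total expected
                count is itself at least `minimum`."""
                pooled_obs, pooled_exp = [], []
                acc_obs = acc_exp = 0.0
                for o, e in zip(observed, expected):
                    acc_obs += o
                    acc_exp += e
                    if acc_exp >= minimum:
                        pooled_obs.append(acc_obs)
                        pooled_exp.append(acc_exp)
                        acc_obs = acc_exp = 0.0
                if acc_exp > 0:           # fold any leftover into the last bin
                    pooled_obs[-1] += acc_obs
                    pooled_exp[-1] += acc_exp
                return pooled_obs, pooled_exp

        For the case in .2 (80 bins, 281 samples, an expected count of
        about 3.5 per bin) this pools pairs of adjacent bins, giving 40
        bins with an expected count of about 7 each.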
        
	re .2
        
>>    OK, I have 80 "bins" that should uniformly share 281 samples.

        Ummm, gee, you should combine bins or use more samples. :-)
        
>>    Tables only go to 30 or so, but it looks like I have 79 degrees O'freedom.
        
        I thought the tables usually gave a formula for computing
        the levels at higher degrees of freedom.  If so, then
        plug in to the formula.  If not, then a Chi squared with
        f degrees of freedom has mean f and variance 2f
        [variance, not standard deviation] and for large f can
        itself be approximated by a normal distribution with
        those parameters.  I don't know if 79 is large enough.
        
        Dan
1145.4  "formula in Knuth v2, 2ed."  TOOK::CBRADLEY "Chuck Bradley"  Thu May 23 1991 12:42  (7 lines)
        
>>    Tables only go to 30 or so, but it looks like I have 79 degrees O'freedom.

I had a similar problem last year.  The tables I consulted did not have
a formula for large v (degrees of freedom).  I finally found one in Knuth v2,
in the section on testing random number generators.  Beware, the formula in
the 1st ed. is wrong.  Get the latest edition.
1145.5  "Formulas."  CADSYS::COOPER "Topher Cooper"  Thu May 23 1991 18:21  (22 lines)
    The usual approximation given in tables of chi-square for df>30 is
    that

	    sqrt(2*X) - sqrt(2*df - 1)

    is approximately normally distributed.  This will give you about two
    decimal places of accuracy for df>30 and p<.995.  A better
    approximation (especially if you are going to put it in code) is that:

	    ( (X/df)^(1/3) - (1 - 2/(9*df)) ) / sqrt(2/(9*df))

    is also approximately normally distributed.  This will give you about
    4 places of accuracy under the same conditions (yes, I looked it up).

    The straight approximation of (X-df)/df is not terribly useful.  With
    79 degrees of freedom, the .99 alpha point (i.e., the correct answer
    should be .99) comes out as .65 -- a tad off.
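
    For what it's worth, here is a small Python sketch comparing the two
    approximations above with an exact tail probability, using the
    chi-square of 57.5 on 79 degrees of freedom from .2 (the second
    formula is the Wilson-Hilferty cube-root approximation):

        import math
        from scipy import stats

        X, df = 57.5, 79

        # Approximation 1: sqrt(2X) - sqrt(2*df - 1) is roughly standard normal.
        z1 = math.sqrt(2 * X) - math.sqrt(2 * df - 1)

        # Approximation 2: cube-root (Wilson-Hilferty) transform.
        z2 = ((X / df) ** (1.0 / 3.0) - (1 - 2.0 / (9 * df))) / math.sqrt(2.0 / (9 * df))

        # Upper-tail probabilities P(chi-square >= X).
        approx1 = stats.norm.sf(z1)
        approx2 = stats.norm.sf(z2)
        exact = stats.chi2.sf(X, df)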

					Topher
1145.6  "from some other tables"  CSSE::NEILSEN "Wally Neilsen-Steinhardt"  Tue May 28 1991 15:17  (12 lines)
.2>    What next?

Find some new tables.  Winkler and Hays, oft cited by me, includes a table that 
goes up to 100, in steps of 10.  

For n=80, 57.2 corresponds to a fractile of 0.025.
	  60.4            "                 0.05


As .3 says, you have too few samples to satisfy the usual rule of thumb of
5 predicted samples per bin.

1145.7  "X^2 for uniform distribution problem"  VMSDEV::HALLYB "Fish have no concept of fire"  Mon Nov 25 1991 16:40  (18 lines)
    Let me go back to one of the questions in .0 -- how do you "choose"
    whether to use .01 or .05?  If the answer is "subjectively", might
    there be some sort of reference work of past case studies where tests
    are described and the authors say "We picked a confidence level of 95%
    because ..." ?
    
    Is it correct to say the choice must be made before the result is known?
    In some sense isn't it "cheating" to look up the confidence level before
    deciding what level to use, thereby allowing one to choose the largest
    successful level?
    
    If so, can we carry this one step further as follows:  given 5 "bins"
    and an expectation of 80 per bin, I observe one bin comes in at 45.
    Without reference to the tables I'm already influenced by that datum
    and am tempted to use, say, 99.9% as my confidence level.  Is that
    "cheating"?
    
      John
1145.8  COOKIE::PBERGH "Peter Bergh, DTN 523-3007"  Mon Nov 25 1991 18:23  (19 lines)
      <<< Note 1145.7 by VMSDEV::HALLYB "Fish have no concept of fire" >>>
                   -< X^2 for uniform distribution problem >-

>>    Let me go back to one of the questions in .0 -- how do you "choose"
>>    whether to use .01 or .05?

The confidence level is an estimate of the probability of being wrong if we
reject the null hypothesis (i.e., in the case of X^2, "the same distribution
as ...").  Thus, you don't choose; your data "tell" you.

>>    Is it correct to say the choice must be made before the result is known?

No; the data tell you what the confidence level is.

When using the results of the test, however, you have to decide what risk of
being wrong you're willing to accept.  Thus, if some important decision hinges
on the outcome of the test, you'll want a high level of confidence (i.e., a low
probability of being wrong).  What risk you're willing to accept is outside the
realm of statistics.
1145.9  "confidence and significance."  CADSYS::COOPER "Topher Cooper"  Mon Nov 25 1991 19:46  (63 lines)
RE: .7, .8

    There is some terminological confusion here -- not surprising,
    because in informal discussion terms which are technically quite
    distinct are used interchangeably.

    "Confidence" levels refer to "confidence intervals" which is part of
    parameter estimation rather than to hypothesis testing.

    When one performs a statistical test one calculates a "p-value", which
    is sometimes referred to as the "significance" of the test applied to
    the data.  Informally one might even refer to it as the "significance
    level" of the test applied to the data.

    Some contemporary statisticians recommend that you stop there -- that
    the end point of the statistical procedure is a p-value, and what
    follows is interpretation.  This is referred to as "significance
    testing".  There is some justice to this, but also a sort of
    indefiniteness to the results.

    The more traditional approach is "hypothesis testing".  One selects in
    advance -- before even seeing the data -- a particular significance
    criterion, or alpha, or sometimes significance level.  If the p-value
    is less than that criterion then the results are declared "significant"
    which means, more or less, "we can consider them as real rather than
    as a chance fluctuation".  Frequently, a result will be reported with
    the criterion expressed explicitly, in a phrase similar to "the results
    are significant at the .05 level".  Sometimes, in practice more than
    one criterion will be used, such as:

		p > .1          non-significant
		.1 >= p > .05   suggestive
		.05 >= p > .01  significant
		.01 >= p        highly significant

    Labels other than "significant" are sometimes intended only informally.
    E.g., a suggestive test is formally insignificant, but there is an
    indication to potential replicators that perhaps there is an effect
    which just requires a somewhat larger sample size.  "Significant" and
    "highly significant" both mean "significant", but "highly significant"
    is less likely to turn out to just be a weird coincidence upon
    replication.

    The (formal) significance criterion is part of the test.  In theory one
    should no more select it after one has performed the test or even seen
    the data than one should select any other aspect of the statistical
    test on the basis of the results.  Anything else is indeed "cheating"
    (in some contexts the quotes are not needed).

    In practice, in the social sciences, the way the criterion is selected
    is: it's .05 unless there is a perception of some risk -- real or
    intellectual -- from a false rejection of the null hypothesis, in which
    case the .01 level is used.  For example, in parapsychology, until
    relatively recently, the standard criterion used was .01 and
    occasionally higher (i.e., smaller).  It was realized, however, that
    since even the staunchest (knowledgeable) critics accepted that the
    anomaly was statistically significant (the argument was and is whether
    the statistical anomaly was/is attributable to "conventional" causes)
    this was probably counterproductive.  Useful data for resolving the
    questions about what is going on was being thrown out.  Now the .05
    level is generally used in parapsychological research.

					Topher
1145.10  "other complications"  PULPO::BELDIN_R "Pull us together, not apart"  Tue Nov 26 1991 14:25  (28 lines)
re all

Just to expand on Topher's comments, in particular in reference to the
Chi-squared tests.

I use the plural because the probability distribution which gives this kind
of test its name can arise in many, many different kinds of experiments
which are logically unrelated.  One can use the Chi-squared distribution for
many tests, just as you can use dice to play monopoly, backgammon, or to
shoot craps.

It is rarely clear from the title "Chi-squared test" what the writer means.
Some applications are very gross approximations, others have some level of
robustness, and others can be shown to follow the Chi-squared distribution
exactly given some assumptions.

As suggested by others, some statisticians demand that you have chosen the
test procedure completely before collecting any data.  This may include
everything right down to the text of the summary with a zero-one variable
used to decide which text you include and triggered by the (ultimate) result
of your observations.

In summary,  there are no unanimous doctrines in statistics.  We are all
free to make our own mistakes.  :-)

Dick


1145.11  "Only one "real" chi-square test."  CADSYS::COOPER "Topher Cooper"  Tue Nov 26 1991 15:19  (27 lines)
RE: .10 (Dick)

    You have a point, but I don't entirely agree.  There are a number of
    quite distinct things which may be referred to as "a chi-square test",
    because they make use of the chi-square distribution.  For example,
    a couple of weeks ago I was comparing a single sample variance to
    an expected value using "a chi-square test".

    But there is only one chi-square test which is generally called that
    without qualifier or warning -- the chi-square frequency goodness-of-fit
    test.  It is extremely powerful, flexible, and, in its general form,
    at times tricky to apply correctly, but it is essentially a single
    test.  It is used when you have a model of a situation which makes a
    prediction about the frequency of independent events, and which you
    wish to compare to some observed frequencies.  One common use of this
    is with models which assume the independence of two variables, and
    so the "chi-square test of independence" is sometimes treated as a
    separate test -- but it is, basically the same test.  The trick to
    applying it in atypical situations is being sure that the frequencies
    *are* independent (i.e., that a chance fluctuation in one frequency
    cell does not produce a similar or opposite fluctuation in another
    except as accounted for in your model) and that you have the correct
    number of degrees of freedom (i.e., that you have properly incorporated
    into the test the degree to which the model forces the data to
    conform).

					Topher
1145.12  "further discussion of choosing a level of significance"  CSSE::NEILSEN "Wally Neilsen-Steinhardt"  Tue Nov 26 1991 15:48  (37 lines)
I agree with Topher in .9 and disagree with Peter in .8.

But there is a little more that can be said.

.7>    Let me go back to one of the questions in .0 -- how do you "choose"
>    whether to use .01 or .05?  If the answer is "subjectively", might
>    there be some sort of reference work of past case studies where tests
>    are described and the authors say "We picked a confidence level of 95%
>    because ..." ?

You would probably have to look pretty far to find this kind of statement.
Mostly the level of confidence is chosen by a social convention: all the
researchers in a field use 0.01 or 0.05 or whatever.  I have occasionally
seen statistical papers which argued that the customary choice (for a 
particular test in a particular field) is wrong, for one reason or another, 
and a different choice should be made.  I have never followed the discussion 
to see if researchers in the field began using the new recommended level.

It is possible in principle to use a branch of decision theory to start from
statements like

	accepting a false positive will cost me $x
	rejecting a true positive will cost me $y
	the cost of each sample is $z
	the probability of the null hypothesis before testing is w%

and calculate the p-value at which you should reject the null hypothesis.  As
a bonus, you also calculate the sample size you need.  In practice, this is 
seldom done, because the $ numbers and a priori probability are usually
difficult to estimate, and if you are going to guess, why not just guess
at % significance?  Also, if you calculate a number which is within your 
social convention, then you have wasted your time.  If you calculate a
different number, then you either forget about it or plan to spend all your
time defending it.
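
A toy Python sketch of that recipe for the special case of two *simple*
hypotheses (the full machinery, which also chooses the sample size from the
cost per sample, is more involved); the costs, the prior, and the coin-flip
data model below are all hypothetical:

    from scipy import stats

    cost_false_alarm = 100.0    # "$x": declaring an effect when there is none
    cost_missed_effect = 30.0   # "$y": declaring no effect when there is one
    prior_h0 = 0.8              # "w%": prior probability of the null hypothesis

    # Toy data model: n coin flips; H0 says p = 0.5, H1 says p = 0.6.
    n, heads = 200, 117
    like_h0 = stats.binom.pmf(heads, n, 0.5)
    like_h1 = stats.binom.pmf(heads, n, 0.6)

    # Posterior probabilities of the two hypotheses.
    post_h0 = prior_h0 * like_h0 / (prior_h0 * like_h0 + (1 - prior_h0) * like_h1)
    post_h1 = 1.0 - post_h0

    # Reject the null exactly when the expected loss of rejecting is smaller.
    loss_if_reject = post_h0 * cost_false_alarm
    loss_if_accept = post_h1 * cost_missed_effect
    decision = "reject H0" if loss_if_reject < loss_if_accept else "retain H0"

For two simple hypotheses this decision rule amounts to a cutoff on the
likelihood ratio, which in turn corresponds to a p-value at which you would
reject -- the number Wally describes calculating above.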

However, this decision theory is usually behind discussions of what is the 
best level of significance for a given test in a given field.
1145.13  "Fun with statistics"  VMSDEV::HALLYB "Fish have no concept of fire"  Tue Nov 26 1991 19:02  (12 lines)
    Thanks for some very enlightening and helpful comments.
    
    Let me ask further about a comment Wally made:
    
>	the cost of each sample is $z
    
    If one is looking at historical data, this question is difficult to
    address.  Basically the cost is zero but the supply is limited.
    I presume this only affects the decision-theoretic methodology for
    arriving at the best p-value, not the validity of the test itself.
    
      John
1145.14  "Form of function."  CADSYS::COOPER "Topher Cooper"  Tue Nov 26 1991 19:46  (32 lines)
RE: .13 (John)

    I would say that you shouldn't get too hung up in the details of a
    decision theoretic justification.  A frequent criticism of decision
    theory as a model of actual decision making, or as a universally
    applicable method for reaching real decisions, is that it requires
    assignment of "hard" utility numbers to attempt to capture "soft"
    values.  Some Bayesian decision theory attempts to solve the problem by
    allowing the utility of a decision to have a distribution weighted
    by (possibly subjective) probability.  So instead of saying that
    "the cost of each sample is $z", one says, essentially that "the
    probability that each sample will be $z0 is p0, the probability that
    each sample will be $z1 is p1, etc."

    As a militant Bayesian, I'm sure Wally would rather not attempt to
    justify non-Bayesian decision theory (which has been shown to be
    inferior to Bayesian decision theory on very broad criteria).  His
    basic point, however, is accurate.  Justifications for deviations
    from customary alpha-levels take the *form* of decision theory (or
    cost/benefit analysis), whether or not they deal with the nitty
    gritty details sufficiently to be called decision theoretic
    justifications.

    Your particular point is easily dealt with, however.  For historic
    data, the cost of each of the first N samples is very low (perhaps, for
    convenience, $0), while the cost of each additional sample is very
    high (perhaps, for convenience, infinite).  Same applies to other
    situations which predetermine the sample size (e.g., there are only
    50 states, even if you have not yet collected the relevant information
    about each of them).

				    Topher
1145.15  "decisions and decision theories"  CSSE::NEILSEN "Wally Neilsen-Steinhardt"  Wed Nov 27 1991 14:11  (19 lines)
Topher gives the right answer to the question in .13: assume zero cost for the
data you have and infinite cost for the data you cannot get.  This determines
the sample size, so all you have left is to set the level of significance.

.14>    As a militant Bayesian, I'm sure Wally would rather not attempt to

I don't recognize myself in this description.  I personally prefer the Bayesian
interpretation of probability, because it fits the way I usually use it.
I don't think it is the only interpretation, or the best for everyone.

>    justify non-Bayesian decision theory (which has been shown to be

I know several decision theories, and a lot of less formal approaches to
making decisions.  A few are never the best choice, but most have some set
of decision problems for which they are the best approach.  The Bayesian
approach happens to be the best answer to the question "How could I rigorously
design a statistical test and set a level of significance, assuming I could
get all the relevant information?"  It is not the best answer to the question
"What level of significance should I use here?"
1145.16  "In defense of Bayesian Decision Theory."  CADSYS::COOPER "Topher Cooper"  Wed Nov 27 1991 18:41  (21 lines)
RE: .15 (Wally)

    Gee, I must have made you defensive -- never thought I'd see the day
    where I would be supporting Bayesian statistics when Wally wasn't.
    :-) (very much so).

    Given that prior estimates -- of utilities (costs/benefits) and
    likelihoods -- can be said to be at least approximately "rational"
    (basically, non-self-contradictory), Bayesian decision procedures are
    optimal over the long haul -- in the sense of making best use of
    whatever accuracy is in those prior rational guesses to arrive at the
    most positive result.  This even applies to selecting the best
    significance criterion for non-Bayesian hypothesis testing -- unless
    you wish to assume that the correlation between prior estimates of
    outcome and utility and the actual probabilities of outcome and
    utility is negative.

    It may not, of course, be the most practical when you factor in the
    cost of the Bayesian computations.

					Topher
1145.17  "Different languages for different folks"  CORREO::BELDIN_R "Pull us together, not apart"  Tue Dec 03 1991 11:24  (11 lines)
re              <<< Note 1145.11 by CADSYS::COOPER "Topher Cooper" >>>
                     -< Only one "real" chi-square test. >-

I'll agree to that terminology for all those who can handle the abstraction
to a linear model with constraints.  Unfortunately, the average consumer of
chi-squared statistics has not been taught to express his models that way,
but as separate models.  As one reads the textbooks produced for social
and behavioral scientists, the level of abstraction is much lower than the
General Linear Model.  

Dick