T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---
935.1 | Doubt it. | PBSVAX::COOPER | Topher Cooper | Fri Sep 23 1988 15:57 | 42 |
| I'll have to give it more thought, but I would say that it is
rather unlikely. Roughly speaking your claim of 30 degrees of
freedom boils down to the following claim:
Given the contents of 30 selected bins and the total number
of samples, the counts in the other six bins may be predicted
with 100% accuracy.
This is a little simplistic: if you could show that the proper
amount of partial information about all the bins allows you to
predict the exact frequencies of all the bins, for example, you
would have proven your point as well.
When you say that a chi-square test has 30 degrees of freedom,
you are saying that there exist 30 variables which determine,
in conjunction with some number of parameters (two in this case,
the number of cells and the number of samples), the complete state
of the problem.
You have violated one of the fundamental rules of the chi-square
test (one of the few; it's a rather non-demanding test, frequently
classed with non-parametric tests though it is not actually one).
Specifically, the counts in the different cells must be
independent. Some interaction means that in some abstract sense
the number of degrees of freedom is reduced, but there is no
reason to expect the reduction to be by an integral amount, nor (as far as
I know) that the result will still be a simple chi-square if it
happens to be integral, nor do I know of any way to
calculate the change in the degrees of freedom in any particular
case.
My intuition (and intuition is notoriously bad in such cases) is
that the actual distribution would be closer to chi-square with
35 degrees of freedom than to 30. It shouldn't take long to
simulate a few hundred drawings a few hundred times and take
a look at the resulting distribution.
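Such a simulation is quick to sketch (a Python sketch, purely illustrative; Q here is the usual chi-square statistic over the 36 bins, and the drawing sizes are the "few hundred" suggested above):

```python
import random

def q_statistic(n_balls=36, n_drawn=6, n_trials=600, rng=random):
    # Tally how often each number comes up over n_trials drawings
    # of n_drawn distinct numbers from the n_balls available.
    counts = [0] * n_balls
    for _ in range(n_trials):
        for ball in rng.sample(range(n_balls), n_drawn):
            counts[ball] += 1
    expected = n_trials * n_drawn / n_balls
    return sum((c - expected) ** 2 / expected for c in counts)

# Simulate a few hundred drawings a few hundred times and look
# at the mean and variance of the resulting Q values.
random.seed(1)
qs = [q_statistic() for _ in range(300)]
mean = sum(qs) / len(qs)
var = sum((q - mean) ** 2 for q in qs) / (len(qs) - 1)
```

The interesting comparison is then between this empirical mean/variance and the chi-square values (35 and 70 for 35 degrees of freedom).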
Are you trying to design a test to check to see if the Mass Lottery
drawing is unbiased? I can think about how to check that if you
would like.
Topher
|
935.2 | | LISP::DERAMO | Daniel V. {AITG,LISP,ZFC}:: D'Eramo | Fri Sep 23 1988 21:57 | 47 |
| Someone in TIXEL::LOTTERIES thought that the distribution of
how many 1's, 2's, ..., 36's have come up in the Mass.
Megabucks was too non-uniform. So I thought to myself, if
the numbers had been drawn one at a time, we would have a
chi squared with 35 degrees of freedom, and so the mean of Q
would be 35 and its variance would be 70. The measured Q I
think was around 45 which would be okay. So I said that I
didn't think the distribution was too non-uniform, and
started thinking about how to test that.
At first, without thinking about it, I thought that since
the assumptions of a true chi squared test were violated
that the mean would be higher. So I worked out the expected
value of Q symbolically, and the result was 30 (independent
of N). My first reaction was that this was just like a chi
squared with 30 degrees of freedom.  My second reaction
was to wonder why it wasn't higher, and I convinced myself
that in six separate drawings, you could have duplicates,
which makes things more non-uniform, but six at a time means
fewer duplicates, and so a smaller value for Q.
Then I ran some simulations, and got a value for the mean
that was both higher than 30 and depended on N. So I redid
my theoretical analysis, and again got a mean of 30 for Q.
Again I ran some random simulations and got the "wrong"
results. Again I did the theoretical analysis and got 30.
So I ran the simulations again and [finally] got results
that had a closer fit to the model, although the variance was
too small for the smaller value of N (I used N = 600 and N =
1200 and simulated 100 values of Q for each in this last
batch of tests). I put off determining the expected value
of the variance of Q because it was too messy symbolically,
and decided to ask here about it first.
I can't scrutinize the tests now, because I typed in the
LISP code interactively the first two times and so didn't
save copies of it.  When things didn't look right I just
used (EXIT) and rechecked my analysis. After the analysis
looked right the third time I started wondering about the
random number generator. After all, it couldn't possibly
have been the first two programs! :-)
If we can't figure out the distribution, then what about the
"cutoff" values for declaring the observed value of Q to be
too low or too high at, say, the 90% significance level?
Dan
|
935.3 | Hmm? | PBSVAX::COOPER | Topher Cooper | Mon Sep 26 1988 14:24 | 20 |
| Dan,
I also ran a quick simulation and got a value higher than 30. I'm
going to run a more careful one and will report back to you.
1) How did you calculate an expected value of 30?  I would expect
a reduced mean, as I said, but that seems too extreme.
2) I think your best bet here is to use distributional sampling.
Simulate the situation a large number of times (let it run over
night) and simply tally how many simulated runs result in a Q
less than or equal to the actual value. That value divided by
the total number of simulations is your "p" value. With enough
simulated runs this method is accurate and makes virtually no
assumptions about the actual data. By use of a binomial distribution
(or, in this case, a normal approximation to the binomial distribution)
you can set confidence limits on your p value and otherwise be
more precise about exactly what you are saying.
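That recipe could look something like this (a Python sketch, illustrative only; `simulate_q` is a hypothetical stand-in for whatever simulation of the actual drawing process is used):

```python
import math
import random

def empirical_p(observed_q, simulate_q, runs=2000, rng=random):
    # Tally how many simulated runs give a Q less than or equal
    # to the actual observed value.
    hits = sum(simulate_q(rng) <= observed_q for _ in range(runs))
    p = hits / runs
    # Normal approximation to the binomial for a 95% confidence
    # interval on the estimated p value.
    half = 1.96 * math.sqrt(p * (1 - p) / runs)
    return p, (max(0.0, p - half), min(1.0, p + half))

# Toy stand-in simulation: Q uniform on [0, 70); observed value 45.
random.seed(2)
p, (lo, hi) = empirical_p(45.0, lambda rng: rng.uniform(0, 70))
```

With enough runs the interval tightens as 1/sqrt(runs), which makes precise the claim that the method is accurate while assuming almost nothing about the data.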
Topher
|
935.4 | E[Q] = 30 | LISP::DERAMO | Daniel V. {AITG,LISP,ZFC}:: D'Eramo | Tue Sep 27 1988 12:34 | 30 |
| Let f be how many times a 1 comes up in N drawings.
Let gi, 1 <= i <= N be 1 or 0 depending on whether there
was a 1 in the i-th drawing. Then f = g1 + g2 + ... + gN.
It is easy enough to verify that each gi is 1 with
probability 1/6 and 0 with probability 5/6. This gives
for the expected values of gi, (gi)^2, and (gi)(gj) for
i /= j
E[gi] = 1/6
E[(gi)^2] = 1/6
E[(gi)(gj)] = 1/36 i not equal to j
Given that f is g1 + ... + gN and that f^2 is the sum
of N terms (gi)^2 and N^2 - N terms (gi)(gj) with i /= j,
it follows that
E[f] = sum of N E[gi] = 36 E[gi] = N/6
E[f^2] = sum of N E[(gi)^2] + sum of N^2 - N E[(gi)(gj)]
= N/6 + (N^2 - N)/36
= (N^2 + 5N) / 36
Now, Q = sum of 36 ((f - N/6)^2)/(N/6), so
E[Q] = 36 (6/N) (E[f^2] - 2 (N/6) E[f] + (N/6)^2)
= 36 (6/N) ((N^2 + 5N)/36 - 2 (N/6)^2 + (N/6)^2)
= (6/N) 36 ((N^2 + 5N)/36 - (N^2 / 36))
= (6/N) (N^2 + 5N - N^2)
= 30
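The algebra above can be double-checked with exact rational arithmetic (a Python sketch; the formulas are taken directly from this derivation):

```python
from fractions import Fraction

def expected_q(N):
    # From the derivation: E[f] = N/6, E[f^2] = (N^2 + 5N)/36.
    N = Fraction(N)
    Ef = N / 6
    Ef2 = (N * N + 5 * N) / 36
    # Q = sum of 36 ((f - N/6)^2)/(N/6); by symmetry each of the
    # 36 numbers contributes the same expectation, hence the 36.
    return 36 * (Ef2 - 2 * (N / 6) * Ef + (N / 6) ** 2) / (N / 6)

print(expected_q(600), expected_q(1200))  # 30 30, independent of N
```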
Dan
|
935.5 | | LISP::DERAMO | Daniel V. {AITG,LISP,ZFC}:: D'Eramo | Tue Sep 27 1988 12:36 | 3 |
| Why do you feel that a reduction from 35 to 30 is extreme?
Dan
|
935.6 | Why extreme. | PBSVAX::COOPER | Topher Cooper | Tue Sep 27 1988 14:58 | 54 |
| RE: .4
Just looked it over quickly but it seems OK.
RE: .5
Basically a matter of intuition. Remember that we are talking about
two different quantities 1) The mean and 2) The number of degrees
of freedom. These two values are equal if the process produces
a random variate with a chi-square distribution, but not otherwise.
As I said, if the process has D degrees of freedom, this means that
I can look at the 36 variables (bin counts) describing the results
plus the problem parameters (N and the number of bins, b) and extract
from them D variable values, which, with N and b, would allow me
to reconstruct the original 36 variables.
Without doing a detailed analysis, it seemed to me that the additional
constraints imposed by throwing out duplicates might allow
me to get by with one variable fewer, or a bit more, but not five.
So assuming that the mean is 30, either:
1) My intuition is wrong and we can find 30 numbers which will
allow us to deduce all 36, or
2) The distribution is chi-square with parameter (generally
called degrees-of-freedom) 30, but that parameter is not
related to the degrees-of-freedom of the underlying process
under these conditions, or
3) The distribution is not chi-square -- violation of the
test assumptions leading to a completely different distribution.
Alternative 3 seems the most likely to me, followed by the possibility
that the mean is *not* thirty.
By the way, I did two different simulations using two different
RNGs and two different methods of using the RNG values to select
six distinct numbers from the 36. One simulation agreed with
my intuition -- a mean around 34 -- and strongly rejected a mean
of 30 (15 standard deviations).  The other was consistent
with a mean of 30 (1.2 standard deviations, I believe).  Some swapping
of code revealed that the selection algorithm is at fault, but I
don't know which is right -- I haven't found anything obvious in
reviewing the code or tracing with the debugger. I'll keep working
on it. I'm going to try a third selection algorithm to help focus
my attention.
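For what it's worth, one selection algorithm that is easy to argue is uniform is a partial Fisher-Yates shuffle (a Python sketch, not the code actually used here; every 6-subset of 1..36 comes out equally likely):

```python
import random

def draw_six(rng=random, n=36, k=6):
    # Partial Fisher-Yates shuffle: for each of the first k slots,
    # swap in a uniformly chosen element from the remaining pool.
    pool = list(range(1, n + 1))
    for i in range(k):
        j = rng.randrange(i, n)
        pool[i], pool[j] = pool[j], pool[i]
    return sorted(pool[:k])

random.seed(3)
picks = draw_six()
```

Because each slot draws uniformly from what's left, the probability of any particular ordered selection is 1/(n (n-1) ... (n-k+1)), the same for all of them, so the unordered 6-sets are equally likely.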
This, by the way, says something important about doing simulations.
If I had just gone with the first result (consistent with my intuition)
I might have made a serious error (or maybe not, we'll see).
Topher
|
935.7 | "lucky numbers?" | CTCADM::ROTH | lick bush in '88 | Fri Sep 30 1988 09:31 | 12 |
| Has anyone analyzed the distributions of the winning numbers?
One can only imagine how superstitious someone must be to actually play a
lottery game with any 'expectation' of gaining anything.
I assume the lottery is some sort of parimutual system (I think this
is the correct term) where part of the take is divided among the winners.
It would be interesting if numbers with low take per person showed a
pattern; one would then avoid betting on those...
- Jim
|
935.8 | On slipping into the trap | AKQJ10::YARBROUGH | I prefer Pi | Fri Sep 30 1988 12:20 | 34 |
| > Has anyone analyzed the distributions of the winning numbers?
Probably, but why bother? Examining lists of truly random numbers is
second only to watching nails rust.
> One can imagine how superstitious anyone would be to actually play a
> lottery game and have any 'expectation' of gaining anything.
People play for the excitement until they either get (1) bored or (2) so
addicted that they end up in Gamblers Anonymous, where they get one last
chance to get their lives back on track - usually after having lost all
their money, friends, jobs, family, self-worth, etc...
> I assume the lottery is some sort of parimutual system (I think this
> is the correct term) where part of the take is divided among the winners.
It's spelled parimutuel (I have no idea why) and yes, half the money goes
into prizes. That means your mathematical expectation is -$.50 for each
dollar spent, *even if you win*. You can get better odds by going to a Vegas
casino and throwing all your money on the floor: if you're quick you can
get more than half of it back in your pockets before the crowd tramples you
into the carpets.
> It would be interesting if numbers with low take per person showed a
> pattern; one would then avoid betting on those...
If any of the numbers showed any kind of pattern at all the lottery would
be out of business in a few weeks. There are 1,947,792 possible draws, of
which about 1,947,600 have never won anything. I advise not betting on any
of them.
DO NOT assume that since the odds are about 2,000,000-1 that when the pot
gets over $2,000,000 that it's then worth risking a few dollars. That
simply increases the number of bettors and the number of tickets sold,
which increases the number of multiple winners that share the pot. In all
of this, your expectation remains at a rock-solid -$.50 per dollar spent.
Lynn Yarbrough
|
935.9 | Shhhh! Don't say that in TIXEL::LOTTERIES! | LISP::DERAMO | Daniel V. {AITG,LISP,ZFC}:: D'Eramo | Fri Sep 30 1988 13:04 | 3 |
| Interesting comments on gambling from node AKQJ10. :-)
Dan
|
935.10 | Not as bleak as all that. | PBSVAX::COOPER | Topher Cooper | Fri Sep 30 1988 13:33 | 92 |
935.11 | | BEING::POSTPISCHIL | Always mount a scratch monkey. | Fri Sep 30 1988 16:11 | 46 |
| Re .8:
> Probably, but why bother? Examining lists of truly random numbers is
> second only to watching nails rust.
Perhaps, but lottery numbers are selected by using physical objects
rather than selecting from a truly uniform distribution.
> People play for the excitement until they either get (1) bored or (2)
> so addicted that they end up in Gamblers Anonymous, . . .
You forgot "(3) win".
> That means your mathematical expectation is -$.50 for each dollar
> spent, *even if you win*.
That's only the dollar expectation. It does not reflect the utility of
playing. One dollar for a one-in-a-million chance at half a million is
not necessarily a bad deal -- if you have nothing else more useful to
do with the one dollar. It depends on your situation. Even the
entertainment of playing might be worth 50 cents.
> If any of the numbers showed any kind of pattern at all the lottery
> would be out of business in a few weeks.
That is false, since the numbers show a pattern and the lottery is not
out of business. The most often picked numbers are arithmetic
sequences, such as 1-8-15-22-29-36, 1-6-11-16-21-26, and even
1-2-3-4-5-6. After that, people start using dates. There's a note
somewhere in the lotteries conference with the most frequently picked
sets and the number of picks of each for a single Massachusetts
lottery.
> DO NOT assume that since the odds are about 2,000,000-1 that when the
> pot gets over $2,000,000 that it's then worth risking a few dollars.
> That simply increases the number of bettors and the number of tickets
> sold, which increases the number of multiple winners that share the
> pot.
Those other bettors are kindly crowding themselves into the
above-described selections. If one picks randomly or, better yet,
picks randomly with bias against common selections, one is likely not
to share.
-- edp
|
935.12 | | MECAD::ROTH | lick bush in '88 | Fri Sep 30 1988 17:14 | 11 |
| Re .8 quite the sermon there... (I should have put my smiley face on)
| I didn't even know there was a lottery conference, but as mentioned
above, human nature abhors randomness and it would be amusing to see
what effect this would have on the game.
"With the gambler resides the last vestige of codified superstition"
- R. Epstein
- Jim
|
935.13 | | LISP::DERAMO | Daniel V. {AITG,LISP,ZFC}:: D'Eramo | Fri Sep 30 1988 21:30 | 22 |
| I haven't verified this, but I once read or heard that
someone studied the payoffs in the daily game (4 digits).
Apparently the payoffs are better for the 3-digit pick
than if you pick all 4 digits (i.e., not proportional
to the probabilities). It said that soon after the game
started the percentage of tickets playing only three
numbers grew because of this.
Anyway, the result was something like if you play nines
and threes and you win, it would almost be worthwhile
because the payoff will be split with so few others.
The above was for Massachusetts.
Re the comment earlier about taxes. If your one dollar
bet wins a dollar, there is no tax on it. If it wins
$100, you only pay taxes on $99. If it wins $5,000,000
over twenty years, I don't know if you subtract the one
dollar from the first year, or five cents each year. :-)
Call your local IRS office.
Dan
|
935.14 | Ambling back towards the main topic, | POOL::HALLYB | The smart money was on Goliath | Mon Oct 03 1988 16:34 | 5 |
| Suppose you repeatedly drew 35 balls from the urn containing 36, and
looked at the distribution of how frequently each number came up.
Wouldn't this be chi-squared(1)? Is it just coincidence that 36-35=1?
John
|
935.15 | | PBSVAX::COOPER | Topher Cooper | Tue Oct 04 1988 13:54 | 8 |
| I don't know. There is clearly a symmetry here which says that
the distribution for drawings of i at a time equals that for drawings
of 36-i at a time, whatever that distribution is.
I'd like to get back to this, but I'm a little busy now, so I
don't know when I'll get to it -- soon, I hope.
Topher
|
935.16 | re: draw 35 at once | LISP::DERAMO | Daniel V. {AITG,LISP,ZFC}:: D'Eramo | Tue Oct 04 1988 22:44 | 38 |
| Redo the analysis in .4 for the case of 35 balls being
drawn at each turn:
E[gi] = 35/36
E[(gi)^2] = 35/36
E[(gi)(gj)] = (35/36)^2 i not equal to j
E[f] = sum of N E[gi] = ...
Oops. In .4 that should say
>> E[f] = sum of N E[gi] = N (1/6) = N/6
instead of what I had (I wrote 36 for N, then ignored
it to get the correct result N/6)
>> E[f] = sum of N E[gi] = 36 E[gi] = N/6
But here, this works out to
E[f] = sum of N E[gi] = N (35/36) = 35N/36
E[f^2] = sum of N E[(gi)^2] + sum of N^2 - N E[(gi)(gj)]
= N (35/36) + (N^2 - N)(35/36)^2
= (1/36)^2 (36 * 35 N + (N^2 - N) * 35 * 35)
= (1/36)^2 (1260N + 1225N^2 - 1225N)
= (1225N^2 + 35N)/1296
Q = sum of 36 ((f - E[f])^2)/E[f], so
E[Q] = 36 (E[f^2] - E[f]^2)/E[f]
= 36 (36/35N) ( (1225N^2 + 35N)/1296 - (35N/36)^2 )
= (1/35N)( 1225N^2 + 35N - 1225N^2 )
= 1
= 36 - 35
:-)
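A quick simulation agrees with the exact result (a Python sketch, illustrative only): drawing 35 of the 36 numbers each time, the mean of Q lands near 1 = 36 - 35.

```python
import random

def q_statistic(n_trials, n=36, k=35, rng=random):
    # Tally hits over n_trials drawings of k distinct numbers from n.
    counts = [0] * n
    for _ in range(n_trials):
        for ball in rng.sample(range(n), k):
            counts[ball] += 1
    expected = n_trials * k / n
    return sum((c - expected) ** 2 / expected for c in counts)

random.seed(4)
qs = [q_statistic(400) for _ in range(200)]
mean_q = sum(qs) / len(qs)  # should be close to 1
```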
Dan
|
935.17 | conjecture is false | CTCADM::ROTH | Lick Bush in '88 | Thu Oct 06 1988 20:58 | 62 |
| Suppose you consider a choice of one of the 36 numbers as taking a
step along a unit vector in 36 dimensional space. Then adding up
many random choices amounts to looking at a resulting 36 dimensional
vector.
Since all choices are distributed among the 36 coordinates, the
possible vectors for a given number of trials lie in a hyperplane.
Subtracting the expectation from each coordinate translates the
hyperplane to the origin, and shows why, if the choices are independent,
there are 35 degrees of freedom and not 36.  The vectors will lie
in a symmetrical 35 dimensional simplex in the hyperplane.
By the central limit theorem the marginal densities of each coordinate
will be close to gaussian.
Now suppose we choose 6 different numbers; each of the C(36,6)
possibilities is equally likely.  We can make many trials of 6
numbers each, and tally up the hits in a C(36,6) dimensional space,
and again the densities will be gaussian in this high dimensional
space.
But if we project this space of 6-fold exterior products down to the
base space with a linear transformation the gaussian nature of the
distribution will not change, since a linear transformation of a
multivariate gaussian distribution is still gaussian.
It is enough to calculate the rank of a projection from a k-fold
exterior product down to the base space, since this is what further
reduces the degrees of freedom of the chi-squared statistic.
Using this reasoning, the conjecture in the base note is not true
in general. Consider the simple example of 5 numbers chosen 3 at
a time. We have a C(5,3) = 10 dimensional set of combinations, and
these project to the 5 dimensional space with the matrix:
| 123 |
| 124 |
| 1 | | 1 1 1 1 1 1 0 0 0 0 | | 125 |
| 2 | | 1 1 1 0 0 0 1 1 1 0 | | 134 |
| 3 | = | 1 0 0 1 1 0 1 1 0 1 | * | 135 |
| 4 | | 0 1 0 1 0 1 1 0 1 1 | | 145 |
| 5 | | 0 0 1 0 1 1 0 1 1 1 | | 234 |
| 235 |
| 245 |
| 345 |
But this matrix has rank 5, and so the degrees of freedom are not
reduced as claimed. Easier still, consider the 3 dimensional case
choosing pairs of numbers - the pairs (12, 13, 23) are even
isomorphic to the base space then!
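The rank claim is easy to check mechanically (a Python sketch using exact rational Gaussian elimination; the incidence matrix built here is the one printed above):

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    # Gaussian elimination over the rationals.
    rows = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                m = rows[i][c] / rows[r][c]
                rows[i] = [a - m * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Incidence matrix: 5 rows (the numbers), C(5,3) = 10 columns (the triples).
triples = list(combinations(range(1, 6), 3))
M = [[1 if num in t else 0 for t in triples] for num in range(1, 6)]
print(rank(M))  # 5 -- full rank, as claimed
```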
This is not to say the expected value of the chi-square statistic
will not be reduced.
I'll have to do a bit of combinatorial thinking on the general case,
but I'm not much good at that kind of stuff and someone else may see
an easy way to get the general result we want. I'm pretty sure
that the degrees of freedom can only be reduced if there are fewer
combinations of numbers than the dimension of the base space, which never
happens.
- Jim
|
935.18 | | LISP::DERAMO | Daniel V. {AITG,LISP,ZFC}:: D'Eramo | Thu Oct 06 1988 22:07 | 8 |
| Would anyone like to grind out E[Q^2] and show that the
result that it gives for the variance of Q (which would
be E[Q^2] - (E[Q])^2) is not the same as for a chi squared
distribution?
Or even a large simulation that shows the same.
Dan
|
935.19 | missing lemma | CTCADM::ROTH | Lick Bush in '88 | Fri Oct 07 1988 11:44 | 23 |
| I was on my way out last night and was too dull and lazy to show that the
projection matrices from C(n,k) space to the base space are of rank n.
It seemed that they would be.
Look at the part of the matrix that transforms combinations
that are cyclic shifts thru the numbers; the columns are shifts of
each other.  These shifted columns will be linearly independent
since each is a transformation of the first one by a power of a shift
operator.  This shift operator satisfies S^n = I, and so its
eigenvalues are n-th roots of unity; thus there exists no lower
order polynomial which divides its characteristic polynomial, so
that the cyclic shifts of the first column are indeed independent.
They span n-space and the projection has rank n for all k .ne. n.
Re .-18 - I take it you don't believe me. Want to make a wager? :-)
Actually I attempted to use this geometric reasoning to prove the
conjecture since I was almost sure it was true! But difficulties arose
in some simple examples and it dawned that it must actually be false
instead...
- Jim
|
935.20 | experimental run this morning | CTCADM::ROTH | Lick Bush in '88 | Fri Oct 07 1988 12:59 | 80 |
| Herewith results of an experiment: 5000 sets of 400 drawings of
29 unique numbers from a set of 36, with a histogram of the chi^2
statistic.  Also shown is a 7-degree-of-freedom density, with the chi
axis scaled to match the expectations for 35 and 7 degrees of freedom.
- Jim
ndraws = 29
ntrials = 400
npasses = 5000
drawing without replacement
average chi_sq = 6.998502482759
expected chi_sq = 7.000000000000
normalized average chi_sq = 34.992512413793
normalized variance = 68.906763417360
low tail = 0.000035
high tail = 0.000230
chi^2 hits theo 35 deg theo 7 deg obs/theory
------ ----- ----------- ---------- ----------
11 1 0.328105 67.547145 3.047804
12 3 0.781862 75.282177 3.836992
13 2 1.679018 82.568367 1.191172
14 4 3.297149 89.323645 1.213169
15 8 5.990812 95.487441 1.335378
16 6 10.168625 101.018179 0.590050
17 15 16.252115 105.890986 0.922957
18 25 24.620926 110.095192 1.015396
19 45 35.552756 113.632225 1.265725
20 40 49.168096 116.513360 0.813536
21 69 65.389781 118.758202 1.055211
22 72 83.924200 120.392719 0.857917
23 98 104.268129 121.447906 0.939885
24 125 125.740105 121.958580 0.994114
25 155 147.531359 121.961758 1.050624
26 173 168.770200 121.496590 1.025062
27 190 188.590765 120.602733 1.007472
28 202 206.197300 119.320177 0.979644
29 223 220.919447 117.688002 1.009418
30 209 232.252235 115.745013 0.899884
31 250 239.877406 113.528377 1.042199
32 260 243.670007 111.073793 1.067017
33 243 243.688849 108.415422 0.997173
34 235 240.152372 105.584723 0.978545
35 231 233.410081 102.612186 0.989674
36 220 223.909233 99.525589 0.982541
37 195 212.149024 96.350968 0.919165
38 224 198.663438 93.111513 1.127535
39 189 183.975913 89.829390 1.027308
40 163 168.582132 86.524135 0.966888
41 146 152.930183 83.213499 0.954684
42 146 137.408638 79.913668 1.062524
43 126 122.339649 76.638057 1.029920
44 111 107.977410 73.400938 1.027993
45 106 94.510374 70.209704 1.121570
46 75 82.066118 67.077130 0.913897
47 75 70.718569 64.010241 1.060542
48 59 60.495938 61.015903 0.975272
49 50 51.389184 58.099831 0.972967
50 48 43.360256 55.266684 1.107005
51 35 36.349719 52.520160 0.962869
52 25 30.283608 49.863087 0.825529
53 27 25.079182 47.297504 1.076590
54 20 20.649741 44.824744 0.968535
55 19 16.908393 42.445509 1.123702
56 9 13.770913 40.159942 0.653551
57 11 11.157770 37.967690 0.985860
58 6 8.995470 35.867980 0.667002
59 5 7.217311 33.859613 0.692779
60 4 5.763708 31.941143 0.693998
61 6 4.582156 30.110793 1.309427
62 4 3.626956 28.366567 1.102853
63 4 2.858772 26.706277 1.399202
64 3 2.244081 25.127574 1.336850
66 2 1.366593 22.204947 1.463493
67 1 1.060432 20.855816 0.943012
68 1 0.819885 19.577907 1.219684
71 1 0.371166 16.144364 2.694213
|
935.21 | Unproven. | ERLTC::COOPER | Topher Cooper | Fri Oct 07 1988 19:26 | 28 |
| RE: .17
I don't think you have disproven the conjecture, although you have
confirmed my intuition.
We have to distinguish two concepts. One is the number of degrees
of freedom of the underlying process. The other is the parameter
to the chi-square family of distributions, which is referred to as
the number of degrees of freedom since that is its source in
conventional uses of the distribution.
You have demonstrated that the degrees of freedom for the underlying
process are not, in general, N-k (where N is the number of available
values, and k is the number selected). This does not prove that
the distribution of the chi-square statistic under these conditions
isn't the chi-square distribution with parameter N-k, which is the
actual conjecture.
A simpler demonstration that the number of degrees of freedom for the
underlying process is not in general N-k is provided by the example where
k = N-1, i.e., where each trial consists of selecting all but one
of the numbers. This is obviously equivalent to selecting one
number at each trial. The number of degrees of freedom in the
two cases must therefore be the same. But we know that the
number of degrees of freedom selecting one number at a time is
N-1, which is not generally equal to 1 = N-k.
Topher
|
935.22 | Disproven | ERLTC::COOPER | Topher Cooper | Fri Oct 07 1988 19:55 | 47 |
935.23 | .18 not a "no" - it's a "huh?" | LISP::DERAMO | Daniel V. {AITG,LISP,ZFC}:: D'Eramo | Sat Oct 08 1988 00:26 | 25 |
| re .-1,
A good analysis! I thought of doing something similar
to compute E[Q] for selecting 35 out of 36 balls, but
decided to just compute it directly instead.
The chi square conjecture seemed to agree with empirical
results for the mean but not for the variance; the formula
at the end of .-1 has the same mean but a different
variance. We should see if it agrees with the empirical
results.
>> .19 Re .-18 - I take it you don't believe me. Want to make a wager? :-)
I thought I had said earlier that one of my reactions was
that "it can't be that easy!" in my "history" reply .2, but
re-reading it shows that I didn't. Oh well.
I posted .18 because I didn't completely understand your
.17. :-) I haven't figured out .20, either; what does your
"normalized" mean? Is it the same as in .22? Whereas a
one-in-a-million probability of an observed variance given
the conjecture in .0 is very easy to understand.
Dan
|
935.24 | clarification | CTCADM::ROTH | Lick Bush in '88 | Mon Oct 10 1988 12:27 | 31 |
| I'll stand by my reasoning, as it goes back to first principles.
You should return to the actual definition of the chi-square
distribution: the probability density of the squared euclidean length
of a vector of n gaussian variates with equal mean and variance. This
is how Pearson originally derived the distribution, though I've never
seen that paper.
The definition makes essential use of linear vector spaces equipped
with a euclidean metric, so it is correct to think about the problem
in this way. The part that was glossed over - the rank of the
transformation from C(n,k) space to n space - can be shown many ways;
for example the matrix can be thought of as an incidence matrix of a
graph, or as a Markov matrix (by scaling the entries by 1/k), or
you can use invariant subspace reasoning, but the result is the
same - the rank (number of linearly independent rows) is n.
Re - the little simulation run earlier.  The claim is that the
chi-square statistic for hit counts will exhibit the shape of a
chi-square distribution with n-1 degrees of freedom, scaled to an
expected value of n-k.  The program repeatedly chose 29 out of 36
numbers and tallied
the hit counts in 36 bins. It then took a histogram of the chi-square
statistic. To compare only the shapes of the statistic and the
7 and 35 degree of freedom distributions, I linearly scaled the chi-square
axes of each of them to have the same expectation, that's all.
Look at a low dimensional case, like 2 out of 3 numbers, or 3 out of 5.
This was how I arrived at the conclusion; the program was only a double
check.
- Jim
|
935.25 | Just a nit. | ERLTC::COOPER | Topher Cooper | Mon Oct 10 1988 12:50 | 32 |
| RE: .24
I have seen a number of "actual definitions" of the chi-square.
Although useful for later analysis the vector language seems completely
redundant for a basic definition. Essentially the same definition
in more elementary language is:
The chi-square distribution with n degrees of freedom is the
distribution resulting from summing the squares of n standard
normal variates.
or more technically correct:
The chi-square distribution with n degrees of freedom is the
distribution of a random variable whose value is equal to
the sum of the squares of n random variables independently
distributed according to the standard normal distribution.
Introducing vector language essentially results in us taking the
sum of the squares and finding the square root (length of vector)
then squaring it out again.
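That elementary definition translates directly into a sampling check (a Python sketch, illustrative): the sample mean of such sums should approach n and the sample variance 2n.

```python
import random

def chi_square_variate(n, rng=random):
    # Sum of the squares of n independent standard normal variates.
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

random.seed(5)
samples = [chi_square_variate(30) for _ in range(4000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
# mean comes out near 30, var near 60
```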
I'm not arguing with your definition as useful -- even the most
useful -- definition for this purpose, and perhaps it was the
way that chi-square was first defined (I have no idea), but to
say that it is "the" (only real) definition goes a bit too far.
Axiomatizations of mathematics (which includes, of course,
definitions of non-primitives) are largely a matter of taste, and
there are always alternatives in any active field.
Topher
|