
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

1042.0. "Colorado Lotto" by DEC25::ROBERTS (Reason, Purpose, Self-esteem) Mon Mar 20 1989 21:44

    Greetings! It's been a long time since I participated in this
    conference.
    
    I'm studying the Colorado state "lotto" game. In it, one attempts
    to guess the six numbers out of forty-two that will be drawn.
    
    In the last game,
    
         0 people guessed all 6.
        20 people guessed 5 of the 6.
     1,258 people guessed 4 of the 6.
    20,546 people guessed 3 of the 6.
    ======
    21,824 people guessed at least 3 of the 6.
    
    The probability of guessing at least 3 of the 6 is 152467/5245786
    or approximately 0.0290646625692.
    
    How many people played?
    
    One answer: since 21,824 people guessed at least 3 of the 6 and
    the probability of this is 0.0290646625692, then the number who
    played is about  21824/0.0290646625692 or 750,877.459803.
    
    Another answer: doing least squares for each level of winner; i.e.,
    choosing X in order to minimize
    
    (142800/5245786*X-20546)^2 + 
    (  9450/5245786*X- 1258)^2 +
    (   216/5245786*X-   20)^2 +
    (     1/5245786*X-    0)^2
    
    yields X approximately  754,514.6635.
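    (Both answers are easy to reproduce with a short script -- a sketch,
    using the hypergeometric match probabilities and the winner counts
    quoted above, and assuming nothing beyond them:)

```python
from math import comb

N = comb(42, 6)  # 5,245,786 possible tickets

# P(matching exactly k of the 6 drawn numbers) -- hypergeometric
p = {k: comb(6, k) * comb(36, 6 - k) / N for k in range(3, 7)}
winners = {3: 20546, 4: 1258, 5: 20, 6: 0}

# Answer 1: winners with at least 3 matches / P(at least 3 matches)
est1 = sum(winners.values()) / sum(p.values())

# Answer 2: the X minimizing sum_k (p_k*X - winners_k)^2, from the
# normal equation of the least-squares problem
est2 = (sum(p[k] * winners[k] for k in winners)
        / sum(p[k] ** 2 for k in winners))

print(round(est1, 1), round(est2, 1))  # about 750877.5 and 754514.7
```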
    
    Which answer is better and why? Is there a better estimate, and
    if so how is it reached?
    
    					/Dwayne
    
1042.1. "Use Bayes' Theorem" by NIZIAK::YARBROUGH (I PREFER PI) Tue Mar 21 1989 12:28
I believe the best approach to this problem is to apply Bayes' Theorem,
which I think has been discussed elsewhere in the conference, and is 
certainly discussed in any probability text. Your first calculation is, I 
think, not far wrong, if overly precise (.000001 persons???).

Lynn Yarbrough 
1042.2. "Doesn't LSQ imply equal error bands?" by POOL::HALLYB (The Smart Money was on Goliath) Tue Mar 21 1989 14:20
    Probably you mean how many TICKETS were purchased, not how many people
    participated.  (Unless Colorado has unusually restrictive laws...)
    
    I think any time you use the higher-paying results you are going to
    introduce more error than you correct.
    
.0>    One answer: since 21,824 people guessed at least 3 of the 6 and
.0>    the probability of this is 0.0290646625692, then the number who
.0>    played is about  21824/0.0290646625692 or 750,877.459803.
    
    Suppose instead we look at tickets where all 6 were guessed correctly.
    There were about 0/.00000019 or 0 tickets played, using the logic
    above from .0.  Obviously more than 0 tickets (21,824 were winners)
    were sold.  So by including 6-out-of-6 you are probably introducing
    error, not correcting it.  Similar logic applies to 5-out-of-6 and
    4-out-of-6, though to lesser extents.
    
      John
1042.3. by DEC25::ROBERTS (Reason, Purpose, Self-esteem) Tue Mar 21 1989 14:21
    Thanks, Lynn. I appreciate the reference to Bayes' Theorem.
    
    I did a DIR/TITLE=BAYE to see if I could locate the note you refer to,
    but drew a blank. Could someone point me to the proper note to read or
    apply Bayes' Theorem to the problem in 1042.0 with an explanation? 
    
    			/Dwayne
    
1042.4. by DEC25::ROBERTS (Reason, Purpose, Self-esteem) Tue Mar 21 1989 15:21
    RE: .2 by John
    
    I don't want to get into a discussion here about how restrictive
    Colorado's laws are. There's a place for everything and MATH isn't for
    politics. But you're right, of course, about it being tickets sold
    rather than people participating. Believe it or not, it was actually
    reported in the paper as people playing. I guess it makes it sound like
    more of a popular game than it really is. 
    
    I appreciate your argument about trying to predict based solely on the
    information of the number of 6-out-of-6 winners. In general, the more
    information available, the less the error. This is why one could argue
    that the Least Squares Method is more accurate than dividing total
    winners by the probability of the winners. It doesn't lump 8 pieces of
    information into 2 sums.
    
    But Least Squares seems to introduce an arbitrary manipulation into the
    approximation. I.e., why squares? Why not power 1.9? Or power 3.14? Why
    is the power constant for each term? Maybe it should be power 1.0 at
    the extremes and power 2.0 at the modal value, with some distribution
    in between. 
    
    Just some meandering thoughts.
    
    					/Dwayne
    
1042.5. "Why not to use least-squares." by CADSYS::COOPER (Topher Cooper) Tue Mar 21 1989 16:59
RE: .4
    
    Least squares is justified when certain conditions are met, which is
    not the case here.  Basically the errors on each point must be
    approximately normally distributed and the variance for each must be
    the same.  The first might be met (except that the proportions are
    so small that I wouldn't bet (so to speak) on it).  My intuition says
    that the second condition is *not* met, so you would have to use
    an appropriate weighted least-squares.  I think Bayes' theorem is
    the way to go.  (Bayes' theorem is a theorem in probability theory
    which can be interpreted as giving the probability that something (H)
    is true given some piece of evidence (E), from the probability that E
    will occur if H is true and the probability that H is true before
    you have taken account of evidence E.  It thus allows you to build
    up evidence incrementally about something.  If I get a chance, and
    no one beats me to it, I'll give more detail.  The interpretation
    of Bayes' theorem stands at the center of the largest, longest-running
    controversy in statistics and probability.)
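    (A footnote on the weighted-least-squares point: if each winner count
    has variance roughly proportional to its probability, as a binomial
    model with small p suggests, then weighting each term by the
    reciprocal probability makes the weighted least-squares answer
    collapse to exactly the first estimate in .0.  A sketch, assuming
    uniform random betting:)

```python
from math import comb

N = comb(42, 6)
p = {k: comb(6, k) * comb(36, 6 - k) / N for k in range(3, 7)}
n = {3: 20546, 4: 1258, 5: 20, 6: 0}

# Weighted least squares: minimize sum_k w_k*(p_k*X - n_k)^2 with
# w_k = 1/p_k (each count's variance is roughly X*p_k for small p_k).
# The normal equation sum_k w_k*p_k*(p_k*X - n_k) = 0 then reduces to
# sum_k (p_k*X - n_k) = 0, i.e. X = sum(n_k) / sum(p_k).
X = sum(n.values()) / sum(p.values())

# That is exactly the "total winners / total probability" answer in .0
print(round(X, 2))  # about 750877.46
```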
    
    						Topher
1042.6. "Just some meandering answers" by POOL::HALLYB (The Smart Money was on Goliath) Tue Mar 21 1989 18:27
1042.7. "Why all the estimates are bogus." by CADSYS::COOPER (Topher Cooper) Wed Mar 22 1989 15:38
    Forgot to mention yesterday.  All these attempts at estimation are
    predicated on one very bad assumption -- that each ticket represents a
    random, uniform, independent sample from the set of 6-tuples-without-
    replacement.  People actually cluster quite heavily due to various
    psychological reasons -- essentially that people's intuition about
    statistics and probabilities are grossly wrong.  People for example
    tend to feel that even numbers are in some sense "less random" than
    odd numbers and so are less likely to be drawn in a random sample;
    they therefore choose many more odd numbers than even.  People also
    tend to believe that "obvious" arithmetic progressions are less likely
    to occur than something more patternless and so tend to steer clear of
    those.
    
    To see why this throws the estimates off -- imagine that 97% of the
    tickets were for the same 6 numbers, none of which happened to be drawn.
    In that case your estimation effort would only be estimating the 3%
    of randomly drawn tickets.
    
    If you're approaching this as simply an interesting abstract puzzle
    inspired by the lottery, and are thus willing to arbitrarily specify
    uniform betting, then we can continue.
    
    If on the other hand you're actually curious about the answer, then
    I suggest you call the State Lotto Commission (or whatever it's called
    there) for the answer.  Alternatively, if you know the structure of
    the payoff system (i.e., how much gets skimmed by the state and how
    the remains are distributed among the various classes of winners)
    then you should be able to get a precise answer from the payoffs.
    
    					Topher
1042.8. "1 2 3 4 5 6" by DEC25::ROBERTS (Reason, Purpose, Self-esteem) Wed Mar 22 1989 16:38
    Thanks, Topher.
    
    Actually, my interest is both abstract and practical. The math is
    fun, but I sometimes play the game, myself.
    
    Your point is well taken. I doubt that 97% were for the same 6 numbers
    (as I'm sure you do, too), and wonder what the average Joe's "random"
    distribution really looks like.
    
    You said in -.1, "People also tend to believe that "obvious" arithmetic
    progressions are less likely to occur than something more patternless
    and so tend to steer clear of those." This is evidently true. I
    observed a man choosing his "random" numbers by asking his
    pre-school-aged son for numbers. 
    
    "Give me a number, son."
    "One."
    "OK. Now give me another."
    "One."
    "No, no, no. It's got to be different."
    "Two."
    "Well, OK. Give me another."
    "Three."
    "Now look, son. The odds of getting three numbers in a row are almost
    zero. Try another number other than `three'."
    "Four."
    
    At this point, the man started yelling at his kid, picked him up and
    virtually threw him into his empty shopping cart. 
    
    For what it's worth.
    
    					/Dwayne
    
1042.9. "Is it a good bogus or a bad bogus?" by POOL::HALLYB (The Smart Money was on Goliath) Wed Mar 22 1989 16:50
.7>    replacement.  People actually cluster quite heavily due to various
.7>    psychological reasons -- essentially that people's intuition about
.7>    statistics and probabilities are grossly wrong.  People for example
.7>    tend to feel that even numbers are in some sense "less random" than
    
    Would it be possible to account for this by looking at historical
    records of how often each number has been selected by players?
    
    If we assume, for the sake of argument, that numbers are selected in
    inverse proportion to their value (1 most often, 2 next, ... 42 last),
    and we know the 6 winning numbers and the outcomes as provided in .0 by
    Dwayne, can we then make an estimate of the number of tickets sold?
    It would be interesting to see how it compares with the "non-adjusted"
    values already guesstimated.
    
      John
1042.10. by AITG::DERAMO (Daniel V. {AITG,ZFC}:: D'Eramo) Wed Mar 22 1989 19:14
	There's another "minor" skewing from the fact that one person
	buying two tickets most likely chooses different combinations
	on them [unless he doesn't like sharing].  If they were drawn
	randomly the two would be the same combination with probability
	equal to the probability of winning.

	Dan
1042.11. "Guaranteed Winner!" by DEC25::ROBERTS (Reason, Purpose, Self-esteem) Wed Mar 22 1989 20:45
    A related question:
    
    What's the minimum number of tickets I must buy to guarantee I'll win
    at least one 3-out-of-6 prize? 4-out-of-6? 5-out-of-6? 
    
    6-out-of-6 is easy: C(42,6)=5245786; i.e., the number of combinations
    of 6 items out of 42. 
    
    					/Dwayne
    
1042.12. by BEING::POSTPISCHIL (Always mount a scratch monkey.) Thu Mar 23 1989 11:16
    Re .7:
    
    > People also tend to believe that "obvious" arithmetic progressions
    > are less likely to occur than something more patternless and so tend to
    > steer clear of those.
    
    Actually, arithmetic progressions are the most frequently chosen
    tickets.
    
    
    				-- edp 
1042.13. "okay, so suppose it is ideal..." by KOBAL::GILBERT (Ownership Obligates) Thu Mar 23 1989 12:12
    Let's restate the problem.
    
    A number of independent tests are done.  The result of each test is
    a non-negative number t; t occurs with probability P[t].
    
    After the tests, the number of tests that resulted in t is S[t].
    Given the P[t] values and a subset of the S[t] values, determine
    the probability that there were exactly N tests.
    
    
    For example, let P[0] + P[1] = 1, and suppose we are given S[0] = s0.
    Then:
    
    	Prob( N=s0+k | S[0]=s0 ) = Prob( N=S[0]+k & S[0]=s0 ) / Prob( S[0]=s0 )
    
    				      s0     k
    		    C( s0+k, s0 ) P[0]   P[1]
    		= ------------------------------ ;
    		  inf			s0     i
    		  Sum C( s0+i, s0 ) P[0]   P[1]
    		  i=0

    where C(a,b) is the binomial coefficient: 'a choose b'.
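    (The two-outcome special case can be checked numerically.  A sketch
    with made-up values P[0] = 0.7, P[1] = 0.3, and s0 = 5; the infinite
    sum in the denominator converges since P[1] < 1, so truncating it is
    safe:)

```python
from math import comb

p0, p1, s0 = 0.7, 0.3, 5  # made-up P[0], P[1], and observed S[0]

# Denominator: sum over i of C(s0+i, s0) * p0^s0 * p1^i, truncated --
# the terms decay geometrically, so 500 terms is far more than enough
den = sum(comb(s0 + i, s0) * p0**s0 * p1**i for i in range(500))

def prob_N(k):
    """Prob( N = s0 + k | S[0] = s0 ), per the formula above."""
    return comb(s0 + k, s0) * p0**s0 * p1**k / den

# The probabilities over all k sum to 1, as a distribution must
# (it is a negative binomial distribution in k)
print(sum(prob_N(k) for k in range(500)))
```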
1042.14. "$4M split 19,412 ways is, um," by POOL::HALLYB (The Smart Money was on Goliath) Thu Mar 23 1989 12:14
1042.15. "Didn't mean to overgeneralize my examples." by CADSYS::COOPER (Topher Cooper) Thu Mar 23 1989 14:42
.12 (edp) .14 (HALLYB):
    
    Interesting.
    
    I reread my note .7 and found that I gave an impression of being more
    specific than I meant to.  I should have made clear that my examples
    of people's tendency to "cluster" was taken from other contexts and
    could not be blindly applied to this kind of lottery.  They were meant
    only as an example of the type of clustering that can occur when people
    try to be random.  Complicating things when you are talking about
    a lottery like this is that people use different strategies -- some
    people try to guess a "most random" number, some people use dice or
    some other device to get a number (I know from the ads that in Mass.
    there is now a service for this -- you can request a random number be
    chosen for you rather than you supplying one), while other people use
    various systems which can produce highly patterned results (e.g.,
    betting columns on the sheet).
    
    A friend of mine told me about when he worked on the Hong Kong lottery
    (they used DEC computers).  The Hong Kong lottery was *not* pari-mutuel.
    One day one of the major newspapers displayed a picture of a car wreck
    on the front page with a prominently displayed license plate number.
    Thousands bet on it, and it came in -- the lottery commission went
    bankrupt.
    
    					Topher
1042.16. by DEC25::ROBERTS (Reason, Purpose, Self-esteem) Fri Mar 24 1989 13:32
    RE: my own note 1042.11 (What's the minimum number of tickets I
    must buy to guarantee I'll win at least one n-out-of-6 prize?)
    
    n		minimum
    =		=======
    0		0
    1		7 (  1  2  3  4  5  6 )
    		  (  7  8  9 10 11 12 )
    		  ( 13 14 15 16 17 18 )
    		  ( 19 20 21 22 23 24 )
    		  ( 25 26 27 28 29 30 )
    		  ( 31 32 33 34 35 36 )
    		  ( 37 38 39 40 41 42 )
    2		91 ?
    3		1330 ?
    4		?
    5		?
    6		5245786
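    (The n=1 entry is easy to sanity-check: the 7 tickets listed use each
    of the 42 numbers exactly once, so each of the 6 drawn numbers must
    land on some ticket.  A sketch of the check, plus the C(42,6) figure
    for the n=6 row:)

```python
from math import comb

# The 7 disjoint tickets from the n=1 row: (1..6), (7..12), ..., (37..42)
tickets = [list(range(6 * i + 1, 6 * i + 7)) for i in range(7)]

# Together they cover each of 1..42 exactly once, so any 6 drawn
# numbers each hit some ticket -- a guaranteed 1-out-of-6 match
covered = sorted(x for t in tickets for x in t)
assert covered == list(range(1, 43))

# The n=6 row: one ticket per possible combination of 6 out of 42
assert comb(42, 6) == 5245786
```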
    
    				/Dwayne
    
1042.17. by KOBAL::GILBERT (Ownership Obligates) Fri Mar 24 1989 15:49
> What's the minimum number of tickets I must buy to guarantee
> I'll win at least one n-out-of-m prize?

See note 746.* for this particular subproblem.
1042.18. "50 cent tour of Bayesian Statistics" by CADSYS::COOPER (Topher Cooper) Tue Mar 28 1989 18:19
    Bayes' Law or Bayes' Theorem says:


			   Pr(E | Hx) * Pr(Hx)
	    Pr(Hx | E) = _________________________
			   ---
			   \
			    > Pr(E | Hi) * Pr(Hi)
			   /
			   ---
			    i

    Where

	Hx is a hypothesis.
	E is an event to be used as evidence about that hypothesis.
	Hi is any one of a set of complete (i.e., one of them has to be
	    true) and distinct (i.e., if any one of them is true the others
	    are all false) hypotheses which includes Hx.
	Pr(X) is the probability that X is true, and
	Pr(X|Y) is the probability that X is true given that Y is true.

    The summation is, of course, over all the Hi.

    If the description of the H's and E's had been that they were outcomes
    of experimental trials, then there would have been absolutely no
    controversy about the above.  It is simply an elementary, frequently
    useful, theorem of probability theory.  But with the descriptions
    I used, Bayes' Law is the basis of the controversial discipline called
    Bayesian Statistics.

    The root of the controversy has to do with the interpretation given
    to the concept of "probability".  In the traditional school
    of statistics -- generally called the "frequentist" school --
    probability refers to the frequency of undistinguished events in
    a large number of identical repetitions of the same circumstances.
    It is meaningless, according to this interpretation, to talk about
    the probability of a general hypothesis, since such hypotheses are unique
    and are thus either simply true or simply false.  At best one could give
    them probability values of 0 or 1 but no value between can be
    justified.  Similarly, if E is in some sense a unique event -- the
    outcome of a specific experiment, for example -- then it also cannot
    meaningfully have a probability value associated with it.  It either
    occurs or it doesn't occur.

    Bayesians, however, take a much broader view of probability.
    Essentially their view is that anything which has the proper
    mathematical form is "proper."  This allows Bayesians to use
    probability theory and statistics to model "rational uncertainty."

    Especially controversial is the common use of "subjective
    probabilities", i.e., expert judgments as to the probability of
    something.  Generally these are used for setting values for the
    initial prior probabilities (the probabilities of the form Pr(Hi)),
    but may also be used in estimating the conditional probabilities
    (Pr(E|Hi)).  Care must be taken that the subjective probabilities
    are "rational" (obey the mathematical rules of probabilities), e.g.,
    they must be normalized so that the sum of all the exclusive
    probabilities equals 1.

    You will often see "likelihood" used instead of "probability" by
    Bayesians; this is essentially just a dodge to avoid some of the heat
    from the mainstream about what a "probability value" is.

    The Bayesians' justification for all this is: 1) it works, 2) it
    corresponds better to what people actually mean when they talk about
    probabilities -- e.g., scientists talk all the time about the
    probability that a particular theory is true, and 3) since subjective
    judgments are going to be made *anyway* they may as well be made
    explicit, formalized and made part of the process rather than done
    implicitly after the "statistics" are "finished".

    In practice, Bayesian statistics usually is concerned with evaluating
    not a single piece of evidence but a set of pieces of evidence.
    This is done by the simple expedient of applying the above formula
    to one of the pieces of evidence, using whatever prior probabilities
    seem justified.  The result of that process is then used as the
    prior probability in a second application of the formula using a
    second piece of evidence.  It's fairly easy to show that the order
    that one uses the evidence is irrelevant to the final outcome.  Less
    easy to show but true is that the values of the initial prior
    probabilities quickly become irrelevant as long as they aren't too
    extreme (e.g., if one of your prior probabilities is 0 it will remain
    0 no matter what -- if your mind is made up, no amount of evidence can
    be expected to change it).
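    (The order-independence claim is easy to demonstrate numerically.  A
    sketch with two made-up hypotheses about a coin and three tosses of
    evidence:)

```python
# H1: the coin is fair, P(heads) = 0.5; H2: it is biased, P(heads) = 0.8.
# Priors and likelihoods are made up for illustration.
priors = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": {"H": 0.5, "T": 0.5},
              "biased": {"H": 0.8, "T": 0.2}}

def update(prior, evidence):
    """One application of Bayes' Law: Pr(Hx|E) = Pr(E|Hx)*Pr(Hx) / sum."""
    post = {h: likelihood[h][evidence] * prior[h] for h in prior}
    total = sum(post.values())
    return {h: v / total for h, v in post.items()}

# Feed in the same evidence in two different orders
a = priors
for e in "HHT":
    a = update(a, e)
b = priors
for e in "THH":
    b = update(b, e)

# The final posteriors agree regardless of the order of the evidence
assert all(abs(a[h] - b[h]) < 1e-12 for h in a)
```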

    One important "cultural" difference between frequentist and Bayesian
    statistics is what is taken as a "point estimate" of a quantity.
    In traditional statistics, the most common value used is the
    "expectation", mean or average, though occasionally the median is used.
    Bayesians, on the other hand, tend to use the "mode", the point of
    highest probability.

    For example, in note 1042.0 estimates of the number of lottery
    tickets bought were given with a fraction, even though only an integer
    number of tickets could have been purchased.  From a traditional
    statistical viewpoint, this is a reasonable thing to do, since the
    result sought is a sort of "summary" of the entire distribution of
    possible values.  Typically a Bayesian, however, would have given an
    integer value -- that which had the highest probability associated with
    it.

    Frequently one will see quite different-looking formulas described as
    Bayes' Law.  These are either equivalent or are derived from the above.
    One particularly useful version is obtained by dividing the formula
    for two different hypotheses by each other, and using some defined
    quantities:

				 Pr(E | H1)
		R(H1, H2 | E) = ------------ * R(H1, H2)
				 Pr(E | H2)

	Where:
	    Pr(X | Y) is as before,
	    R(Hx, Hy) is the relative likelihood (probability) of Hx to Hy
		    (e.g., Hx is four times more likely than Hy).
	    R(Hx, Hy | E) is the relative likelihood of Hx to Hy given that
		    E occurred.

    Note that in this form one can completely ignore the probabilities, or
    even the existence of other hypotheses than those you are examining.
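    (A quick numeric illustration of the ratio form, with made-up priors
    and likelihoods for two hypotheses:)

```python
# Made-up numbers: priors 0.25/0.75, and evidence E with
# Pr(E|H1) = 0.9, Pr(E|H2) = 0.3.
pr = {"H1": 0.25, "H2": 0.75}
like = {"H1": 0.9, "H2": 0.3}

R_prior = pr["H1"] / pr["H2"]                 # R(H1, H2) = 1/3
R_post = (like["H1"] / like["H2"]) * R_prior  # R(H1, H2 | E) = 3 * 1/3

# Cross-check against the full form of Bayes' Law
post = {h: like[h] * pr[h] for h in pr}
total = sum(post.values())
post = {h: v / total for h, v in post.items()}
assert abs(post["H1"] / post["H2"] - R_post) < 1e-12

print(R_post)  # 1.0 -- the evidence exactly cancels the prior odds
```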

    Two identities (actually two forms of the same identity) which are
    useful with this formula in extending it to more than two hypotheses
    are:

		R(H1, H2) = R(H1, H3) / R(H2, H3)

    and

		R(H1, H3) = R(H1, H2) * R(H2, H3)

    Why are Bayesian statistics not more widely used despite their obvious
    intuitive appeal?  Obviously the rather academic objections of the
    traditional school (even if valid) are unlikely to interfere with
    people "in the trenches" using them if they know about them and they
    wish to use them.  Bayesian statistics *are* widely known about --
    though they do not represent the mainstream, they are not the province
    of only a few isolated "nuts".  Many elementary and most general
    intermediate statistics books contain a chapter on Bayesian statistics.

    So they seem desirable, and they are known about, so how come they are
    not used?  The main answer is that they are a pain in the butt.
    Bayesian statistics require careful, explicit judgments to be made at
    each stage.  They are not easily captured into cookbook procedures such
    as are the mainstay of traditional statistics.  It is much harder to
    calculate the posterior probability of a hypothesis properly by using
    Bayesian procedures than to calculate a "p-value" using traditional
    procedures and then to effectively treat it (improperly) as the
    probability of a hypothesis.

				    Topher
1042.19. "a Bayesian calculation" by PULSAR::WALLY (Wally Neilsen-Steinhardt) Tue Apr 18 1989 18:02
    At least one other discussion of applying Bayes' Law appears in
    note 831.  I think you would have had to do DIR/TITLE=BAYES *.*
    to find it.  There may be others, but I don't remember where.
    
    Reply 1042.18 gave the form of Bayes' law that we need, so I will
    just carry out two calculations as an example.  I will make the
    (unrealistic) assumption that ticket numbers were uniformly
    distributed. 
    
    An easy way to rephrase this question is as follows: I know the
    probability with which some condition is satisfied, and I know the
    actual number of times that the condition was satisfied in a sample,
    so what is my best estimate of the number of events in the sample?
    
    The probabilities of the conditions are calculated as fractions 
    in .0, and the actual numbers are given there.  I will use the form
    of Bayes' law in .18, but I will ignore the denominator, since I
    know that it is just a normalizing factor that I can calculate when
    I need it.  So my form will be
    
    	Pr( Hx | E ) = Pr( E | Hx ) * Pr( Hx ) / D
    
    The Pr( E | Hx ) on the right is the probability that the evidence
    will be seen, given the hypothesis, or in this case, the probability
    of seeing r conditions satisfied in n events, given that the
    probability of seeing the condition in one event is f.  This is
    given by the binomial probability distribution or
    
    	Pr( r | n f ) = C( n, r ) * f^r * ( 1-f )^( n-r )
    
    	[ note the similarity to .13 ]
    
    Consider first the special case of the 6 digit match, where r = 0, and
    f = 1/5245786.  Here Pr( r | n f ) simplifies to
    
    	Pr( 0 | n f ) = ( 1 - f )^n
    
    To put this into Bayes' law, we need a value for Pr( Hx ), the
    probability we assign to n before we have any results.  Conventionally
    we assume some large upper limit, say 10^8 and call it nmax, is more 
    tickets than the Colorado lottery can sell, and that in our prior 
    state of knowledge we can only assume that all ns <= nmax are equally
    likely, so Pr( n ) = 1/nmax.  (note that those who do not like the
    Bayesian approach usually start jumping up and down and shouting
    about here).  Substituting this into Bayes' Law gives
    
    	Pr( n | E ) = ( 1 - f )^n / (D*nmax)
    
    Because f is so small, this probability goes slowly to zero as n
    increases.  In other words, the six digit match does not provide
    you with much information, as suggested in .2.  About all it tells
    you is that n is likely to be less than about 10^7.
    
    So let's go to the other end and look at the three digit match.
    Here it is useful to note, before we get all tied up in computing
    factorials, that for large n and r, the binomial distribution is
    well approximated by a normal distribution with mean = n * f and 
    variance = n * f * (1-f).  Substituting in the values in .0 tells
    us that the three digit match gives us a most likely value of 754761
    with a standard deviation of 141 in the match count.  
    
    Similarly the four digit match gives us a most likely value of 698346
    with a standard deviation of 35.  The fact that these two most likely
    values are so many standard deviations apart confirms what we
    suspected: lotto numbers are not randomly chosen.  So we don't get
    to use the cascading process which is traditional in Bayesian analysis:
    use the six digit matches to get a first estimate, then refine it
    with the five digit matches and so forth.
    
    Note finally that the least squares approach in .0 has been shown
    to be inapplicable.  Because the variances differ, we should not
    combine the values in a simple least squares.  If our assumption 
    of randomly chosen numbers had been verified, we could have tried 
    a weighted least squares approach as the 'classical' solution.  Since 
    our assumption was not verified, we must confess that we know almost 
    nothing about the number of tickets sold.
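    (The normal-approximation arithmetic above can be redone in a few
    lines -- a sketch, assuming uniform random betting as in the note.
    The most likely n makes n*f equal the observed count, and the quoted
    standard deviations are those of the match counts, sqrt(n*f*(1-f)):)

```python
from math import comb, sqrt

N = comb(42, 6)
f3 = comb(6, 3) * comb(36, 3) / N  # P(exactly 3 matches) = 142800/N
f4 = comb(6, 4) * comb(36, 2) / N  # P(exactly 4 matches) = 9450/N

# Most likely n: the binomial mode puts n*f at the observed count
n3 = 20546 / f3
n4 = 1258 / f4

# Standard deviation of the match count at that n
sd3 = sqrt(n3 * f3 * (1 - f3))
sd4 = sqrt(n4 * f4 * (1 - f4))

print(round(n3), round(sd3))  # 754761 and 141, as quoted
print(round(n4), round(sd4))  # about 698328 and 35
```

    (The four-digit value comes out a few tickets below the 698346 quoted
    above -- presumably a rounding difference in the original arithmetic --
    but the gap of tens of thousands between the two estimates, and so the
    conclusion, is unchanged.)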