[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

1886.0. "Matching Birthdays/Conditional Probability" by STOHUB::SLBLUZ::BROCKUS (I'm the NRA!) Fri Aug 05 1994 17:52

I understood this in High School, and I took a probability and statistics
course in college, but somehow, since I haven't used it in 20 years,
I just can't make it work out.

I searched titles for "birthday" and "probability", and came to a dead end.

I want to work out that old chestnut, the birthday problem.  You know, the
one where you have 20 people in a room and you claim "I bet at least 2 of
the people in this room have the same birthday."

I went at it like this ( I guess the lines beginning with !!! are wrong...):

Proposition A:  Everyone in the room has a unique birthday.

P(A), the probability of A depends on n, the number of people.  So write it
as P(An), where n is a number.

P(A0), by definition = 1
P(A1), by definition = 1
P(An|An-1) = 1 - (n-1)/365
    Conditional on no matches so far, How many days of the year are left?
    same as:  (366-n)/365

!!! P(An) = P(An|An-1) * P(An-1)
!!! P(An) = P(An|An-1) * P(An-1|An-2) * ... * P(A1)

And I wrote a little C program to figure it out.  Here is the output:

freedays = 366-n
cond prob = P(An|An-1)
prob = P(An), as above

$ r birthday
Num people    Freedays  cond prob.        prob

         2         364    0.997260    0.997260
         3         363    0.994521    0.991796
         4         362    0.991781    0.983644
         5         361    0.989041    0.972864
         6         360    0.986301    0.959537
         7         359    0.983562    0.943764
         8         358    0.980822    0.925665
         9         357    0.978082    0.905376
        10         356    0.975342    0.883052
        11         355    0.972603    0.858858
        12         354    0.969863    0.832975
        13         353    0.967123    0.805590
        14         352    0.964384    0.776897
        15         351    0.961644    0.747099
        16         350    0.958904    0.716396
        17         349    0.956164    0.684992
        18         348    0.953425    0.653088
        19         347    0.950685    0.620881
        20         346    0.947945    0.588561
        21         345    0.945205    0.556311
        22         344    0.942466    0.524305
        23         343    0.939726    0.492703
        24         342    0.936986    0.461656

All this is well and good, but as I recall, my probability figure is way off.
As I remember it, your chances for a match are pretty good at about 15 people,
and near certain at 20, and my chart doesn't reflect that.

So I've goofed.  Could some kind soul give me a brief, clear description of
how to solve simple conditional probability problems like this, so I can
find the error of my ways?

Thanks,

JPB

T.RTitleUserPersonal
Name
DateLines
1886.1You got itVMSDEV::HALLYBFish have no concept of fireFri Aug 05 1994 20:089
    I believe your calculations are correct. The 50% mark is between 
    22 and 23, exactly as presented.
    
    There's an old (true) story about this, where 22 people are having a
    lunch. They go around the table with no duplications. Some doubts are
    raised about the calculations. The waitress (person #23) chimes in, 
    saying her birthday is the same as "the general's over there".
    
      John
1886.2Coincidences and Pepsi Diet ColaMOVIES::GREENUse Tabasco as GravyWed Aug 10 1994 13:1044
    About a year ago I went to a public lecture on coincidences given in
    Edinburgh given by  Persi Diaconis, a well known and outspoken
    statistician.
    
    The talk covered a number of "coincidences" including discussion around
    the birthday paradox (.0), "psychic" tricks and multiple winners of
    national/state lotteries.  Diaconis analysed the problems debunking
    their coincidental appearance. 
    
    He gave a few interesting "rules of thumb" around general category
    selection problems (of which the birthday paradox is one).  The problem
    is for a given set of randomly chosen events, the category size (C = #
    of events - e.g. 365 for birthday paradox), number of event matches (n
    = 2 for b.p.), determine a sample set size (S) to achieve a specified
    probability  of match (p = 0.50 for b.p.)
    
    He gave a few approximate formulae, two of which are:
    
    S = 1.2 sqrt(C)    defines S for p = 0.50 and n = 2
    S = 1.6 sqrt(C)                      0.75     n = 2
    
    
    [both of these fit the birthday paradox quite well]
    
    Interestingly the analysis as in .0 (and these approximations) give
    *worst case* set sizes.  Very few category sets are randomly chosen -
    e.g. depending on latitude, power cuts etc. birth dates are skewed,
    another example (how many people do you need in a room to have 50%
    chance of having 2 of them with the same PIN code # for a card).  The
    120 from formula above is conservative - there's a large set of PINs
    unused - all the "obvious" passwords such as dddd (e.g. 3333), simple
    sequences 1234 etc.
    
    He also gave a formula for determining S from all 3 variables.
    (I'll add this if I find it ...)
    
    The single variable case seems simple enough to curve fit over a
    reasonable category range.  The more general case possibly done
    similarly with larger sets.  
    
    Anyone seen similar treatment of this problem?
    
    Russ
    
1886.3Capitol Hill birthday problemEVMS::HALLYBFish have no concept of fireTue Nov 29 1994 13:1913
1886.4a naive answerCSSE::NEILSENWally Neilsen-SteinhardtTue Nov 29 1994 15:3645
Consider a given day and a given congressperson, defined as a member of the US
Seante or the House of Representatives.  The probability that this day is their 
birthday is 

	1/365

The probability that this day is not their birthday is

	1 - 1/365 = 0.9972...

To get the probability that this is no congressperson's birthday, multiply
the factor above by itself 535 times

	( 1 - 1/365 )^535 = 0.2276...

The probability that this date is one or more congressperson's birthday,
is given when you subtract this from 1, to get .7723...

The probability that all 365 days are the birthday of one or more 
congresspersons is given by multiplying this by itself 365 times, to get

	0. forty zeroes 11...

pretty close to zero.

The expected number of days which are the birthday of no congressperson is
given by 

	0.2276... * 365 or about 83 days.


Why is this a naive answer?


It assumes, in the last two steps, that the probabilities for given days are
independent.  This is probably a good approximation, since the numbers 
involved are greater than 30 (old statistical rule of thumb), but it cannot
be exact.  If I check a hundred days at random, and find that they are all
the birthday of no congressperson, then all 535 birthdays must be somewhere 
in the remaining 265 days, so the probability I would compute for the 
remaining 265 days would be larger.

I have seen the correct answer worked out for a related problem in statistical
sampling.  It involves the hypergeometric distribution, as I remember.  Almost
everyone solves the approximate problem, as above.
1886.5Kind of an inverse of the Birthday ProblemSSAG::LARYLaughter & hope & a sock in the eyeWed Nov 30 1994 03:3811
In fact, it would take close to 2300 senators and congressmen (no thanks!) to
raise the probability of total birthday coverage to 50%; a general formula for
approximating this is that given M pigeonholes the number of uniformly
distributed pigeons it would take to yield a probability P of covering them
all is approximately

	M (ln M - ln (-ln P))

The approximation is pretty good for largish M and non-infinitesimal P, it
yields 2287 politicians in my above example which worked back through Wally's
formula yields a 50.2% chance of total coverage, not bad...
1886.6RUSURE::EDPAlways mount a scratch monkey.Thu Dec 01 1994 18:38103
    Re .4:
    
> doit:=proc(m,n)
>    local prob;
>    prob:=array(-1..m);
>    prob[-1]:=0;
>    prob[0]:=1;
>    for j from 1 to m do
>       prob[j]:=0;
>    od;
>    for i from 1 to n do
>       for j from min(m,i) to 0 by -1 do
>          prob[j]:=j*prob[j]+(m-j+1)*prob[j-1];
>       od;
>    od;
>    expected:=0;
>    for j from 0 to m do\
>       prob[j]:=prob[j]/m^n;
>       expected:=expected+(m-j)*prob[j];
>    od;
>    RETURN(prob[m], expected)\
> end:
> doit(365,535); evalf(");

1323440844407362116627960302003335275332962169944678059813232750927454120226366\
2766478374395771188579394375511380182546429354059259560605933394697851876099802\
7149307970121352959764954356707239524186290672542755931940198103069300444989795\
5280654094722662599180361164809958704837742551130023619864831377941708582894434\
2762569753869191806549789810416971820002008033186108574742629307313125823273321\
0676273819427178628280206657665527071374844772280524998369456267567945659274936\
2347974478887949269643847042942369062045560052061797610912143314447807496245525\
6177799563769915527806894167788462669279853345833723948922632769439795515469746\
4265578841250212555855055785843252684312026935063863053014887925685826277669209\
3053906229588586480476512358335035272810560200828710057817261603298027947279916\
8799145026774090666286890481876624306412902415799827002409464305512913255978634\
7483757844122147478437889765458993155116727735682348940747663664519706552192515\
4859681519059147199042849821341491904719817984186686421284914141333648580692462\
7484170458107149990442242973927480349757238321736363107877241820790046427878988\
0550710044706330646887866185161363663157812627646651509058712475232656341130730\
0984270529175705683256520792547683552789943610769408
--------------------------------------------------------------------------------
2195355842706651752979008092637435350328368968456675092965258496232370203649216\
8784115620170114369965023841797222804646493677451778478280920454354800469624213\
0811433829023550400486047503651345163293552215640730934016271601829514878787556\
0033354810885709210581127029593268167324198637227044382836721498575514296591151\
7373246508649855824674903030294630538943878860805220031010315272408978201333366\
6323818578022406459224819423733960942639078654552485545133982174657741547588061\
4925303700891221367548002154358788452204918953350285659599465027298041693365040\
4816779874001923988101270902942608026752807720218848713432355672937439745664089\
4942223373168039819013225359890586311338585581815182361274721968144136655187381\
4546475244033705476190751461290238970389577614186501705567754050626761465683162\
8892411042047614678033315661490675340619094265487283399636262135115843372533256\
2415303713766057341765346994190395272749936584799648815506919944503553009095667\
7399217951082517266225019165933827528666013165535485160228110198140457931472244\
3438515165951323330763021084550362532861693398856240810363506810186579031601109\
8389626255639743651994988240829341455279537096720373121806462672634279808232425\
5182176819316035631626884969977458432104666027288995700861574928943647025403329\
56459678825922310352325439453125

    ,

1546109512699027926694822937831767064047670285602254959228293560767901974526137\
7986726124282691123551249035510342666067005562963435103797603570381197349330301\
8212462456211680593055590647776107840206290816260487581260457787515186325323240\
8566707253683077066929411167259410947260926931176276149362214566354642896394009\
3123104440999549958228215661805738420126945806542556760005024908997090448866746\
7013295599462370859213999885923283364886551308425011744096921767720905344836269\
9372589527003437376411011342213237404047621288195142012275013116621287899775345\
3764576885487884151080514321636650191473089013950699157264329991971289160452416\
9623582148087280798574850506887644810857352595196035310912402390864913447394515\
2324154801095371877031807701628778355959884746607215370824306420908838983089004\
7046208519741220383872382540979638977269466439679578041311102259462210530320301\
1340145368941735344108934154527766145168047196404297518815899718607626129565549\
9280553920968867586585641903489349754543513111649851819310933952493284990062185\
9383645519932781703112658511289165759997931493500833061787783973717475063603431\
2675664611124147032588009098286969393158927020477912889539924244085690427542517\
6614856603919951753885302336855162100127919969036674339503898734192006404885840\
3641707880990705887412030233333486584270037848479343456192213647184177694874177\
6810376359245683534042497024
--------------------------------------------------------------------------------
1838186701138184742512789778320580006498920715225300017329778463226814096670975\
7416367278386856520040949090937535665351602364817092339424962163920012964359389\
1675009457850271824973555341129380894276759678729825950332363181373829692471104\
0181486885964753361843282660988917882624753103798011380861057117650444663491206\
5671413950246942144017383334950299410142708445176937059471022654322597507669515\
8808396715860049689077344226842984628463423253921388436881334088978774913676193\
2579869642901153011329459756708151916736751095912715356841321474562320078280322\
7555717782140161223175725778540759447029990078105576372428848564387904095167838\
9422442166448785151174239027861194733334379247384277217869390846572911901239292\
3783696260777243500237332661696490301348166908514202631309001246649003758216682\
5267241756222852981491038579385719476783671487637807577326716549584876495340445\
4255990053117124502199348093249292929659781647169562483731320683416062134285140\
6906019443807682386034131507249114292721620571910662558273096453887428127689268\
4583269994820057337031123420678037743998537723111995544664267602204158310301483\
3550591241114944225378913192478260365710561690896363925876284443793550448595851\
2928189049234329255444422606179558315329657477720115059810410890422712050908250\
1079905387486933603176105142895319170526267459913378006235471851317897495903253\
00232507288455963134765625

                                       -59
                         .6028365965*10   , 84.11058092
> 
    
1886.7EVMS::HALLYBFish have no concept of fireFri Dec 02 1994 11:5912
>                                       -59
>                         .6028365965*10   , 84.11058092
    
    
    [1] How many times does ^^^^^ go into Avogadro's number?
    
    [2] Was this done on a Pentium?  :-)
    
    I'm quite certain these numbers are correct (thanks, Eric) but I find
    them highly nonintuitive, especially in light of the birthday problem.
    
      John