
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

985.0. "Election Distribution" by BEING::POSTPISCHIL (Always mount a scratch monkey.) Mon Dec 05 1988 18:42

    Here's a set of percentages of votes cast in each state (and the
    District of Columbia) in a national election.  Perhaps we should throw
    out D.C. as an exception.  There are two columns, one for each of the
    two main choices.  There were other minor choices; votes for them were
    excluded when computing the percentages.

    Near the center of the range, the number of states falling in each
    one-percent band is as follows (by the left column; the right column
    is 100% minus the left column):

	sub-range	number
	53 to 54	2 **
	52 to 53	3 ***
	51 to 52	2 **
	50 to 51	0 
	49 to 50	0 
	48 to 49	6 ******
	47 to 48	2 **
	46 to 47	3 ***
	45 to 46	1 *

    What can we say about the probability of these samples having been
    arrived at by taking samples from some more even distribution and
    adjusting some of the samples?  What statistical tests can be applied? 
                                                                          
STATE          LEFT (%)      RIGHT (%)
DIS.COLUMBIA   85.61108      14.38892
RHODEISLAND    56.07379      43.92621
IOWA           55.19371      44.80629
HAWAII         54.80628      45.19373
MASS.          53.94824      46.05177
MINNESOTA      53.60203      46.39798
OREGON         52.62353      47.37647
W.VIRGINIA     52.41817      47.58184
NEWYORK        52.03937      47.96063
WISCONSIN      51.81251      48.18749
WASHINGTON     51.34891      48.65109
PENN.          48.80041      51.1996
MARYLAND       48.76353      51.23648
ILLINOIS       48.68293      51.31707
VERMONT        48.59194      51.40806
CALIFORNIA     48.32645      51.67355
MISSOURI       48.15071      51.8493
NEWMEXICO      47.56053      52.43948
CONNECTICUT    47.46092      52.53908
MONTANA        46.99792      53.00209
S.DAKOTA       46.80474      53.19527
COLORADO       46.05035      53.94965
MICHIGAN       45.93818      54.06183
LOUISIANA      44.82543      55.17457
OHIO           44.51544      55.48456
KENTUCKY       44.18671      55.81329
MAINE          44.16273      55.83728
TEXAS          43.61381      56.38619
N.DAKOTA       43.43195      56.56806
KANSAS         43.30046      56.69955
DELAWARE       43.24046      56.75954
NEWJERSEY      42.86363      57.13637
ARKANSAS       42.66707      57.33293
N.CAROLINA     41.93989      58.06012
TENNESSEE      41.90802      58.09199
OKLAHOMA       41.61208      58.38792
ALABAMA        40.33478      59.66522
GEORGIA        40.07535      59.92466
INDIANA        39.92463      60.07537
VIRGINIA       39.74181      60.2582
MISSISSIPPI    39.54387      60.45613
NEBRASKA       39.51819      60.48181
NEVADA         39.18868      60.81132
ARIZONA        39.17765      60.82236
FLORIDA        39.12862      60.87138
WYOMING        38.57417      61.42584
S.CAROLINA     37.99027      62.00974
ALASKA         37.79483      62.20517
IDAHO          36.77346      63.22655
N.HAMPSHIRE    36.71866      63.28135
UTAH           32.64154      67.35847
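The one-percent tallies near the center of the range can be reproduced from the left column of the table above. A minimal Python sketch; the function name `one_percent_histogram` and the choice of half-open bands [k, k+1) are my own assumptions, not something stated in the note:

```python
import math
from collections import Counter

# Left-column percentages for the 50 states, copied from the table above
# (D.C. excluded), in the order they appear there.
LEFT_COLUMN = [
    56.07379, 55.19371, 54.80628, 53.94824, 53.60203, 52.62353, 52.41817,
    52.03937, 51.81251, 51.34891, 48.80041, 48.76353, 48.68293, 48.59194,
    48.32645, 48.15071, 47.56053, 47.46092, 46.99792, 46.80474, 46.05035,
    45.93818, 44.82543, 44.51544, 44.18671, 44.16273, 43.61381, 43.43195,
    43.30046, 43.24046, 42.86363, 42.66707, 41.93989, 41.90802, 41.61208,
    40.33478, 40.07535, 39.92463, 39.74181, 39.54387, 39.51819, 39.18868,
    39.17765, 39.12862, 38.57417, 37.99027, 37.79483, 36.77346, 36.71866,
    32.64154,
]

def one_percent_histogram(values, low=45, high=54):
    """Count values in each one-percent band [k, k+1), for low <= k < high.

    Returns a dict keyed by the band's lower edge, highest band first,
    with zero counts included so that empty bands show up.
    """
    counts = Counter(math.floor(v) for v in values if low <= v < high)
    return {k: counts[k] for k in range(high - 1, low - 1, -1)}

hist = one_percent_histogram(LEFT_COLUMN)
for k, n in hist.items():
    print(f"{k} to {k + 1}\t{n} {'*' * n}")
```

The printed bands should match the histogram in the note, including the two empty bands from 49 to 51.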
985.1. "analysing election results" by PULSAR::WALLY (Wally Neilsen-Steinhardt) Thu Dec 08 1988 15:08 (41 lines)
re: < Note 985.0 by BEING::POSTPISCHIL "Always mount a scratch monkey." >
                           -< Election Distribution >-

>    What can we say about the probability of these samples having been
>    arrived at by taking samples from some more even distribution and
>    adjusting some of the samples?  What statistical tests can be
>    applied? 
    
    Nothing can be said and no tests can be applied because your hypotheses
    are too vague.  You need to begin by stating more precisely some
    hypotheses, and then tests can be formulated for them.  Just to
    give a trivial example:
    
    	H0: these values are randomly drawn from a population characterized
    	by a normal distribution with some unknown mean and variance
    
    Any of the usual goodness of fit tests would convince anyone who
    did not trust their eyes.
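    As one concrete instance of the usual goodness-of-fit tests for the H0
    above, here is a minimal stdlib-only Python sketch of the
    Kolmogorov-Smirnov statistic.  It assumes D.C. is dropped, as .0
    suggests, and that the normal's mean and standard deviation are
    estimated from the 50 state values; the function name is hypothetical.
    Because the parameters come from the same data, the ordinary KS
    critical values are too large (the test becomes conservative), so D
    should be compared with Lilliefors' tables instead.

```python
from statistics import NormalDist, mean, stdev

# Left-column percentages for the 50 states from 985.0 (D.C. excluded).
LEFT_COLUMN = [
    56.07379, 55.19371, 54.80628, 53.94824, 53.60203, 52.62353, 52.41817,
    52.03937, 51.81251, 51.34891, 48.80041, 48.76353, 48.68293, 48.59194,
    48.32645, 48.15071, 47.56053, 47.46092, 46.99792, 46.80474, 46.05035,
    45.93818, 44.82543, 44.51544, 44.18671, 44.16273, 43.61381, 43.43195,
    43.30046, 43.24046, 42.86363, 42.66707, 41.93989, 41.90802, 41.61208,
    40.33478, 40.07535, 39.92463, 39.74181, 39.54387, 39.51819, 39.18868,
    39.17765, 39.12862, 38.57417, 37.99027, 37.79483, 36.77346, 36.71866,
    32.64154,
]

def ks_statistic_vs_fitted_normal(values):
    """Kolmogorov-Smirnov distance D between the sample and a fitted normal.

    The normal's mean and standard deviation are estimated from the same
    sample (Lilliefors' setting), so D must not be judged against the
    ordinary KS tables.
    """
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

D = ks_statistic_vs_fitted_normal(LEFT_COLUMN)
print(f"n = {len(LEFT_COLUMN)}, mean = {mean(LEFT_COLUMN):.2f}, D = {D:.4f}")
```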
    
    Note that the statistical analysis of election results is quite
    a cottage industry, heavily supported by the media, the parties
    and the pols.  You can see the output all over the place, and infer
    from it some of the techniques being used.  I've never been on the
    inside, so what follows is just inference.  Has anybody reading
    this actually done or supported this work?
    
    The standard technique is to assume that the vote in a state is
    a function of three things: current national economic, political, social
    and cultural factors; specific statewide or regional economic, political,
    social and cultural characteristics; and other, unpredictable factors.  Various
    hypotheses are formulated to express these functions.  For example, 
    urban industrial states tend to vote Democratic unless there is 
    unusual prosperity or a major foreign threat.  Note that the
    distribution in .0 is consistent with this kind of assumption.
    
    Various means are used to assign numerical values to all these
    factors and characteristics, and factor analysis is used to test
    hypotheses.  The result is a set of more-or-less well confirmed
    statements relating voting outcomes to current factors and local
    characteristics.  The results I've seen are not too impressive,
    but they burn MIPS and keep the pols out of worse trouble.
985.2. "Law of Cubic Proportions" by AUSSIE::GARSON (nouveau pauvre) Mon Aug 16 1993 22:53 (33 lines)
985.3. "a related question" by HERON::BUCHANAN (The was not found.) Tue Aug 17 1993 09:09 (6 lines)
	Darrell Huff, in his classic "How to Lie with Statistics", poses the
question:  given that party A beats party B by a votes to b, what is the
probability that party A is ahead of party B *throughout* the process of 
counting votes?   He states (without proof) that the answer is (a-b)/(a+b).

Andrew.
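Huff's claim is the classical ballot problem (Bertrand's ballot theorem), and for small a and b it can be checked exactly by enumerating every counting order. A minimal Python sketch, assuming every order of counting the ballots is equally likely; the function name is hypothetical:

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def prob_a_always_ahead(a, b):
    """Exact probability that A leads strictly throughout a random-order count.

    Enumerates every placement of A's a votes among the a+b counting
    positions; each placement is equally likely under random counting order.
    """
    n = a + b
    favourable = 0
    for a_positions in combinations(range(n), a):
        a_set = set(a_positions)
        lead = 0
        for i in range(n):
            lead += 1 if i in a_set else -1
            if lead <= 0:  # tied or behind: this order fails
                break
        else:  # loop finished without a break: A led at every step
            favourable += 1
    return Fraction(favourable, comb(n, a))

for a, b in [(3, 1), (5, 2), (6, 4), (7, 7)]:
    print(a, b, prob_a_always_ahead(a, b), Fraction(a - b, a + b))
```

Each line prints the enumerated probability next to (a-b)/(a+b), so any disagreement would be visible at once.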
985.4. by AUSSIE::GARSON (nouveau pauvre) Wed Aug 18 1993 22:05 (4 lines)
    re .-1
    
    Do you have a page or chapter reference for that? I quickly flipped
    through my dusty copy and couldn't find it.
985.5. "the proof of Darrell Huff's statement" by GVAADG::DUBE Fri Aug 27 1993 09:15 (100 lines)
Re 985.3

>> 	Darrell Huff, in his classic "How to Lie with Statistics", poses the
>> question:  given that party A beats party B by a votes to b, what is the
>> probability that party A is ahead of party B *throughout* the process of 
>> counting votes?   He states (without proof) that the answer is (a-b)/(a+b).


The process of counting votes can be drawn as a path in a diagram whose
coordinates are:
	. x: the number of votes counted so far
	. y: the lead of party A over party B (votes for A minus votes for B)


  a-b
   |
   |
   |
   |                  End
   |                 / 
   |              /\/
   | /\          /
   |/  \      /\/
   +____\/\__/____________________________ a+b
           \/

                                                              
 The number of distinct paths from Origin to End equals the number of
 distinct orderings of the a+b votes, where the a votes for A are
 interchangeable, as are the b votes for B:

      distinct paths from Origin to End = C ( a+b, a )          
                                        = (a+b)! / (a! * b!)   

      Call this quantity N(a,b):

            N(a,b) = (a+b)! / (a! * b!)			[1]
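  Relation [1] can be cross-checked by counting the paths directly: the
  number of ways to reach i votes for A and j for B is the sum of the ways
  to reach the two neighbouring points one vote earlier.  A minimal Python
  sketch; the helper names count_paths and N are illustrative, and
  math.comb computes the same binomial coefficient as (a+b)! / (a! * b!):

```python
from math import comb

def count_paths(a, b):
    """Count orderings of a votes for A and b for B by dynamic programming."""
    # paths[i][j] = number of ways to have counted i votes for A and j for B
    paths = [[0] * (b + 1) for _ in range(a + 1)]
    paths[0][0] = 1
    for i in range(a + 1):
        for j in range(b + 1):
            if i > 0:
                paths[i][j] += paths[i - 1][j]  # last vote counted was for A
            if j > 0:
                paths[i][j] += paths[i][j - 1]  # last vote counted was for B
    return paths[a][b]

def N(a, b):
    """Relation [1]: N(a,b) = (a+b)! / (a! * b!)."""
    return comb(a + b, a)

print(count_paths(3, 2), N(3, 2))
```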

  Assuming the ballots are counted in a random order, all these paths are
  equally likely, so each path has probability 1 / N(a,b).


  For A to stay strictly ahead of B throughout the count, the path must
  go via the point X (the first vote counted is for party A):

  a-b
   |
   |
   |
   |                  . End
   |     
   |     
   | X
   |/    
   +______________________________________ a+b
    \ 
     Y

  The probability that the first vote counted is for A is

        P1 = a/(a+b)

  At point X, a-1 votes for party A and b votes for party B remain in the box.
  So the total number of paths from X to End is given by relation [1] with
  a-1 in place of a:

        N(a-1,b) = (a+b-1)! / ( (a-1)! * b! )		[2]

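  From X on, the path must never touch the x axis (touching it means a tie).
  By symmetry (the reflection principle), the paths from X that touch the axis
  are in one-to-one correspondence with the paths from Y to End: reflecting
  the part of a path before its first contact with the axis turns one kind
  into the other.  Every path from Y to End must cross the axis, since End
  lies above it.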

  So the number of paths from X that touch the axis equals the number of paths
  from Y to End.  From Y, the remaining steps are a votes for party A and b-1
  for party B, so relation [1] with b-1 in place of b gives:

        N(a,b-1) = (a+b-1)! / ( a! * (b-1)! )		[3]
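  The count in [3] can be checked by brute force for small a and b: start
  at X (height 1), run through every arrangement of the remaining a-1
  votes for A and b votes for B, and count the paths that reach height 0.
  A minimal Python sketch with hypothetical function names:

```python
from itertools import combinations
from math import comb

def N(a, b):
    """Relation [1]; math.comb returns 0 when b = -1 (no such paths)."""
    return comb(a + b, a)

def paths_from_x_touching_axis(a, b):
    """Count paths from X = (1, 1) to End that touch the x axis (a >= 1).

    After the first vote (for A), a-1 votes for A and b for B remain.
    """
    remaining = (a - 1) + b
    touching = 0
    for a_positions in combinations(range(remaining), a - 1):
        a_set = set(a_positions)
        height = 1  # the path starts at X
        for i in range(remaining):
            height += 1 if i in a_set else -1
            if height == 0:  # unit steps cannot skip over the axis
                touching += 1
                break
    return touching

for a, b in [(3, 1), (4, 2), (5, 3)]:
    print(a, b, paths_from_x_touching_axis(a, b), N(a, b - 1))
```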


  So the probability that A stays strictly ahead throughout the count is:

	P = P1 * [N(a-1,b) - N(a,b-1)] / N(a-1,b)

  where
	P1                   is the probability that the path goes via point X,
	N(a-1,b) - N(a,b-1)  is the number of paths from X that never touch the axis,
	N(a-1,b)             is the total number of paths from X to End.


  We finally get:
  
        P = (a-b)/(a+b)
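  The algebra behind this last step, using relations [2] and [3] and
  P1 = a/(a+b), runs as follows:

```latex
\frac{N(a,b-1)}{N(a-1,b)}
  = \frac{(a+b-1)! \,/\, \bigl(a!\,(b-1)!\bigr)}{(a+b-1)! \,/\, \bigl((a-1)!\,b!\bigr)}
  = \frac{(a-1)!\,b!}{a!\,(b-1)!}
  = \frac{b}{a},
\qquad
P = \frac{a}{a+b}\left(1 - \frac{b}{a}\right)
  = \frac{a}{a+b}\cdot\frac{a-b}{a}
  = \frac{a-b}{a+b}.
```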


  Friendly,

  ##### Remy #####