
Conference rusure::math

Title: Mathematics at DEC
Moderator: RUSURE::EDP
Created: Mon Feb 03 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2083
Total number of notes: 14613

1210.0. "Median, Mean, & Standard Deviation questions" by TIXEL::ARNOLD (Real men don't set for stun) Tue Mar 13 1990 18:29

    OK, math-mongers, here's an easy one for you, probably unbefitting
    your mathematical know-how, but something I've got to ask anyway for
    something I'm working on, and finding that I've been away from my
    high school/college math for too long.

    Suppose you have 'n' data points; for example:

       5   6   1   3   14   4   11   3   8   3   5

    There are a variable number ('n') of these data points.  Figuring the
    AVERAGE is easy.  (I haven't been away from it *that* long!)  But is
    there a mathematical formula to figure the MEDIAN, the MEAN, and the
    STANDARD DEVIATION of these data points?  That's where I get lost.

    Any help you can offer here would be greatly appreciated.
    Thanks
    Jon

1210.1. "A little info ..." by COOKIE::PBERGH (Peter Bergh, DTN 523-3007) Tue Mar 13 1990 20:10 (38 lines)

       <<< Note 1210.0 by TIXEL::ARNOLD "Real men don't set for stun" >>>
               -< Median, Mean, & Standard Deviation questions >-

    >> Suppose you have 'n' data points; for example:

    >>    5   6   1   3   14   4   11   3   8   3   5

    >> There are a variable number ('n') of these data points.  Figuring the
    >> AVERAGE is easy.  (I haven't been away from it *that* long!)  But is
    >> there a mathematical formula to figure the MEDIAN, the MEAN, and the
    >> STANDARD DEVIATION of these data points?  That's where I get lost.
    
    	The median requires you to sort the data points (in ascending or
    descending order, whichever you fancy) and pick the middle one; that is
    the median.  If n is even, there is no middle one, so one normally
    takes the average of the two middle ones and calls that average the
    median.
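
    The sort-and-pick-the-middle recipe above can be sketched in a few
    lines of Python.  This is only an illustration; the function name
    median is my own choice, and the data are the eleven points from .0.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # ascending order; descending works too
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle point
    # even n: average the two middle points
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
print(median(data))
```

    Sorted, the sample reads 1 3 3 3 4 5 5 6 8 11 14, so the sixth of the
    eleven values, 5, is the median.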
    
    	The mean is the same as the average (strictly, the arithmetic mean:
    the sum of the data points divided by n).
    
    	The standard deviation (the sample form, with N-1 in the
    denominator, where N is the number of data points) you calculate
    (*in double precision*, if N is large) as
    
    		SQRT((SUM(X(i)*X(i)) - SUM(X(i))*SUM(X(i))/N)/(N-1))
    
    (this formula only requires one pass over the data, since you can
    calculate SUM(X(i)) and SUM(X(i)*X(i)) into two different scalars).
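
    As a rough sketch of that one-pass calculation in Python (whose
    floats are already double precision): the name one_pass_stddev and
    the clamp against a slightly negative difference are my additions,
    not part of the formula.

```python
import math


def one_pass_stddev(values):
    """Sample standard deviation from two running sums, in a single pass."""
    n = 0
    total = 0.0       # SUM(X(i))
    total_sq = 0.0    # SUM(X(i)*X(i))
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    if n < 2:
        raise ValueError("need at least two data points")
    # Rounding can push the difference a hair below zero; clamp it (assumption).
    spread = max(0.0, total_sq - total * total / n)
    return math.sqrt(spread / (n - 1))


data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
print(one_pass_stddev(data))
```

    For the sample in .0 the two sums are 63 and 511, which gives a
    standard deviation of roughly 3.88.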
    
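    	The reason you need double precision for large N is that
    SUM(X(i)*X(i)) frequently is very close to SUM(X(i))*SUM(X(i))/N.
    Subtracting two nearly equal numbers cancels their leading digits, so
    only the few trailing digits that survive end up in the result.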
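
    If that cancellation is a worry, one alternative (not the formula
    above, but a different one-pass method usually credited to Welford)
    keeps a running mean and a running sum of squared deviations, so it
    never subtracts two huge sums.  A minimal sketch, with names of my
    own choosing:

```python
import math


def welford_stddev(values):
    """Sample standard deviation via Welford's running-mean update."""
    n = 0
    mean = 0.0
    m2 = 0.0                      # running sum of squared deviations
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the mean both before and after the update
    if n < 2:
        raise ValueError("need at least two data points")
    return math.sqrt(m2 / (n - 1))


# Small spread riding on a large offset: the case where the two sums nearly cancel.
shifted = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(welford_stddev(shifted))
```

    The deviations from the mean here are -6, -3, 3 and 6, so the result
    should be SQRT(90/3), i.e. SQRT(30).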
    
    	WARNING: Be very careful about plugging means and standard
    deviations into formulae, because many of the standard formulae assume
    that the underlying distribution is normal (i.e., that the X(i), if
    plotted in a histogram, look like a bell curve) and may lead you astray
    if this assumption is not satisfied.  *Before* you plug values into a
    standard formula that assumes normality, plot a histogram and verify
    that it looks reasonably like a bell curve.
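
    A histogram for this kind of eyeball check needs nothing fancy; a
    text one will do.  The sketch below assumes integer data and a bin
    width of 2, both arbitrary choices of mine, and the name
    text_histogram is hypothetical.

```python
from collections import Counter


def text_histogram(values, bin_width=2):
    """Return one line per bin, with a star for each data point in the bin."""
    # Map each integer value to the lower edge of its bin.
    counts = Counter((x // bin_width) * bin_width for x in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} {'*' * counts.get(start, 0)}")
    return lines


data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
for line in text_histogram(data):
    print(line)
```

    With only eleven points no histogram is conclusive, but for the
    sample in .0 the bars trail off to the right rather than forming a
    symmetric bell, which is the sort of thing to look for.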