| <<< Note 1210.0 by TIXEL::ARNOLD "Real men don't set for stun" >>>
-< Median, Mean, & Standard Deviation questions >-
>> Suppose you have 'n' data points; for example:
>> 5 6 1 3 14 4 11 3 8 3 5
>> There are a variable number ('n') of these data points. Figuring the
>> AVERAGE is easy. (I haven't been away from it *that* long!) But is
>> there a mathematical formula to figure the MEDIAN, the MEAN, and the
>> STANDARD DEVIATION of these data points? That's where I get lost.
To find the median, sort the data points (in ascending or descending
order, whichever you fancy) and pick the middle one; that is the
median. If n is even, there is no middle one, so one normally takes
the average of the two middle ones and calls that average the median.
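For what it's worth, here is a minimal Python sketch of that recipe. The function name median and the list name data are my own choices for illustration; the numbers are the sample points from the question.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)      # ascending; descending would work equally well
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd n: the single middle value
    # even n: no single middle value, so average the two middle ones
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
print(median(data))   # 5
```

Sorted, the sample reads 1 3 3 3 4 5 5 6 8 11 14, and with n = 11 the sixth value, 5, is the middle one.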
The mean (strictly, the arithmetic mean) is the same thing as the
average: SUM(X(i))/N.
The standard deviation you calculate (*in double precision*, if n is
large) as
SQRT((SUM(X(i)*X(i)) - SUM(X(i))*SUM(X(i))/N)/(N-1))
(this formula only requires one pass over the data, since you can
calculate SUM(X(i)) and SUM(X(i)*X(i)) into two different scalars).
The reason you need double precision for large N is that
SUM(X(i)*X(i)) frequently is very close to SUM(X(i))*SUM(X(i))/N, and
subtracting two nearly equal numbers throws away most of their
significant digits.
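As a rough illustration, the one-pass formula above might look like this in Python. The function name mean_and_stddev and the variable names are assumptions of mine; Python floats are already double precision, so that part comes for free. The clamp to zero is my own addition, to guard against rounding leaving a tiny negative value under the square root.

```python
import math


def mean_and_stddev(values):
    """One pass over the data: return (mean, sample standard deviation)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    total = 0.0       # running SUM(X(i))
    total_sq = 0.0    # running SUM(X(i)*X(i))
    for x in values:
        total += x
        total_sq += x * x
    mean = total / n
    variance = (total_sq - total * total / n) / (n - 1)
    variance = max(variance, 0.0)   # my addition: rounding can push this just below zero
    return mean, math.sqrt(variance)


data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
mean, stddev = mean_and_stddev(data)
print(mean, stddev)
```

For the sample points, SUM(X(i)) is 63 and SUM(X(i)*X(i)) is 511, so the mean comes out near 5.727 and the standard deviation near 3.875.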
WARNING: Be very careful about plugging means and standard
deviations into formulae, because many of the standard formulae assume
that the underlying distribution is normal (i.e., that the X(i), if
plotted in a histogram, look like a bell curve) and may lead you astray
if this assumption is not satisfied. *Before* you plug values into a
standard formula that assumes normality, plot a histogram and verify
that it looks reasonably like a bell curve.
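If you have nothing fancier to hand, a crude text histogram is enough for that eyeball check. The sketch below is only one way to do it, and it assumes integer data and a bin width of 2, both of which are my choices rather than anything from the question; text_histogram is a made-up name.

```python
from collections import Counter


def text_histogram(values, bin_width=2):
    """Return one line of stars per bin; assumes integer data."""
    # Map each value to the lower edge of its bin and count the members.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} {'*' * counts[start]}")   # empty bins get no stars
    return lines


data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
for line in text_histogram(data):
    print(line)
```

With the sample points, the stars pile up at the low end (three each in the 2-3 and 4-5 bins) with a thin tail out to 14, which is not much of a bell curve. That is the sort of thing to look for before trusting a formula that assumes normality.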
|