
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

652.0. "standard deviation" by SKYLRK::RICHARD () Mon Jan 19 1987 18:03

    Here is an easy one.
    
    What is the formula for the standard deviation for n samples?
    I would look it up, except that, as a general rule, I give away or
    throw out books I haven't looked at in a year, and there are no math
    books in the sales library.
    
    Thank you,
    
    Gregory
652.1. "variance^1/2" by MODEL::YARBROUGH () Mon Jan 19 1987 18:59
>    What is the formula for the standard deviation for n samples?

It's the square root of the mean of the squares of the differences between 
the observed values and the population mean.

pop.mean = (sum(1..n) x[i])/n

variance = (sum(1..n) (pop.mean-x[i])^2)/n

std. dev. = sqrt (variance)

Caveat: this calculation is subject to severe rounding errors.
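
A minimal sketch of this two-pass calculation in Python (the function and
variable names are my own, not from this note):

```python
import math

def std_dev(xs):
    """Two-pass standard deviation: first pass computes the mean,
    second pass the mean of squared deviations (population form,
    divisor n)."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((mean - x) ** 2 for x in xs) / n
    return math.sqrt(variance)

print(std_dev([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))  # -> 2.0
```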
652.2. "another version" by ESTORE::ROOS () Tue Jan 20 1987 17:39
    Two things:
    
    1. Concerning .1's reply:  The standard deviation for the population
       has an n in the denominator, but the standard deviation for a
       sample of the population has an n-1 in the denominator.
    
    2. Another version for S.D.:
    
       variance = (sum(1..n) x[i]^2 - (sum(1..n) x[i])^2/n)/n
       (for the population)
    
       variance = (sum(1..n) x[i]^2 - (sum(1..n) x[i])^2/(n-1)... sorry,
       (sum(1..n) x[i]^2 - (sum(1..n) x[i])^2/n)/(n-1)
       (for a sample of a population)
    
       S.D. = sqrt (variance)
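
A sketch of this sum-of-squares form in Python (note that the (sum x)^2
term must itself be divided by n for the algebra to work out; names are
mine):

```python
import math

def shortcut_variance(xs, sample=True):
    """One-pass 'sum of squares' form: (sum x^2 - (sum x)^2/n) / divisor.
    Convenient for hand analysis, but prone to catastrophic cancellation
    on data with a large mean and a small spread."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    divisor = (n - 1) if sample else n
    return (sum_x2 - sum_x * sum_x / n) / divisor

# Population variance of 2,4,4,4,5,5,7,9 is 4, so the population S.D. is 2.
print(math.sqrt(shortcut_variance([2, 4, 4, 4, 5, 5, 7, 9], sample=False)))  # -> 2.0
```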
    
652.3. by CLT::GILBERT (eager like a child) Wed Jan 21 1987 04:12
    I seem to recall that the standard deviation in .1 *is* numerically
    stable.  The version in .2 can suffer from large round-off errors.
652.4. by COGITO::ROTH () Wed Jan 21 1987 12:09
    I agree with .3: the version in .2 is sometimes convenient for
    analysis, but round-off errors can drive the computed variance
    negative, leaving you taking the square root of a negative number.

    - Jim
652.5. "One Pass Algorithm with Two Pass Accuracy" by VAXAGB::BELDIN (Dick Beldin - 'Truth will Out') Mon May 04 1987 15:33
    A common problem is to calculate the standard deviation from data
    stored in a file using only one pass and a fixed amount of memory.
    The following algorithm provides accuracy equivalent to the two-pass
    calculation (1st pass: mean; 2nd pass: mean squared deviation).
    
    Let n symbolize the number of observations already seen,
    
         x = the most recently read value from the file,
    
         mean = the arithmetic mean of all values read so far,
    
         sumsquares = the sum of squared deviations from the mean.
    
    Initialize the following (real) variables (in Pascal notation):
    
         mean := 0;
         
         sumsquares := 0;
         
         n := 0;
         
    Then with the following algorithm,
    
         begin
           Read_an_Observation(x);    
           n := n+1;    
            d := ( x - mean ) / n;
            mean := mean + d;
            sumsquares := sumsquares + n * (n-1) * d * d;
         end;
         
    After the last observation is processed, calculate 
    
         Population_Variance := sumsquares / n;
         
         Sample_Variance := sumsquares / (n-1);
         
    and
    
         the standard deviations are the square roots of the respective
         variances.
         

    This algorithm has been known for some twenty years.  I no longer have
    any references to it.  
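
A sketch of this one-pass update in Python (this is what is now commonly
called Welford's algorithm; note that the sumsquares update needs a
factor of n along with (n-1) for the sums to come out right, since d is
the deviation already divided by n):

```python
import math

def running_variance(xs):
    """One pass, constant memory: update n, the running mean, and the
    running sum of squared deviations as each observation arrives."""
    n = 0
    mean = 0.0
    sumsquares = 0.0
    for x in xs:
        n += 1
        d = (x - mean) / n
        mean += d
        sumsquares += n * (n - 1) * d * d
    return sumsquares / n, sumsquares / (n - 1)  # population, sample

pop_var, sample_var = running_variance([2, 4, 4, 4, 5, 5, 7, 9])
print(math.sqrt(pop_var))  # population S.D. -> 2.0
```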
652.6. "Single-pass calculation of quantiles" by SSDEVO::LARY () Wed May 06 1987 21:54
In a similar vein, there is an article in the October 1985 issue of
Communications of the ACM on a heuristic algorithm for calculating
arbitrary quantiles (a p-quantile of a distribution, 0<=p<=1, is the value
below which 100p percent of the distribution lies; the 0.5-quantile is
the median) with a single pass through the data and a very small amount
of working storage. It is claimed that this algorithm, run on a set of
samples of a distribution, produces an approximation of any quantile
essentially as good as the brute-force approach (which involves partially
ordering the data, taking as much memory as sorting it), provided the
distribution does not have a discontinuity near the desired quantile.

One of the authors of the paper, Raj Jain, works for Digital.
652.7. "What Comprises The STD DEVIATION ?" by ADCSRV::RBROWN (Are there no work houses ?) Mon Jul 01 1991 13:13
    Picking up on the standard deviation question.  We're looking at it as
    a representation of "confidence" for performance data.  That is, the
    closer the std deviation is to 0, the more likely our performance numbers 
    are to be what they should be.  The std deviation is overlaid on a
    series of bar charts.  Hence, if the std deviation is low, then the
    corresponding bar is probably more likely to be true.  A bar with a high
    std deviation may mean that the bar represents a large series of
    peaks/valleys, perhaps a runaway process, etc.  This bar would be
    brought into question.
    
    Question though: what percentage of the overall data is represented by
    the std deviation?  We believe it to be 80%, that is, 80% of the
    numbers will fall into the range being specified.  I've checked
    through several books and can't seem to find a figure for this.
    
    Thanks !
652.8. "67% is a better (but not perfect) coverage rate" by PULPO::BELDIN_R () Mon Jul 01 1991 13:57
    There is no simple answer.
    
    When the distribution is approximately normal, about 2/3 of the
    observations will be within one standard deviation of the mean.
    
    Any skew, excessive flattening or peakedness will distort this figure.
    
    You can see the impact by running the calculations with distribution
    functions given in most introductory texts on statistical and
    probability theory.
    
    Approximate normality is common when the deviations are many, small,
    and as likely to be high as low.  If a single very large deviation
    dominates the randomness, normality will typically be violated.
    
    As long as you don't base any critical decisions on the 2/3 figure, it
    is a reasonable approximation for practical work.
    
    Dick
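
The 2/3 figure can be checked directly from the normal CDF; a quick
sketch (using the standard identity P(|Z| <= 1) = erf(1/sqrt 2)):

```python
import math

# For a standard normal Z, the fraction within one standard
# deviation of the mean is erf(1/sqrt(2)).
coverage = math.erf(1 / math.sqrt(2))
print(round(coverage, 4))  # -> 0.6827, i.e. about 2/3
```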
652.9. by VMSDEV::HALLYB (The Smart Money was on Goliath) Tue Jul 02 1991 00:22
    You really should consider pitching some percentage of your datapoints
    as outliers.  In performance work especially, you get oddball timings
    that represent disk read errors or spurious datacomm path outages or...
    
    While the -frequency- of such outliers may be important, their -value-
    is almost surely unreliable and will tend to distort your other points.
    
    Also be sure of the distribution you are measuring.  Interarrival times,
    for example, are often exponential in nature and the standard deviation
    really isn't very helpful there, if you know what I "mean".
    
      John
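
One simple way to pitch a percentage of the datapoints is a symmetric
trim; a sketch in Python (the trim fraction is just an example, not a
recommendation from this note):

```python
def trimmed(xs, frac=0.10):
    """Drop the lowest and highest frac of the sorted data before
    computing summary statistics; crude but effective against the
    oddball timings described above."""
    xs = sorted(xs)
    k = int(len(xs) * frac)
    return xs[k:len(xs) - k] if k > 0 else xs

data = [12, 11, 13, 12, 11, 12, 950]   # one wild timing
kept = trimmed(data, 0.15)
print(sum(kept) / len(kept))  # -> 12.0, instead of a mean near 146
```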
652.10. "Is it normal?" by PAKORA::PFANG () Tue Jul 02 1991 07:31
    You can get an idea of how closely your data follow a normal (aka
    Gaussian) distribution by plotting them on a normal probability axis.
    If they fall approximately in a straight line, you get some confidence
    that the data may be normally distributed. If the line has curvature
    to it, you may be dealing with a different distribution, for example
    exponential (as mentioned in the previous reply). If you get points at
    the ends that don't fall on the line, you may have outliers (also
    mentioned previously).
    
    Do you really want the standard deviation, or do you want some kind of
    confidence interval for your data? The standard deviation has a direct
    interpretation if your data is normally distributed. But if you have
    another situation (not normal and/or outliers) then there are more
    `robust' measures of the variation of the data.
    
    Peter
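
One such robust measure is the median absolute deviation (MAD); a sketch
in Python (the 1.4826 scale factor, which makes MAD estimate the standard
deviation for normal data, is a standard convention and not from this
note):

```python
import statistics

def mad(xs, scale=1.4826):
    """Median absolute deviation from the median.  Unlike the standard
    deviation, a single wild outlier barely moves it."""
    m = statistics.median(xs)
    return scale * statistics.median(abs(x - m) for x in xs)

# A huge outlier leaves the MAD unchanged:
print(mad([1, 2, 3, 4, 5], scale=1.0))    # -> 1.0
print(mad([1, 2, 3, 4, 100], scale=1.0))  # -> 1.0
```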
652.12. "what am I missing?" by NOVA::FINNERTY (lies, damned lies, and the CAPM) Wed Jul 27 1994 19:26
    
    re: expected value
    
        what does E(r) = .80 mean?  Do you mean that the average outcome
        over all possible outcomes is a 20% loss?  Sounds unattractive,
        to say the least!
    
    
652.13. ""30" is a private jokelet" by VMSDEV::HALLYB (Fish have no concept of fire) Wed Jul 27 1994 20:31