
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

1185.0. "Average of pairs of numbers?" by NRPUR::CHABOT (Jerry Chabot) Fri Jan 26 1990 19:11

    
    I would appreciate some help on solving the following problem:
    
    Given "n" pairs of numbers (x,y), where x <= y and x and y can be
    positive or negative:

    What would be the formula to calculate the average (X,Y)?
    
    For example:
                                     standard
          x      y     range   mean  deviation 
         -2     +3       5      .5      2.5
         -5     +10     15     2.5      7.5
         +2     +5       7     1.5      3.5
    
    What is the average X and Y?
    
    Can I do this?
    
      Average mean = sum (.5,2.5,1.5)/3 = 1.5   
      Average Deviation = sum (2.5, 7.5, 3.5)/3 = 4.5
    
      Back into X and Y =>   X = avg mean - avg dev = 1.5 - 4.5 = -3
                             Y = avg mean + avg dev = 1.5 + 4.5 = +6
    
      Therefore (X,Y) = (-3,6)
    
    
    Can you tell I'm not a statistics whiz?
    
    Jerry
    
1185.1. "re .0" by ESCROW::MUNZER Mon Jan 29 1990 11:47
Jerry,

You're okay -- just a typo:

fixed>      Average mean = sum (.5,2.5,3.5)/3 = 2.17   
fixed>      Average Deviation = sum (2.5, 7.5, 1.5)/3 = 3.83

But why bother with means and differences -- why not just keep X's and Y's?
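A quick Python sketch (not part of the original thread) of that suggestion, using the three pairs from .0 with the corrected third row (x=2, y=5, midpoint 3.5, half-range 1.5). It rebuilds X and Y from the averaged midpoints and half-ranges and compares the result with simply averaging the x's and the y's; the variable names are illustrative.

```python
# (x, y) = (smallest, largest) pairs from .0, third row corrected.
windows = [(-2, 3), (-5, 10), (2, 5)]
n = len(windows)

# Route 1: average the midpoints and half-ranges, then rebuild X and Y.
avg_mid = sum((x + y) / 2 for x, y in windows) / n    # 13/6
avg_half = sum((y - x) / 2 for x, y in windows) / n   # 23/6
X1, Y1 = avg_mid - avg_half, avg_mid + avg_half

# Route 2: average the x's and the y's directly.
X2 = sum(x for x, _ in windows) / n
Y2 = sum(y for _, y in windows) / n

print(f"via midpoints: ({X1:.2f}, {Y1:.2f})")
print(f"direct:        ({X2:.2f}, {Y2:.2f})")
```

Both routes give (-1.67, 6.00) here, and they always agree: midpoint minus half-range is exactly x, midpoint plus half-range is exactly y, and averaging is linear.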

John
1185.2. by BEING::POSTPISCHIL (Always mount a scratch monkey.) Mon Jan 29 1990 11:56
    Re .0:
    
    Why do you want an "average (X,Y)"?  (Telling us why you want it may
    help us figure out what answer would be most useful.)
    
    If you want an average X and an average Y, then you can just average
    the X's separately and average the Y's separately.
    
    If you think X and Y are related in some way, you might want to fit a
    curve to that relationship, then figure out an average X, and then
    compute the Y that matches that X according to the curve.  To tell you
    more about that, we'd need to know what kind of relationship you think
    there might be between X and Y. 
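If the relationship were assumed to be a straight line, the procedure might look like this minimal Python sketch. The pairs from .0 are used only as placeholder data, and `fit_line` is a hypothetical helper doing an ordinary least-squares fit of y = a + b*x.

```python
# Placeholder data: the (x, y) pairs from .0.
xs = [-2, -5, 2]
ys = [3, 10, 5]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b

a, b = fit_line(xs, ys)
mean_x = sum(xs) / len(xs)
y_at_mean_x = a + b * mean_x   # Y that the fitted line assigns to the average X
print(f"y = {a:.3f} + {b:.3f}*x; Y at mean X = {y_at_mean_x:.3f}")
```

One thing the sketch makes visible: a least-squares straight line with an intercept always passes through (mean X, mean Y), so a linear fit gives the same answer as averaging the X's and Y's separately. A different answer would only come from a curved fit.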
    
    
    				-- edp
1185.3. "Usage" by NRPUR::CHABOT (Jerry Chabot) Mon Jan 29 1990 12:19
    
    X and Y are related. X represents the smallest number of a range
    and Y represents the largest.
    
    How it is to be applied:
    
    An event is scheduled to occur on date 3/10/90. In actuality, the
    event starts 3 days before (-3) and ends two days after (+2).
    
        x = -3   ,  y = +2
    
    This represents a five-day window. NOTE: We need to clarify with
    the user whether it is five or six.
    
    Given that many of these events can occur, I need to calculate the
    average window and corresponding smallest and largest values.
    
    I hope this helps in explaining how we intend to use the numbers.
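The five-or-six question is a fencepost issue. A minimal Python sketch using the example in this note (scheduled 3/10/90, x = -3, y = +2); the variable names are illustrative.

```python
from datetime import date, timedelta

scheduled = date(1990, 3, 10)   # scheduled date from the example
x, y = -3, 2                    # days before (-) / after (+) the scheduled date

start = scheduled + timedelta(days=x)
end = scheduled + timedelta(days=y)

span = y - x            # days elapsed from start to end
inclusive = y - x + 1   # calendar days touched, counting both the first and last day
print(start, end, span, inclusive)
```

This prints `1990-03-07 1990-03-12 5 6`: y - x counts the days elapsed, while y - x + 1 counts calendar days when both ends are included. Which one is wanted is exactly the point to settle with the user.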
    
1185.4. "Depends on assumptions." by CADSYS::COOPER (Topher Cooper) Mon Jan 29 1990 17:25
    What makes sense to calculate here really depends quite heavily on
    what assumptions you are willing to make, at least as approximations
    to the situation.
    
    First off, it is unlikely that the average standard deviation means
    anything.  The standard deviation is defined as the square-root of
    the variance.  It is useful because it has units of distance, but the
    underlying statistic is the variance.
    
    Think of a whole bunch of samples taken from some distribution
    (statistics people will excuse my "loose" language, I'm sure).  We
    know (by definition) that the "average position" will be the mean,
    but we also want to know, in some sense, how far a "typical" sample
    will be from the mean.  The direct average distance is useless since
    the negative distances will always cancel out the positive distances
    leaving us with zero.  The mean of the absolute values of the distances
    is more useful (and is called the "mean absolute deviation")
    but is mathematically rather intractable in most cases.  Squaring the
    deviation gets everything on the same side of the mean, and generally
    results in more tractable mathematics than the absolute value, so this
    (the variance) is what is used.  Units of the variance are, however,
    the square of "distance" so in order to measure off a distance you
    must take the square root to get "distances", i.e., the standard
    deviation.
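The progression from signed deviations to the standard deviation can be checked numerically. A Python sketch on a made-up sample (the data list is arbitrary), using the population forms from the standard `statistics` module, which divide by n; `statistics.variance` and `statistics.stdev` are the sample forms that divide by n-1.

```python
import math
import statistics

# Arbitrary sample, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = statistics.mean(data)

signed = sum(v - mu for v in data) / len(data)     # always cancels to 0
mad = sum(abs(v - mu) for v in data) / len(data)   # mean absolute deviation
var = statistics.pvariance(data)                   # mean squared deviation (units squared)
sd = statistics.pstdev(data)                       # square root of var, back in original units

print(f"mean={mu} signed={signed} mad={mad} var={var} sd={sd}")
```

For this sample the signed average is 0, the mean absolute deviation 1.5, the variance 4 and the standard deviation 2.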
    
    The standard deviation, because it is the square root of a linear
    quantity, doesn't work well for things like finding the
    average.  The variance, however, is an expectation, and expectation
    is linear: if V1 = E[D1^2] and V2 = E[D2^2], where D1 and D2 are
    deviations from the mean, then
    E[(D1^2 + D2^2)/2] = (E[D1^2] + E[D2^2])/2 = (V1 + V2)/2.
    So if I were going to do something like you are trying to do, I would
    use the variances rather than the standard deviations, and then
    take the square root to get a new "standard deviation".
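A Python sketch of the difference, treating the half-ranges from .0 (as corrected in .1) as if they were standard deviations, purely for illustration:

```python
import math

# Half-ranges from .0 (as corrected in .1), used here as stand-in
# standard deviations for illustration only.
sds = [2.5, 7.5, 1.5]

avg_sd = sum(sds) / len(sds)                   # naive: average the SDs
avg_var = sum(s ** 2 for s in sds) / len(sds)  # average the variances instead
pooled_sd = math.sqrt(avg_var)                 # then take the square root

print(f"average of SDs:        {avg_sd:.3f}")
print(f"sqrt(average of vars): {pooled_sd:.3f}")
```

The square root of the averaged variances (the root mean square of the SDs) is never smaller than the plain average of the SDs and equals it only when all the SDs are equal. It matches the standard deviation of all the data pooled together only when the groups share the same mean and are equally weighted; otherwise the spread of the means adds to the pooled variance.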
    
    But, secondly...
    
    I think that you are trying to do more than you have to.  If you make
    the following assumptions:
    
    	1) We measure two quantities: the start of an event (X) and the
    	   end of the event (Y).
    
    	2) There are underlying processes which cause each event to begin
    	   some amount of time early or late.
    
    	3) There are underlying processes which cause each event to take
	   some amount of time.  These latter processes are independent
    	   of those in (2), i.e., knowing how much early/late the event
    	   began tells us nothing about how long it will go on, and vice
    	   versa.
    
	4) Events are independent: that a particular event happens to be
    	   early or late by some amount does not cause later events to
    	   be early or late (or to take more or less time to complete).
    
    	5) The processes remain essentially the same over time: e.g.,
    	   there is no trend towards or away from longer events.
    
    If these assumptions hold (and a few more which say that the processes
    are "reasonably well behaved" processes) then a reasonable set of
    descriptive statistics would be:
    
    		(mean-X, mean-range)
    
    There seems little point in, in effect, converting the range into
    standard deviations or variances.
    
    Obviously, different assumptions (e.g., that processes have a
    well-defined, but not directly observable, half-way point which the
    event tends to surround symmetrically) would lead to different
    most-reasonable statistics (in that case, (mean-X-and-Y-midpoint,
    (mean-range)/2)).
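Both sets of descriptive statistics are short computations. A Python sketch with hypothetical (start, end) offsets in days, to be replaced with real observations:

```python
# Hypothetical (start, end) offsets in days; replace with real observations.
windows = [(-3, 2), (-1, 4), (-2, 1), (0, 3)]
n = len(windows)

# Assumptions (1)-(5): start time and duration vary independently.
mean_x = sum(x for x, _ in windows) / n
mean_range = sum(y - x for x, y in windows) / n

# Alternative assumption: events straddle an unobserved midpoint symmetrically.
mean_mid = sum((x + y) / 2 for x, y in windows) / n
half_range = mean_range / 2

print("start/duration model:", (mean_x, mean_range))
print("midpoint model:      ", (mean_mid, half_range))
```

Both summaries describe the same average start and end (mean midpoint minus half the mean range is always mean-X); the assumptions change which quantities are treated as the meaningful underlying ones, not the average window itself.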
    
    					Topher
1185.5. by BEING::POSTPISCHIL (Always mount a scratch monkey.) Tue Jan 30 1990 11:50
    Re .3:
    
    If I understand correctly, you have a set of events, each of which is
    going to begin some number of days before 3/10/90 and end some number
    of days after 3/10/90.
    
    There isn't an "average window" because windows are two-dimensional
    instead of one-dimensional; they are not linearly ordered.  We can average 1 and 3
    because 2 comes right between 1 and 3 in a straight line, but your
    windows go in different directions.
    
    One thing you can figure out is the average length of the windows.  For
    each window, take y-x to get the length of that window, and then
    average all the lengths.
    
    You can also figure out the center of all activity, in a sense.  Assume
    that every event has the same amount of activity per day, spread evenly
    over its entire duration.  Around what time does all of that activity
    balance, the way a seesaw balances at its center of mass?
    
    To compute that, take the center of each window, (y+x)/2, and multiply
    by the length of the window, (y-x), to get (y^2-x^2)/2.  Then add up
    all of those products and divide by the total of the lengths of all the
    windows (the sum of y-x over all windows).
    
    You could describe an "average window" to be a window of the average
    length with its center at the center of all activity.  Whether or not
    that's useful depends upon what you are going to do with the
    information. 
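The construction can be sketched in Python as below. The `average_window` function and the use of the .0 pairs as data are illustrative assumptions; `fractions.Fraction` keeps the arithmetic exact. The center is the length-weighted mean of the window centers, i.e. sum((y^2-x^2)/2) / sum(y-x).

```python
from fractions import Fraction

# Placeholder windows as (x, y) offsets: the pairs from .0.
windows = [(-2, 3), (-5, 10), (2, 5)]

def average_window(windows):
    """Return (center, avg_length, start, end) of the "average window"."""
    lengths = [Fraction(y - x) for x, y in windows]
    total = sum(lengths)
    avg_length = total / len(windows)
    # Each window contributes center*length = (y^2 - x^2)/2; divide by total length.
    center = sum(Fraction(y * y - x * x, 2) for x, y in windows) / total
    return center, avg_length, center - avg_length / 2, center + avg_length / 2

center, length, start, end = average_window(windows)
print(f"center={float(center):.3f} length={float(length):.3f} "
      f"window=[{float(start):.3f}, {float(end):.3f}]")
```

For these three windows the center comes out at 101/46 (about 2.20) rather than the unweighted 13/6 (about 2.17), because the long (-5, 10) window carries the most weight. The center is the point about which the activity balances: summing the integral of (t - center) over every window gives zero.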
    
    
    				-- edp 
1185.6. "Fuzzy?" by AKQJ10::YARBROUGH (I prefer Pi) Tue Jan 30 1990 15:59
This sounds a bit like a fuzzy-logic problem, where there are not-well-defined
limits on some variable(s) and you want to know how they tend to 
behave. Take a look at some of the texts on fuzzy arithmetic, logic, and/or 
statistics in the DEC libraries and see if something clicks.

(Fuzzy logic deals with properties of data that are not sharply delineated, 
such as "tall", where there may be some difference of perception as to 
whether a given measurement has that property or not. If a man who is 5'4"
is not tall, and one 6'9" is tall, what is a man who is 5'11"?) 
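One common way to make this concrete is a membership function that gives a degree of "tallness" between 0 and 1 instead of a yes/no answer. A Python sketch; the linear ramp and its endpoints at 5'4" and 6'9" are assumptions chosen to match the heights above, and `tall_membership` is a hypothetical name.

```python
def tall_membership(height_in, not_tall=64.0, tall=81.0):
    """Degree (0..1) to which a height in inches counts as "tall".

    Assumed linear ramp: 5'4" (64 in) -> 0.0, 6'9" (81 in) -> 1.0.
    """
    if height_in <= not_tall:
        return 0.0
    if height_in >= tall:
        return 1.0
    return (height_in - not_tall) / (tall - not_tall)

for feet, inches in [(5, 4), (5, 11), (6, 9)]:
    h = 12 * feet + inches
    print(f"{feet} ft {inches} in: {tall_membership(h):.2f}")
```

Under this assumed ramp, 5'11" (71 inches) is tall to degree 7/17, about 0.41.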
1185.7. by NRPUR::CHABOT (Jerry Chabot) Thu Feb 01 1990 15:20
    
    Thanks for the inputs. I'll try to digest them and figure out
    a solution.
    
    Jerry