Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

1185.0. "Average of pairs of numbers?" by NRPUR::CHABOT (Jerry Chabot) Fri Jan 26 1990 19:11

    
    I would appreciate some help on solving the following problem:
    
    Given "n" pairs of numbers (x,y) where x <= y
                                           x and y can be positive or
                                                   negative

    What would be the formula to calculate the average (X,Y)?
    
    For example:
                                     standard
          x      y     range   mean  deviation 
         -2     +3       5      .5      2.5
         -5     +10     15     2.5      7.5
         +2     +5       7     1.5      3.5
    
    What is the average X and Y?
    
    Can I do this?
    
      Average mean = sum (.5,2.5,1.5)/3 = 1.5   
      Average Deviation = sum (2.5, 7.5, 3.5)/3 = 4.5
    
      Back into X and Y =>   X = avg mean - avg dev = 1.5 - 4.5 = -3
                             Y = avg mean + avg dev = 1.5 + 4.5 = +6
    
      Therefore (X,Y) = (-3,6)
    
    
    Can you tell I'm not a statistics whiz?
    
    Jerry

T.R	Title	User	Personal Name	Date	Lines
1185.1	re .0	ESCROW::MUNZER		`Mon Jan 29 1990 11:47`	10
	Jerry,

	You're okay -- just a typo:

	fixed>  Average mean = sum (.5,2.5,3.5)/3 = 2.17
	fixed>  Average Deviation = sum (2.5, 7.5, 1.5)/3 = 3.83

	But why bother with means and differences -- why not just keep X's and Y's?

	John
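	A quick check of the corrected arithmetic, as a minimal Python sketch. The variable names are illustrative and not from the thread, and the third pair is taken as (+2, +5), as .1 reads it (the range of 7 printed in .0 would instead match x = -2, in which case the original figures in .0 stand). The sketch also suggests why keeping the X's and Y's is enough: averaging the midpoints and half-ranges and converting back gives the same pair as averaging the x's and y's directly.

```python
# The three (x, y) pairs from note .0, with the third read as (+2, +5).
pairs = [(-2, 3), (-5, 10), (2, 5)]
n = len(pairs)

# Per-pair midpoint ("mean" in .0) and half-range ("deviation" in .0).
mids = [(x + y) / 2 for x, y in pairs]      # 0.5, 2.5, 3.5
halves = [(y - x) / 2 for x, y in pairs]    # 2.5, 7.5, 1.5

avg_mid = sum(mids) / n
avg_half = sum(halves) / n

# Back into X and Y, as in .0.
X = avg_mid - avg_half
Y = avg_mid + avg_half

# The direct route: average the x's and the y's separately.
avg_x = sum(x for x, _ in pairs) / n
avg_y = sum(y for _, y in pairs) / n

print(round(avg_mid, 2), round(avg_half, 2))   # 2.17 3.83
print(round(X, 2), round(Y, 2))                # -1.67 6.0
print(round(avg_x, 2), round(avg_y, 2))        # -1.67 6.0
```

	Both routes agree because the midpoint and half-range are linear in x and y, so the averaging can be done before or after the conversion.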
1185.2		BEING::POSTPISCHIL	Always mount a scratch monkey.	`Mon Jan 29 1990 11:56`	16
	Re .0:

	Why do you want an "average (X,Y)"? (Telling us why you want it may help us figure out what answer would be most useful.)

	If you want an average X and an average Y, then you can just average the X's separately and average the Y's separately.

	If you think X and Y are related in some way, you might want to fit a curve to that relationship, then figure out an average X, and then compute the Y that matches that X according to the curve. To tell you more about that, we'd need to know what kind of relationship you think there might be between X and Y.

	-- edp
1185.3	Usage	NRPUR::CHABOT	Jerry Chabot	`Mon Jan 29 1990 12:19`	19
	X and Y are related. X represents the smallest number of a range and Y represents the largest.

	How it is to be applied:

	An event is scheduled to occur on date 3/10/90. In actuality, the event starts three days before (-3) and ends two days after (+2).

	    x = -3, y = +2

	This represents a five-day window. NOTE: We need to clarify with the user whether it is five or six.

	Given that many of these events can occur, I need to calculate the average window and the corresponding smallest and largest values.

	I hope this helps in explaining how we intend to use the numbers.
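	The five-or-six question comes down to whether both endpoint days are counted. A small Python sketch, assuming the offsets are whole days relative to the scheduled date (the variable names are made up for illustration):

```python
from datetime import date, timedelta

scheduled = date(1990, 3, 10)   # the scheduled date from the example
x, y = -3, 2                    # offsets in days: start and end

start = scheduled + timedelta(days=x)   # 1990-03-07
end = scheduled + timedelta(days=y)     # 1990-03-12

elapsed_days = (end - start).days       # y - x = 5
calendar_days = elapsed_days + 1        # counting both the first and last day = 6

print(start, end, elapsed_days, calendar_days)
```

	If the user counts calendar days touched by the event, the window is y - x + 1; if the user counts elapsed time, it is y - x. Whichever convention is chosen should be applied to every event before averaging.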
1185.4	Depends on assumptions.	CADSYS::COOPER	Topher Cooper	`Mon Jan 29 1990 17:25`	73
	What makes sense to calculate here really depends quite heavily on what assumptions you are willing to make, at least as approximations to the situation.

	First off, it is unlikely that the average standard deviation means anything. The standard deviation is defined as the square root of the variance. It is useful because it has units of distance, but the underlying statistic is the variance.

	Think of a whole bunch of samples taken from some distribution (statistics people will excuse my "loose" language, I'm sure). We know (by definition) that the "average position" will be the mean, but we also want to know, in some sense, how far a "typical" sample will be from the mean. The direct average distance is useless since the negative distances will always cancel out the positive distances, leaving us with zero. The mean of the absolute values of the distances is more useful (and is called something like the "absolute deviation") but is mathematically rather intractable in most cases. Squaring the deviation gets everything on the same side of the mean, and generally results in more tractable mathematics than the absolute value, so this (the variance) is what is used. Units of the variance are, however, the square of "distance", so in order to measure off a distance you must take the square root to get "distances", i.e., the standard deviation.

	The standard deviation -- because it is the square root of a linear quantity -- doesn't work well for doing things like finding the average. The variance, however, is an expectation and therefore linear --

	    E[(V1 + V2)/2] = E[V1 + V2]/2 = (E[V1] + E[V2])/2 = (V1 + V2)/2

	So if I was going to do something like you are trying to do I would use the variances rather than the standard deviations -- and then take the square root to get a new "standard deviation".

	But, secondly... I think that you are trying to do more than you have to. If you make the following assumptions:

	    1) We measure two quantities: the start of an event (X) and the end of the event (Y).

	    2) There are underlying processes which cause each event to begin some amount of time early or late.

	    3) There are underlying processes which cause each event to take some amount of time. These latter processes are independent of those in (2), i.e., knowing how much early/late the event began tells us nothing about how long it will go on, and vice versa.

	    4) Events are independent: that a particular event happens to be early or late by some amount does not cause later events to be early or late (or to take more or less time to complete).

	    5) The processes remain essentially the same over time: e.g., there is no trend towards or away from longer events.

	If these assumptions hold (and a few more which say that the processes are "reasonably well behaved" processes) then a reasonable set of descriptive statistics would be:

	    (mean-X, mean-range)

	There seems little point in, in effect, converting the range into standard deviations or variances.

	Obviously, different assumptions (e.g., that processes have a well defined, but not directly observable, half-way point which the event tends to surround symmetrically) would lead to different most-reasonable statistics (e.g., in the previous example, (mean-X-and-Y-midpoint, (mean-range)/2)).

	Topher
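	The two suggestions in .4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are the three pairs from .0 with the third read as (+2, +5), the variable names are invented here, and the half-ranges that .0 labels "deviation" are treated as if they were standard deviations purely to show the arithmetic.

```python
from math import sqrt

pairs = [(-2, 3), (-5, 10), (2, 5)]   # (x, y) pairs from .0
n = len(pairs)

# Descriptive statistics under assumptions 1-5: (mean-X, mean-range).
mean_x = sum(x for x, _ in pairs) / n
mean_range = sum(y - x for x, y in pairs) / n

# Alternative under the "symmetric about a half-way point" assumption:
# (mean-X-and-Y-midpoint, (mean-range)/2).
mean_mid = sum((x + y) / 2 for x, y in pairs) / n
half_mean_range = mean_range / 2

# Averaging spreads: plain mean of the deviations versus the
# variance route (square, average, then take the square root).
devs = [(y - x) / 2 for x, y in pairs]           # 2.5, 7.5, 1.5
mean_dev = sum(devs) / n                         # what .0 computes
rms_dev = sqrt(sum(d * d for d in devs) / n)     # what .4 recommends

print(round(mean_x, 2), round(mean_range, 2))
print(round(mean_mid, 2), round(half_mean_range, 2))
print(round(mean_dev, 2), round(rms_dev, 2))
```

	The variance route never gives a smaller figure than the plain mean of the deviations, and the two differ more as the individual spreads become more unequal.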
1185.5		BEING::POSTPISCHIL	Always mount a scratch monkey.	`Tue Jan 30 1990 11:50`	32
	Re .3:

	If I understand correctly, you have a set of events, each of which is going to begin some number of days before 3/10/90 and end some number of days after 3/10/90.

	There isn't an "average window" because windows are two-dimensional instead of one -- they are not well-ordered. We can average 1 and 3 because 2 comes right between 1 and 3 in a straight line, but your windows go in different directions.

	One thing you can figure out is the average length of the windows. For each window, take y-x to get the length of that window, and then average all the lengths.

	You can also figure out the center of all activity, in a sense. Assume that every event has an equal amount of activity spread evenly out for its entire duration: around what time is the total amount of activity before that time equal to the total amount of activity after that time? To compute that, take the center of each window, (y+x)/2, and multiply by the length of the window, (y-x), to get (y^2-x^2)/2. Then add up all of those products and divide by the total of the lengths of all the windows (the sum of y-x over all windows).

	You could describe an "average window" to be a window of the average length with its center at the center of all activity. Whether or not that's useful depends upon what you are going to do with the information.

	-- edp
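	The recipe in .5 translates fairly directly into code. The following Python is a minimal sketch: the function name average_window is hypothetical, and the sample data are the pairs from .0 with the third read as (+2, +5). It computes the average length and the length-weighted average of the window centers exactly as described above, then builds the "average window" from the two.

```python
def average_window(pairs):
    """Return (average length, center of activity) for (x, y) windows."""
    lengths = [y - x for x, y in pairs]
    total_length = sum(lengths)
    if total_length == 0:
        raise ValueError("windows have zero total length")
    avg_length = total_length / len(pairs)
    # Center times length for each window:
    # (y + x)/2 * (y - x) = (y^2 - x^2)/2.
    weighted = sum((y * y - x * x) / 2 for x, y in pairs)
    center = weighted / total_length
    return avg_length, center


pairs = [(-2, 3), (-5, 10), (2, 5)]
avg_length, center = average_window(pairs)

# An "average window": the average length, centered on the center of activity.
window = (center - avg_length / 2, center + avg_length / 2)
print(round(avg_length, 2), round(center, 2))
print(tuple(round(v, 2) for v in window))
```

	Because each center is weighted by its window's length, long windows pull the center of activity toward themselves. Whether that weighting is wanted depends, as .5 says, on what the result will be used for.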
1185.6	Fuzzy?	AKQJ10::YARBROUGH	I prefer Pi	`Tue Jan 30 1990 15:59`	9
	This sounds a bit like a fuzzy-logic problem, where there are not-well-defined limits on some variable(s) and you want to know how they tend to behave. Take a look at some of the texts on fuzzy arithmetic, logic, and/or statistics in the DEC libraries and see if something clicks.

	(Fuzzy logic deals with properties of data that are not sharply delineated, such as "tall", where there may be some difference of perception as to whether a given measurement has that property or not. If a man who is 5'4" is not tall, and one 6'9" is tall, what is a man who is 5'11"?)
1185.7		NRPUR::CHABOT	Jerry Chabot	`Thu Feb 01 1990 15:20`	5
	Thanks for the inputs. I'll try to digest them and figure out a solution.

	Jerry