| Jerry,
You're okay -- just a typo:
fixed> Average mean = sum (.5,2.5,3.5)/3 = 2.17
fixed> Average Deviation = sum (2.5, 7.5, 1.5)/3 = 3.83
But why bother with means and differences -- why not just keep X's and Y's?
John
|
| Re .0:
Why do you want an "average (X,Y)"? (Telling us why you want it may
help us figure out what answer would be most useful.)
If you want an average X and an average Y, then you can just average
the X's separately and average the Y's separately.
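As a minimal sketch of averaging the two coordinates separately (the sample pairs below are made up for illustration, not data from this thread):

```python
from statistics import mean

# Hypothetical (x, y) pairs; replace with the real measurements.
pairs = [(-3, 2), (-1, 1), (0, 6)]

mean_x = mean(x for x, _ in pairs)   # average of the X's alone
mean_y = mean(y for _, y in pairs)   # average of the Y's alone

# Averaging is linear, so the gap between the two averages
# equals the average of the individual gaps (y - x).
mean_length = mean(y - x for x, y in pairs)

print(mean_x, mean_y, mean_length)
```

One consequence worth noting: because averaging is linear, mean_y - mean_x always equals the average of the individual y - x values.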
If you think X and Y are related in some way, you might want to fit a
curve to that relationship, then figure out an average X, and then
compute the Y that matches that X according to the curve. To tell you
more about that, we'd need to know what kind of relationship you think
there might be between X and Y.
-- edp
|
|
X and Y are related. X represents the smallest number of a range
and Y represents the largest.
How it is to be applied:
An event is scheduled to occur on date 3/10/90. In actuality, the
event starts 3 days before (-3) and ends two days after (+2).
x = -3 , y = +2
This represents a five day window. NOTE: We need to clarify with
the user whether it is five or six.
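The five-versus-six question comes down to whether you count elapsed days (y - x) or calendar days touched (y - x + 1). A small sketch of both counts for the example above, using Python's standard datetime module and assuming 3/10/90 means March 10, 1990:

```python
from datetime import date, timedelta

scheduled = date(1990, 3, 10)   # assumed reading of 3/10/90
x, y = -3, 2                    # offsets from the example

start = scheduled + timedelta(days=x)
end = scheduled + timedelta(days=y)

elapsed_days = (end - start).days    # same as y - x
calendar_days = elapsed_days + 1     # counts both the first and last day

print(start, end, elapsed_days, calendar_days)
```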
Given that many of these events can occur, I need to calculate the
average window and corresponding smallest and largest values.
I hope this helps in explaining how we intend to use the numbers.
|
| What makes sense to calculate here really depends quite heavily on
what assumptions you are willing to make, at least as approximations
to the situation.
First off, it is unlikely that the average standard deviation means
anything. The standard deviation is defined as the square-root of
the variance. It is useful because it has units of distance, but the
underlying statistic is the variance.
Think of a whole bunch of samples taken from some distribution
(statistics people will excuse my "loose" language, I'm sure). We
know (by definition) that the "average position" will be the mean,
but we also want to know, in some sense, how far a "typical" sample
will be from the mean. The direct average distance is useless since
the negative distances will always cancel out the positive distances
leaving us with zero. The mean of the absolute values of the distances
is more useful (and is called something like the "mean absolute deviation")
but is mathematically rather intractable in most cases. Squaring the
deviation gets everything on the same side of the mean, and generally
results in more tractable mathematics than the absolute value, so this
(the variance) is what is used. Units of the variance are, however,
the square of "distance" so in order to measure off a distance you
must take the square root to get "distances", i.e., the standard
deviation.
The standard deviation -- because it is the square root of a
linear quantity -- doesn't work well for doing things like finding the
average. The variance however, is an expectation and therefore linear
-- E[(V1 + V2)/2] = E[V1 + V2]/2 = (E[V1] + E[V2])/2 = (V1 + V2)/2
So if I were going to do something like what you are trying to do, I would
use the variances rather than the standard deviations -- and then
take the square root to get a new "standard deviation".
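A rough sketch of that recipe, with made-up standard deviations and the added assumption that each one deserves equal weight (for example, equal sample sizes):

```python
from math import sqrt

# Hypothetical standard deviations from two groups of measurements.
std_devs = [3.0, 4.0]

# Average the variances (the linear quantity), then take the root.
variances = [s ** 2 for s in std_devs]
mean_variance = sum(variances) / len(variances)
combined_sd = sqrt(mean_variance)

# For comparison only: the direct average of the standard deviations.
naive_sd = sum(std_devs) / len(std_devs)

print(combined_sd, naive_sd)
```

The two results differ whenever the standard deviations are not all equal, and the root of the mean variance is never the smaller of the two.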
But, secondly...
I think that you are trying to do more than you have to. If you make
the following assumptions:
1) We measure two quantities: the start of an event (X) and the
end of the event (Y).
2) There are underlying processes which cause each event to begin
some amount of time early or late.
3) There are underlying processes which cause each event to take
some amount of time. These latter processes are independent
of those in (2), i.e., knowing how much early/late the event
began tells us nothing about how long it will go on, and vice
versa.
4) Events are independent: that a particular event happens to be
early or late by some amount does not cause later events to
be early or late (or to take more or less time to complete).
5) The processes remain essentially the same over time: e.g.,
there is no trend towards or away from longer events.
If these assumptions hold (and a few more which say that the processes
are "reasonably well behaved" processes) then a reasonable set of
descriptive statistics would be:
(mean-X, mean-range)
There seems little point in effectively converting the range into
standard deviations or variances.
Obviously, different assumptions (e.g., that processes have a well
defined, but not directly observable, half-way point which the event
tends to surround symmetrically) would lead to different most-reasonable
statistics (e.g., in the previous example, (mean-X-and-Y-midpoint,
(mean-range)/2)).
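To make the two candidate summaries concrete, here is a small sketch. The event list and the function names are hypothetical, chosen only for illustration:

```python
from statistics import mean

# Hypothetical events as (x, y) = (start offset, end offset) in days.
events = [(-3, 2), (-1, 1), (0, 6)]

def start_and_range(events):
    """(mean-X, mean-range): average start and average duration."""
    return (mean(x for x, _ in events),
            mean(y - x for x, y in events))

def midpoint_and_half_range(events):
    """(mean midpoint, mean-range / 2): for the symmetric-midpoint model."""
    return (mean((x + y) / 2 for x, y in events),
            mean(y - x for x, y in events) / 2)

print(start_and_range(events))
print(midpoint_and_half_range(events))
```

Which pair is the better description depends on which set of assumptions fits the events.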
Topher
|
| Re .3:
If I understand correctly, you have a set of events, each of which is
going to begin some number of days before 3/10/90 and end some number
of days after 3/10/90.
There isn't an "average window" because windows are two-dimensional
rather than one-dimensional -- they are not well-ordered. We can average 1 and 3
because 2 comes right between 1 and 3 in a straight line, but your
windows go in different directions.
One thing you can figure out is the average length of the windows. For
each window, take y-x to get the length of that window, and then
average all the lengths.
You can also figure out the center of all activity, in a sense. Assume
that every event has the same rate of activity, spread evenly over its
entire duration. Around what time is the total amount of activity
before that time equal to the total amount of activity after that time?
To compute that, take the center of each window, (y+x)/2, and multiply
by the length of the window, (y-x), to get (y^2-x^2)/2. Then add up
all of those products and divide by the total of the lengths of all the
windows (the sum of y-x over all windows).
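The computation just described can be sketched as follows. The window list and the function name are hypothetical. Note that this is a length-weighted mean of the window centers, i.e. a center of mass of the activity; it need not coincide exactly with the time that splits the activity into two equal halves.

```python
def center_of_activity(windows):
    """Length-weighted mean of window centers.

    Each window is an (x, y) pair with y >= x. The product
    center * length is ((y + x) / 2) * (y - x) = (y**2 - x**2) / 2.
    """
    weighted_sum = sum((y * y - x * x) / 2 for x, y in windows)
    total_length = sum(y - x for x, y in windows)
    if total_length == 0:
        raise ValueError("all windows have zero length")
    return weighted_sum / total_length

# Hypothetical windows, in days relative to the scheduled date.
windows = [(-3, 2), (-1, 1), (0, 6)]
print(center_of_activity(windows))
```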
You could describe an "average window" to be a window of the average
length with its center at the center of all activity. Whether or not
that's useful depends upon what you are going to do with the
information.
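Putting the two pieces together, one possible sketch of such an "average window" is below. The function name and the sample windows are made up, and the construction is only the one described in this reply, not the only sensible definition:

```python
def average_window(windows):
    """Return (start, end) of a window of average length,
    centered on the length-weighted center of activity."""
    total_length = sum(y - x for x, y in windows)
    if total_length == 0:
        raise ValueError("all windows have zero length")
    mean_length = total_length / len(windows)
    center = sum((y * y - x * x) / 2 for x, y in windows) / total_length
    half = mean_length / 2
    return center - half, center + half

# Hypothetical windows, in days relative to the scheduled date.
windows = [(-3, 2), (-1, 1), (0, 6)]
start, end = average_window(windows)
print(start, end)
```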
-- edp
|
| This sounds a bit like a fuzzy-logic problem, where there are not-well-
defined limits on some variable(s) and you want to know how they tend to
behave. Take a look at some of the texts on fuzzy arithmetic, logic, and/or
statistics in the DEC libraries and see if something clicks.
(Fuzzy logic deals with properties of data that are not sharply delineated,
such as "tall", where there may be some difference of perception as to
whether a given measurement has that property or not. If a man who is 5'4"
is not tall, and one 6'9" is tall, what is a man who is 5'11"?)
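One common way to express this is a membership function that returns a degree between 0 and 1. The sketch below assumes a straight-line ramp between the two heights in the example; the linear shape and the function name are assumptions for illustration, not something fuzzy logic prescribes:

```python
def tall_membership(height_in, not_tall=64.0, tall=81.0):
    """Degree of 'tallness' in [0, 1] for a height in inches.

    Assumes 5'4" (64 in) is definitely not tall, 6'9" (81 in) is
    definitely tall, and a linear ramp in between.
    """
    if height_in <= not_tall:
        return 0.0
    if height_in >= tall:
        return 1.0
    return (height_in - not_tall) / (tall - not_tall)

# 5'11" is 71 inches.
print(tall_membership(71))
```

Under this assumed ramp, a man who is 5'11" is tall to a degree of 7/17, a bit over 0.4.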
|