| Let me take a different tack --
We can think of regression as a process of looking through a family of
available curves -- distinguished from each other by some number of
parameters -- and choosing a curve from the whole family which
minimizes some penalty function. In least-squares regression that
penalty function is the sum of the squared differences between the
curve and our observed points.
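To make that concrete, here is a minimal sketch in Python; the
straight-line family and the data points are made up for the example:

    # Family of curves: straight lines y = a*x + b,
    # parameterized by (a, b).
    def sse(params, xs, ys):
        # Sum-of-squares penalty for one member of the family.
        a, b = params
        return sum((y - (a*x + b))**2 for x, y in zip(xs, ys))

    xs = [1.0, 2.0, 3.0, 4.0]        # made-up observations
    ys = [2.1, 3.9, 6.2, 7.8]
    print(sse((2.0, 0.0), xs, ys))   # penalty for y = 2x    -> 0.10
    print(sse((1.0, 1.0), xs, ys))   # penalty for y = x + 1 -> 13.5

The fitted curve is whichever (a, b) makes that penalty smallest.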
Think about one of those points, which will be contributing to the
final cost function. Assume it is at location (x0, y0) (both >0).
Now think of two of our potential curves, one of which has a value
at x0 of y0+e, and the other of which has a value at x0 of y0-e, for
"e" a small positive value. The contribution of that point to the
sum-of-squares measure is the same for both curves: e^2.
Now let's look at the situation after we have transformed our space by
taking the log of both the x and the y coordinates. The point has
been transformed to (ln(x0), ln(y0)), while the transformed curves
pass, at x = ln(x0), through ln(y0+e) and ln(y0-e) respectively.
The contribution of that point to the sum-of-squares score is now
(ln(y0+e) - ln(y0))^2 for the first curve and (ln(y0) - ln(y0-e))^2
for the second. If you look at the shape of the ln curve you will see
that the lower curve has a larger "error" value than the upper
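A quick numeric check of that claim, with made-up values for y0 and e:

    import math

    y0, e = 2.0, 0.1
    # Untransformed space: both curves contribute e^2 = 0.01.
    print((y0 + e - y0)**2, (y0 - (y0 - e))**2)
    # Log space: the contributions are no longer equal.
    upper = (math.log(y0 + e) - math.log(y0))**2   # ~0.00238
    lower = (math.log(y0) - math.log(y0 - e))**2   # ~0.00263
    print(upper, lower)                            # lower is larger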
curve. (To demonstrate this algebraically, note that the expansion of
the first squared difference around e=0 is

    e^2/y0^2 - e^3/y0^3 + (11 e^4)/(12 y0^4) - (5 e^5)/(6 y0^5) + ...

while that of the second is

    e^2/y0^2 + e^3/y0^3 + (11 e^4)/(12 y0^4) + (5 e^5)/(6 y0^5) + ...

The odd-order terms, which are positive, are added in the latter and
subtracted in the former, so the latter is bigger than the former.)
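Those expansions can also be checked mechanically; here is a sketch
using the sympy computer algebra package (my choice of tool for the
example -- any CAS would do):

    from sympy import symbols, log

    y0, e = symbols('y0 e', positive=True)
    upper = (log(y0 + e) - log(y0))**2   # curve above the point
    lower = (log(y0) - log(y0 - e))**2   # curve below the point
    print(upper.series(e, 0, 6))   # odd-order terms subtracted
    print(lower.series(e, 0, 6))   # odd-order terms added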
This means that a least-squares fit in the transformed space will
prefer, as far as this point is concerned, the upper curve to the
lower, even though in the untransformed space they are equal. The
linearized least-squares regression, therefore, will not be equal to
the non-linear least-squares regression. In fact, it will
consistently run high.
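To see the disagreement concretely, here is a sketch that fits
y = a*x^b both ways; numpy/scipy, the true parameters, and the noise
level are all my own assumptions for the example:

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)
    x = np.linspace(1.0, 10.0, 200)
    y = 2.0 * x**1.5 + rng.normal(0.0, 2.0, x.size)  # additive noise
    y = np.clip(y, 0.01, None)     # keep y positive so logs exist

    # Linearized fit: a straight line through (ln x, ln y).
    b_lin, ln_a = np.polyfit(np.log(x), np.log(y), 1)
    a_lin = np.exp(ln_a)

    # Direct non-linear least squares in the original space.
    (a_nl, b_nl), _ = curve_fit(lambda x, a, b: a * x**b, x, y,
                                p0=(1.0, 1.0))

    def sse(a, b):
        return np.sum((y - a * x**b)**2)

    # The direct fit minimizes the untransformed sum of squares;
    # the linearized fit, optimal only in log space, does not.
    print(sse(a_lin, b_lin), sse(a_nl, b_nl))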
But...
There is nothing magic about the least-squares criterion. There are
situations where it has strong justification, but most of the time it
is used simply because it is analytically convenient. Depending on why
you want the regression, your log-transformed criterion, though a
little hard to characterize, may be no more arbitrary than, and just
as useful as, a "vanilla" least squares.
Furthermore, if you really want a strict least-squares fit, this
procedure will generally get you a good first estimate for an
iterative, non-linear least-squares regression calculation. I think
that Numerical Recipes discusses such procedures.
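Continuing the sketch above, the log-log estimate makes a reasonable
starting point for the iterative fit:

    # Seed the non-linear fit with the linearized estimate
    # (a_lin, b_lin) from the previous sketch.
    (a_ref, b_ref), _ = curve_fit(lambda x, a, b: a * x**b, x, y,
                                  p0=(a_lin, b_lin))
    print(a_ref, b_ref)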
Topher
|
| Someone asked me via EMail just when the least-squares criterion is
really justified.
It is justified when the dependent variable ("y") can reasonably be
assumed to be the result of a sum of:
1) A deterministic process parameterized by the precisely known
independent variables ("x's").
2) A stochastic, normally distributed process (e.g., measurement
error) which does not depend on the independent variables.
There are variant versions of least squares which allow imprecision in
the independent variables and/or precisely characterized variation of
the variance with position, but the essence is normally distributed
error on a deterministic process.
This is a parameter estimation procedure, and you still need a
criterion for selecting the best estimators. It turns out that
virtually all the reasonable criteria, including the important
"maximum likelihood" criterion, agree on least squares as the right
measure under these conditions.
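A sketch of that agreement for the Gaussian case: with the noise level
known, minimizing the negative log-likelihood and minimizing the sum
of squares land on the same parameters. The straight-line model and
all the numbers are made up for the example:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 5.0, 100)
    y = 3.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size)

    def sse(p):
        a, b = p
        return np.sum((y - (a * x + b))**2)

    def neg_log_lik(p, sigma=0.5):
        # Dropping constants, the Gaussian negative log-likelihood
        # is sse(p) / (2*sigma^2) -- the same minimizer.
        a, b = p
        r = y - (a * x + b)
        return np.sum(r**2 / (2 * sigma**2))

    print(minimize(sse, x0=(1.0, 0.0)).x)
    print(minimize(neg_log_lik, x0=(1.0, 0.0)).x)   # same answer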
Topher
|