
Conference rusure::math

Title: Mathematics at DEC
Moderator: RUSURE::EDP
Created: Mon Feb 03 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2083
Total number of notes: 14613

1866.0. "Question on non-linear regression" by IOSG::CARLIN (Dick Carlin IOSG, Reading, England) Fri Apr 22 1994 18:09

    I hope you don't disapprove of me entering the following on behalf of
    my son. I'm afraid I haven't looked at it myself yet, but he is keen to
    get comments on it. I hope nothing got chopped in the conversion from
    Word to txt.
    
    
Data exists as recorded values X1,X2,X3,...,Xn
with associated values Y1,Y2,Y3,...,Yn. 

A power curve is to be fitted by rearranging the power relationship :
				y = t x^u
...to the linear relationship :
			        ln y = u ln x + ln t
...by taking natural logs on each side, then using least squares
(in a vertical residual direction) to obtain values of t and u.

Does minimising (E represents a sigma sign from i=1 to i=n) :
		E(ln Yi - u ln Xi - ln t)^2				-(I)
...by finding suitable values of t and u, imply that :
		E(Yi - t Xi^u)^2					-(II)
...is also minimised with these same values of t and u ?

(I) can be multiplied out to give :
R = u^2 E(ln Xi)^2 + E(ln Yi)^2 - 2u E(ln Xi ln Yi)
  + 2u ln t E(ln Xi) - 2ln t E(ln Yi) + n (ln t)^2

This gives :
'partial' dR/du = 2u E(ln Xi)^2 + 2ln t E(ln Xi) -2 E(ln Xi ln Yi)
'partial' dR/dt = (2u / t) E(ln Xi) - (2 / t) E(ln Yi) + (2n ln t / t)

It is known, and proved by algebraic methods, that :
   u = ( n E(ln Xi ln Yi) - E(ln Xi) E(ln Yi) )
       ----------------------------------------
          ( n E(ln Xi)^2 - ( E(ln Xi) )^2 )

ln t = ( E(ln Xi)^2 E(ln Yi) - E(ln Xi) E(ln Xi ln Yi) )
       -------------------------------------------------
              ( n E(ln Xi)^2 - ( E(ln Xi) )^2 )
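
For reference, here is that closed form written out directly as a small
sketch (assuming Python with numpy; the function and variable names are
illustrative only, and E is implemented as a plain sum over the data) :

    import numpy as np

    def log_linear_power_fit(x, y):
        # Least-squares fit of ln y on ln x, using the formulas quoted above.
        lx, ly = np.log(np.asarray(x, float)), np.log(np.asarray(y, float))
        n = len(lx)
        Sx, Sy = lx.sum(), ly.sum()                  # E(ln Xi), E(ln Yi)
        Sxx, Sxy = (lx * lx).sum(), (lx * ly).sum()  # E(ln Xi)^2, E(ln Xi ln Yi)
        denom = n * Sxx - Sx**2
        u = (n * Sxy - Sx * Sy) / denom
        ln_t = (Sxx * Sy - Sx * Sxy) / denom
        return np.exp(ln_t), u                       # (t, u)

The same numbers should come out of an ordinary straight-line fit on the
logged data, e.g. numpy.polyfit(ln x, ln y, 1), which returns (u, ln t).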

(II) can be multiplied out to give :
S = t^2 E(Xi^2u) + E(Yi^2) - 2t E(Xi^u Yi)

This gives :
'partial' dS/du = 2 t^2 E(Xi^2u ln Xi) - 2t E(Xi^u Yi ln Xi)
'partial' dS/dt = 2t E(Xi^2u) - 2 E(Xi^u Yi)

Q1 : What are the solutions to 'partial' dS/du = 0 and 'partial' dS/dt = 0 ?
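
For what it's worth, one route to Q1 : setting 'partial' dS/dt = 0 gives
t = E(Xi^u Yi) / E(Xi^2u); substituting that into 'partial' dS/du = 0 leaves
a single equation in u which in general has no closed form and has to be
solved numerically. A sketch of that (assuming Python with numpy and scipy;
the bracket [u_lo, u_hi] is a guess and must straddle a sign change of g) :

    import numpy as np
    from scipy.optimize import brentq

    def direct_power_fit(x, y, u_lo=0.1, u_hi=5.0):
        # Minimize S = E(Yi - t*Xi^u)^2 directly.
        x, y = np.asarray(x, float), np.asarray(y, float)

        def t_of_u(u):                  # from dS/dt = 0
            return (x**u * y).sum() / (x**(2 * u)).sum()

        def g(u):                       # dS/du = 0 with t = t_of_u(u), divided by 2t
            t = t_of_u(u)
            return t * (x**(2 * u) * np.log(x)).sum() - (x**u * y * np.log(x)).sum()

        u = brentq(g, u_lo, u_hi)       # 1-D root find; widen the bracket if needed
        return t_of_u(u), u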

Q2 : If the previous solutions for t and u, obtained from (I),
     were substituted into 'partial' dS/du and 'partial' dS/dt,
     would both partial derivatives evaluate to zero ?

Q3 : The data to follow indicates that the answer to Q2 is "No".
     The sum of squares of the residuals for the curve y = t x^u
     best fitted (or seemingly best fitted) is GREATER than the
     sum of squares of the residuals for the simpler SUBSET of
     curves y = s x (i.e. u = 1, t = s).

     It may be that this is just computer inaccuracy, but it persists
     even with the spreadsheet working to 16 significant figures.

     Question 3 is "IS THE METHOD FOR CALCULATING THE BEST FIT
     POWER CURVE INACCURATE AND THEREFORE INCORRECTLY NAMED ???".
     This is the real question that requires answering - the rest
     is just preliminary 'garbage'.


DATA :  ( x , y ) (14,394) (17,371) (23,779) (26,1044) (27,661) (28,828)
                  (30,1701) (31,1251) (32,719) (33,886) (33,1010) (35,971)
                  (37,837) (42,1141) (43,1357)

[The erratic entry of (30,1701) may be omitted without altering the
trend in the results.  Data is from
"Differential Equations & Numerical Analysis" by Andrew Paterson, page 111.]
    
1866.1. "If I understand your question..." by CADSYS::COOPER (Topher Cooper) Fri Apr 22 1994 19:55

    Let me take a different tack --

    We can think of regression as a process of looking through a family of
    available curves -- distinguished from each other by some number of
    parameters -- and choosing a curve from the whole family which
    minimizes some penalty function.  In least-squares regression that
    penalty function is the sum of the squared differences from our
    observed points.

    Think about one of those points, which will be contributing to the
    final cost function.  Assume it is at location (x0, y0) (both >0).
    Now think of two of our potential curves, one of which has a value
    at x0 of y0+e, and the other of which has a value at x0 of y0-e, for
    "e" a small positive value.  The contribution to the sum of squares
    measure for both curves is equal, and is e^2.

    Now let's look at the situation after we have transformed our space by
    taking the log of both the x and the y coordinates.  The point has
    been transformed to (ln(x0), ln(y0)), while the transformed curves
    pass, at x = ln(x0), through the points ln(y0+e) and ln(y0-e).

    The contribution for that point, now, to the sum-of-squares score will
    be (ln(y0+e) - ln(y0))^2 for the first curve and (ln(y0) - ln(y0-e))^2
    for the second.  If you look at the shape of the ln curve you will see
    that the lower curve will have a larger "error" value than the upper
    curve.  (To demonstrate this algebraically, note that the expansion of
    the first difference squared around e=0, is:

	e^2/y^2 - e^3/y^3 + 11 e^4/(12 y^4) - 5 e^5/(6 y^5) + ...

     while the second one is:

	e^2/y^2 + e^3/y^3 + 11 e^4/(12 y^4) + 5 e^5/(6 y^5) + ...

    The odd-order terms, which are positive, are added in the latter and
    subtracted in the former, so the latter is bigger than the former.)
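
    Those two expansions can also be checked mechanically; a quick symbolic
    sketch, assuming Python with sympy :

        import sympy as sp

        e, y = sp.symbols('e y', positive=True)
        upper = (sp.log(y + e) - sp.log(y))**2    # curve passing above the point
        lower = (sp.log(y) - sp.log(y - e))**2    # curve passing below the point

        # Series around e = 0; only the signs of the odd-order terms differ.
        print(sp.series(upper, e, 0, 6))
        print(sp.series(lower, e, 0, 6))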

    This means that a least-squares fit in the transformed space will
    prefer, as far as this point is concerned, the upper curve to the lower,
    even though in the untransformed space they are equally good.  The
    linearized least-squares regression, therefore, will not be equal to the
    non-linear least-squares fit.  In fact, it will consistently run high.

    But...

    There is nothing magic about the least-squares criterion.  There are
    situations where it has genuinely strong justification, but most of the
    time it is simply used because it is analytically convenient.
    Depending on why you want the regression, your log-transformed
    criterion, though a little hard to characterize, may be no more
    arbitrary and just as useful as a "vanilla" least-squares fit.

    Furthermore, if you really want a strict least-squares fit, this
    procedure will generally get you a good first estimate for an iterative,
    non-linear least-squares regression calculation.  I think that Numerical
    Recipes discusses such procedures.
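
    As a sketch of what that two-stage procedure might look like (assuming
    Python with scipy, whose curve_fit routine performs an iterative
    Levenberg-Marquardt style fit, and reusing the x, y arrays and the
    log_linear_power_fit helper sketched in .0) :

        import numpy as np
        from scipy.optimize import curve_fit

        def power_model(x, t, u):
            return t * x**u

        # The log-linearized fit gives the starting point; curve_fit then
        # minimizes the untransformed sum of squares E(Yi - t*Xi^u)^2.
        t0, u0 = log_linear_power_fit(x, y)
        (t_hat, u_hat), _cov = curve_fit(power_model, x, y, p0=[t0, u0])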

				    Topher
1866.2. "When really justified." by CADSYS::COOPER (Topher Cooper) Mon Apr 25 1994 21:17

    Someone asked me via e-mail just when the least-squares criterion is
    really justified.

    It is justified when the dependent variable ("y") can reasonably be
    assumed to be the result of a sum of:

	1) A deterministic process parameterized by the precisely known
	   independent variables ("x's").

	2) A stochastic, normally distributed process (e.g., measurement
	   error) which is not dependent on the independent variables.

    There are variant versions of least squares which allow imprecision in
    the independent variables and/or precisely characterized variation in
    the variance with position, but the essence is a normally distributed
    error on a deterministic process.

    This is a parameter estimation procedure and you still need a criterion
    for selecting what the best estimators are.  It turns out that virtually
    all the reasonable criteria, including the important "maximum
    likelihood" criterion, agree on least squares as the right measure given
    these conditions.
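
    For concreteness, the one-line version of that last point (under the
    simplest assumptions: independent errors with a common, fixed variance
    sigma^2): if Yi = f(Xi) + ei with ei ~ N(0, sigma^2), the log-likelihood
    of the sample is

        ln L = -(n/2) ln(2 pi sigma^2) - (1 / (2 sigma^2)) E(Yi - f(Xi))^2

    so maximizing the likelihood over the parameters of f is exactly
    minimizing the sum of squares E(Yi - f(Xi))^2.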

					    Topher