[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

977.0. "Curve fitting problem" by EST::JARGENTATI () Wed Nov 16 1988 12:11

         Hello,

         I was wondering if any of the readers of this conference
         would have some ideas on the types of functions best suited
         for a curve fit to a NON-DECREASING INTEGRAL data set. The
         problem that I'm running into is that the data has lots of
         flat spots which makes things more difficult. All comments,
         suggestions, or references are welcome.

         Restrictions:

         1. There MUST be a way to reconstruct the original data points 
            (integers) from the function.

            i.e. data(i) = round( f(i) ) 

            The behavior of the function between the data points does 
            not matter.  

         2. The function must be somewhat easy to express, i.e. A one
            one hundred term polynomial is NOT desirable.  I'd prefer
            a function that has < 10 coefficients, exponents, and
            constants.  That is, the number of coefficients + the number
            of exponents + a constant is less than 10 or so.

         Here's an example data set and a plot of it.
          i      :  0  1  2  3  4  5  6  7  8  9
          DATA(i):  0  1  1  4  5  5  5  6  7  8


                         9
                         8                   *
                         7                 *
                         6               *
                 DATA(i) 5         * * *
                         4       * 
                         3
                         2
                         1   * *       
                         0 *       
                           0 1 2 3 4 5 6 7 8 9 
                                    i

         Find a function that fits this data such that:
             1. data(i) = round(f(i))
             2. f(i) is not too messy (I know this is vague!).

         Note that in reality the data set will be much larger,
         perhaps n > 500.

         Thanks in advance for your input.
         Sincerely,
         John Argentati

T.R	Title	User	Personal Name	Date	Lines
977.1	Let's solve something else - this one's too messy	AKQJ10::YARBROUGH	I prefer Pi	`Wed Nov 16 1988 12:32`	14
	Your only hope here appears to be in segmenting the function so that each segment can be fitted with a low-order polynomial. The 'flat spots', or regions with constant derivative, suggest that an overall polynomial fit will have wild ups and downs between points. In other words, if you fit a continuous differentiable function Y through 500 points at random, there probably exists an x-value for which Y(x) is substantially larger in absolute value than any of the input data points. However, we are probably already looking too closely at the details of (one possible) implementation before really knowing what problem you are trying to solve. If you were to describe that, we might be able to see an approach that completely avoids the issue you raised. Lynn Yarbrough
977.2		HPSTEK::XIA		`Wed Nov 16 1988 14:52`	8
	re. 0 SPLINE should be a good way of doing it. As .1 said, if you use high order polynomials, there is a danger of large fluctuation between data points. On the other hand, if you are sure intermediate points do not matter, then.... Check any numerical analysis book on how to do SPLINE. Eugene
977.3		DWOVAX::YOUNG	Note early. Note Often.	`Thu Nov 17 1988 03:46`	7
	Re .2: Splines are too messy. An exact fit spline for a 500 data point set would probably be even more complex, in number of terms and segments, than a polynomial. -- Barry
977.4		HPSTEK::XIA		`Thu Nov 17 1988 04:31`	10
	re .2 .3 Well, if as .1 said that 500 points are already fine enough, why not just link these points with straight lines. That will be easy enough. On the other hand 500 is not unmanagable by SPLINE. What it amounts to is solving a tri-diagnal matrix (maybe even simpler than that. I have not touch the stuff for two years. So....) of 500X500 which shouldn't be too bad if you have an efficient matrix package. Eugene
977.5		HPSTEK::XIA		`Thu Nov 17 1988 04:34`	5
	re .2 An important point I want to emphasize is that a polynomial of degree 499 is unsafe. Eugene
977.6		AITG::DERAMO	Daniel V. {AITG,ZFC}:: D'Eramo	`Thu Nov 17 1988 21:43`	120
	I remember a calendar day-of-the-week algorithm that came up with a good representation for the running total of the number of days in a month, modulo seven. This was clever, but having only twelve data points must have helped a lot. The calendar stuff follows. It is interesting but won't help for five hundred data points. Dan I saw something like this used to encode an almost linear function used in a day-of-the-week algorithm. Our calendar has a period of 400 years. Leap year occurs in years divisible by 4, unless also divisible by 100 when there is no leap year, unless also divisible by 400 when there is. So every 400 years, the leap year cycle repeats. It so happens that these 400 years have a number of days which is an exact multiple of 7. So after 400 years the next leap year cycle starts on the same day of the week as the first previous cycle. This led someone to develop a day of the week formula. So number the days of the modulo 7, with Sunday = 0 up to Saturday = 6. Let y, m, and d be the year, month, and day numbers. So for today, Thursday November 17, 1988, we have y = 1988, m = 11, d = 17, and the day of the week is 4. It is easiest to handle leap day as the last day of the year, so we slightly massage the y, m, d inputs by treating March as the first month, and February as the twelfth month of the previous year. So the first step in the formula is to make this transformation from y, m, d to Y, M, D: { y - 1 if m = 1 or m = 2 Y = { { y if 3 <= m <= 12 { M + 10 if m = 1 or m = 2 M = { { M - 2 if 3 <= m <= 12 D = d Now the day of the week number will be the number of days past since some reference date added to the day of the week number of that reference date, modulo 7. The contribution of Y to the sum will be 365 = 1 mod 7 days per year, plus 1 for every leap year, for a total of Y + [Y/4] - [Y/100] + [Y/400] Here the notation [x] means the greatest integer <= x. The contribution for the day will just be D. Now here is where the "almost linear" function stuff starts, in computing the contribution for the adjusted month number M. For M=1 (March) the contribution will be 0; for M=2 the contribution will be the number of days in March, modulo 7; for M=3 the contribution will be the number of days in March and April, modulo 7; etc. Now write this out, for each next value of M adding the number of days of the previous month modulo 7, but not reducing 7's from the sum: M days days modulo 7 sum -- ---- ------------- --- 1 0 0 0 2 31 3 3 3 30 2 5 4 31 3 8 5 30 2 10 6 31 3 13 7 31 3 16 8 30 2 18 9 31 3 21 10 30 2 23 11 31 3 26 12 31 3 29 or M 1 2 3 4 5 6 7 8 9 10 11 12 f(M) 0 3 5 8 10 13 16 18 21 23 26 29 The problem was to express this function in an easy to compute way. The solution, and I don't know who to credit this to, is to use: f(M) = [2.6 M - 2.1] = [2.6M - 2.2] I had not memorized this, I just worked it out again, it isn't too hard for this particular example. By the way, in most programming languages, use truncating integer arithmetic and let f(M) = (26 * M - 21)/10 or (13 * M - 11)/5 So the overall formula becomes Y + [Y/4] - [Y/100] + [Y/400] + [2.6M - 2.1] + D + k where k is the constant that gives the correct value for the initial day. For today, with expected result 4, we get y, m, d = 1988, 11, 17 Y, M, D = 1988, 9, 17 4 = 1988 + [1988/4] - [1988/100] + [1988/400] + [2.6 * 9 - 2.2] + 17 + k (all that modulo 7, of course) or k = 2. The [... - 2.1] and + 2 can be combined to give >> f(Y, M, D) = Y + [Y/4] - [Y/100] + [Y/400] + [2.6M - 0.1] + D You can precompute the result of the Y terms and remember them for the next year or two. So, for example, f(1988, M, D) = [2.6M - 0.1] + D + 6 modulo 7 f(1989, M, D) = [2.6M - 0.1] + D modulo 7 Of course, don't forget the transformation from y,m -> Y,M. Dan