
Conference rusure::math

Title: Mathematics at DEC
Moderator: RUSURE::EDP
Created: Mon Feb 03 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2083
Total number of notes: 14613

2078.0. "Need corr coeff for > linear approx" by SMARTT::DGAUTHIER () Thu Jan 23 1997 17:32

T.R | Title | User | Personal Name | Date | Lines
2078.1 | transform and partition | CPEEDY::BRADLEY | Chuck Bradley | Thu Jan 23 1997 20:52 | 20
2078.2 | | AUSS::GARSON | DECcharity Program Office | Thu Jan 23 1997 23:31 | 19
2078.3 | modeling | TPOVC::BUCHANAN | the rolling stone catches the worm | Fri Jan 24 1997 00:44 | 24
2078.4 | What's it for? | CHEFS::STRANGEWAYS | Andy Strangeways@REO DTN 830-3216 | Fri Jan 24 1997 07:58 | 39
    Dave,
    
    I would support everything that has been written in the previous
    replies.
    
    I'd also like to add yet another viewpoint on the same thing:
    
    Why do you want to know the correlation coefficient? Yes, it's an
    interesting bit of math to work this out, but I doubt you're
    calculating it simply because correlation coefficients are so cute.
    
    Are you trying to find evidence to support a hypothesis? (e.g. "I
    believe that the level of input X will have no effect on output Y until
    it reaches a threshold, at which point Y will rise to a maximum. Do the
    observed results support this belief?")
    
    Are you monitoring a process to ensure error/variability does not get
    out of hand? (e.g. The machine should normally set the value of Y to
    within +/- delta according to the value of X. Do the observed
    measurements deviate by significantly more than this?)
    
    Are you trying to estimate a parameter from a known family of
    distributions? (e.g. I know it's a step function, where's the step?)
    
    Good statistical techniques exist to address each of these problems.
    The linear correlation coefficient may be used in similar tests in the
    linear case. You'll need different statistics if your underlying model
    is not linear.
    
    If you post or mail me your model and your objective, I can give you
    some help and some pointers.
    
    (My wife had to do some hypothesis testing and estimation of parameters
    for functions of the form Y = A + B/ln(kX). One thing I discovered in
    solving this was that transforming the data to enable a standard linear
    test to be used doesn't work. The assumption of normal distribution of
    "errors" is completely invalid after the transform.)
    
    Andy.
2078.5 | | CNTROL::DGAUTHIER | | Fri Jan 24 1997 14:42 | 46
    Well, here's the situation....
    
    We work in the semiconductor fabrication space in Hudson Mass.  The
    chips we build (including all the Alphas) are built up one step at a
    time.  The devices (transistors) and other structures are tested and
    measured throughout the fabrication process.  A typical number of
    measurements in this space would be 286.  After fabrication, the
    finished part may be tested as many as 4 different times where up to
    another hundred or so test results are added to the pool.  So you've
    got around 300-400 different variables.  
    
    The bottom line is to make as many good parts as possible.  But, life
    (and the industry) being what it is, many parts ultimately test out as
    being bad.  The IC testers tell us the test(s) that failed and give us
    measured values for those tests.  The reasons why these results are
    what they are may be indicated by what went on earlier in the
    fabrication process.  And that's a first cut at it.  The test that
    failed may be due to the fact that some other test failed, which in
    turn maps back to the fabrication process.  And fabrication test results
    may be (and often are) related to each other.  And finally, modeling
    the relationship between 2 variables might be good for today's parts,
    differ slightly for tomorrow's and be way off for the next day's. 
    Why?  Because of the effects of some other variable.  And then there 
    are variables which impact all of this which do not even appear in the
    set described above!
    
    IOW, there are many variables, their interrelationships are known in
    some cases, not so well known in others and can vary from time to time.
    When someone tries to solve a problem, they embark on an investigation
    using the total set of data.  Experience guides them in certain
    directions.  Visual inspection of X/Y plots is sometimes used to "see" 
    if there's a relationship between variables.  Other techniques are
    used, including looking at linear correlation coefficients.  But low
    coefficients do not necessarily mean there's no relationship, so they
    have to be used with caution and skepticism.
    
    What I'm looking for is a means to test the relatedness of any two
    variables for a subset of the data (one week's data e.g.).  Actually,
    something like a correlation matrix is nice because you can get a 
    handle on many relationships at a glance.  Transforming the data
    doesn't seem like a good possibility because you may not know how to
    transform it before you begin.  The piecewise approach might work.  I'll
    consider how to implement that in SAS.  Thanks for the suggestion!
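
    A rough sketch of what the piecewise idea could look like, written in
    Python rather than SAS purely for illustration.  The function names
    (pearson, piecewise_pearson) and the tent-shaped data are made up for
    this example, and splitting into equal-count segments after sorting on
    X is only one possible choice of partition, not necessarily the one
    suggested earlier in the thread.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation; None if either variable is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def piecewise_pearson(xs, ys, n_segments):
    """Sort the points by x, cut them into n_segments groups of roughly
    equal size, and return the Pearson coefficient of each group."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    results = []
    for i in range(n_segments):
        chunk = pairs[i * n // n_segments:(i + 1) * n // n_segments]
        if len(chunk) < 3:  # too few points to say anything useful
            results.append(None)
            continue
        cx, cy = zip(*chunk)
        results.append(pearson(cx, cy))
    return results


# Hypothetical data: Y rises with X up to X = 5, then falls again.
xs = list(range(11))
ys = [5 - abs(x - 5) for x in xs]

overall = pearson(xs, ys)                 # close to 0: looks "unrelated"
segments = piecewise_pearson(xs, ys, 2)   # close to +1 and -1
print(overall, segments)
```

    On this toy data the overall coefficient is essentially zero even
    though Y is completely determined by X, while the two halves come out
    near +1 and -1.  With real data the number of segments and the minimum
    points per segment would need some thought.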
    
    -dave
    
2078.6 | Rank Correlation might be a safe alternative. | YIELD::FANG | | Wed Jan 29 1997 20:20 | 17
    Dave,
    
    One method that might work for a lot of what you're trying to capture,
    is to do a rank correlation. This method essentially assigns a rank
    value to all the x-values, then a rank value to all the y-values.
    Your scatter plot won't look the same, but your correlation
    coefficient might give you a better shot at finding some of the
    correlations you're looking for. The rank correlation would have a
    couple of advantages over the linear correlation.
    	- It won't be too sensitive to outliers
    	- It should do a fair job at finding any monotonic relationships
    
    If you're also looking for really quadratic relationships, e.g., a
    quadratic loss function with an upside-down bell, then you may want to
    do a rank correlation on the square of the x-variable.
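
    A minimal sketch of the rank correlation described above, in plain
    Python with made-up data.  The helper names (average_ranks, pearson,
    spearman) are hypothetical, and ties are given the average of their
    positions, which is one common convention.  For the quadratic case the
    sketch assumes the x-variable is centered before squaring, since
    squaring a strictly positive variable leaves its ranks unchanged.

```python
from math import exp, sqrt


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson linear correlation (assumes neither variable is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman(xs, ys):
    """Rank correlation: Pearson coefficient computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Monotonic but strongly non-linear: Y = exp(X).
xs = list(range(1, 11))
ys = [exp(x) for x in xs]
linear_r = pearson(xs, ys)    # noticeably below 1
rank_r = spearman(xs, ys)     # 1, because the ordering is preserved

# Upside-down parabola: not monotonic, so ranks alone do not help...
qx = list(range(-5, 6))
qy = [-(x * x) for x in qx]
rank_r_raw = spearman(qx, qy)
# ...but ranking the squared (already centered) x-variable does.
rank_r_squared = spearman([x * x for x in qx], qy)
print(linear_r, rank_r, rank_r_raw, rank_r_squared)
```

    In this toy case the rank coefficient picks up the exponential
    relationship that the linear coefficient understates, and the squared
    x-variable recovers the quadratic one (as a strong negative rank
    correlation, because the bell is upside-down).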
    
    -Peter
2078.7 | | HPCGRP::MANLEY | | Fri Feb 07 1997 15:05 | 11
	Just curious.

	Are good part profiles available? It sounds like testing data
        are saved at each step, until a part is finally accepted. What
	happens to test data when a part is accepted? Is it used to develop 
	a "good part" profile database? Is it saved for later reference,
	should the part fail in the field? Is factor analysis used to
	find the most significant factors leading to failures as well as
	interactions between factors that lead to failures?