| Dave,
I would support everything that has been written in the previous
replies.
I'd also like to add yet another viewpoint on the same thing:
Why do you want to know the correlation coefficient? Yes, it's an
interesting bit of math to work this out, but I doubt you're
calculating it simply because correlation coefficients are so cute.
Are you trying to find evidence to support a hypothesis? (e.g. "I
believe that the level of input X will have no effect on output Y until
it reaches a threshold, at which point Y will rise to a maximum. Do the
observed results support this belief?")
Are you monitoring a process to ensure error/variability does not get
out of hand? (e.g. The machine should normally set the value of Y to
within +/- delta according to the value of X. Do the observed
measurements deviate by significantly more than this?)
Are you trying to estimate a parameter from a known family of
distributions? (e.g. I know it's a step function, where's the step?)
Good statistical techniques exist to address each of these problems.
The linear correlation coefficient may be used in similar tests in the
linear case. You'll need different statistics if your underlying model
is not linear.
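(To make the linear case concrete, here's a rough sketch of a
significance test on the correlation coefficient, written in Python
with SciPy; the numbers are invented.)

# Rough sketch: is the observed linear correlation between X and Y
# significant, or could it plausibly be chance?
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # made-up input values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])  # made-up output values

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.4f}")
# A small p says the linear association is unlikely to be chance alone;
# it says nothing about non-linear relationships such as a step function.
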
If you post or mail me your model and your objective, I can give you
some help and some pointers.
(My wife had to do some hypothesis testing and estimation of parameters
for functions of the form Y = A + B/ln(kX). One thing I discovered in
solving this was that transforming the data to enable a standard linear
test to be used doesn't work. The assumption of normal distribution of
"errors" is completely invalid after the transform.)
Andy.
|
| Well, here's the situation....
We work in the semiconductor fabrication space in Hudson Mass. The
chips we build (including all the Alphas) are built up one step at a
time. The devices (transistors) and other structures are tested and
measured throughout the fabrication process. A typical number of
measurements in this space would be 286. After fabrication, the
finished part may be tested as many as 4 different times where up to
another hundred or so test results are added to the pool. So you've
got around 300-400 different variables.
The bottom line is to make as many good parts as possible. But, life
(and the industry) being what it is, many parts ultimately test out as
being bad. The IC testers tell us the test(s) that failed and give us
measured values for those tests. The reasons why these results are
what they are may be indicated by what went on earlier in the
fabrication process. And that's a first cut at it. The test that
failed may be due to the fact that some other test failed, which in turn
maps back to the fabrication process. And fabrication test results
may be (and often are) related to each other. And finally, modeling
the relationship between 2 variables might be good for today's parts,
differ slightly for tomorrow's and be way off for the next day's.
Why? Because of the effects of some other variable. And then there
are variables which impact all of this which do not even appear in the
set described above!
IOW, there are many variables, their interrelationships are known in
some cases, not so well known in others, and can vary from time to time.
When someone tries to solve a problem, they embark on an investigation
using the total set of data. Experience guides them in certain
directions. Visual inspection of X/Y plots is sometimes used to "see"
if there's a relationship between variables. Other techniques are
used, including, looking at linear correlation coefficients. But low
coefficients do not necessarily mean there's no relationship, so they
have to be used with caution and skepticism.
What I'm looking for is a means to test the relatedness of any two
variables for a subset of the data (one week's data e.g.). Actually,
something like a correlation matrix is nice because you can get a
handle on many relationships at a glance. Transforming the data
doesn't seem like a good possibility because you may not know how to
transform it before you begin. The piecewise approach might work. I'll
consider how to implement that in SAS. Thanks for the suggestion!
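(Roughly what I have in mind, sketched in Python rather than SAS; the
file name and column names are made up.)

# Rough sketch: a correlation matrix over one week's subset of the data.
# The file name and the 'week' column are invented for illustration.
import pandas as pd

df = pd.read_csv("fab_measurements.csv")
week = df[df["week"] == "1993-W07"].drop(columns=["week"])

print(week.corr().round(2))                     # Pearson by default
# print(week.corr(method="spearman").round(2))  # rank-based alternative
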
-dave
|
| Dave,
One method that might work for a lot of what you're trying to capture
is to do a rank correlation. This method essentially assigns a rank
value to all the x-values, then a rank value to all the y-values.
Your scatter plot won't look the same, but your correlation
coefficient might give you a better shot at finding some of the
correlations you're looking for. The rank correlation would have a
couple of advantages over the linear correlation:
- It won't be too sensitive to outliers
- It should do a fair job at finding any monotonic relationships
If you're also looking for really quadratic relationships, e.g., a
quadratic loss function with an upside-down bell, then you may want to
do a rank correlation on the square of the x-variable.
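(Here's a rough sketch of both ideas in Python; the data are invented
so that Y has an upside-down-bell relationship to X.)

# Rough sketch: Spearman rank correlation, plus the same test on X^2 to
# pick up a symmetric, non-monotonic relationship.
import numpy as np
from scipy import stats

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])         # made-up data
y = -x**2 + np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0])  # noisy bell

rho_x, p_x = stats.spearmanr(x, y)       # near zero: Y not monotonic in X
rho_x2, p_x2 = stats.spearmanr(x**2, y)  # strongly negative: monotonic in X^2
print(rho_x, rho_x2)
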
-Peter
|
|
Just curious.
Are good part profiles available? It sounds like testing data
are saved at each step, until a part is finally accepted. What
happens to test data when a part is accepted? Is it used to develop
a "good part" profile database? Is it saved for later reference,
should the part fail in the field? Is factor analysis used to
find the most significant factors leading to failures as well as
interactions between factors that lead to failures?
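(On the factor-analysis question, a rough sketch of what that might
look like over the measurement table, in Python with scikit-learn; the
file name, column layout, and number of factors are all assumptions.)

# Rough sketch: factor analysis to look for a few underlying factors
# behind the 300-400 test variables. Everything here is illustrative.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

df = pd.read_csv("fab_measurements.csv")   # assumed: one numeric row per part
X = (df - df.mean()) / df.std()            # standardize the measurements

fa = FactorAnalysis(n_components=5, random_state=0).fit(X)
loadings = pd.DataFrame(fa.components_, columns=df.columns)
print(loadings.round(2))   # which measurements load on which factor
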
|