| Dave,
I would support everything that has been written in the previous
replies.
I'd also like to add yet another viewpoint on the same thing:
Why do you want to know the correlation coefficient? Yes, it's an
interesting bit of math to work this out, but I doubt you're
calculating it simply because correlation coefficients are so cute.
Are you trying to find evidence to support a hypothesis? (e.g. "I
believe that the level of input X will have no effect on output Y until
it reaches a threshold, at which point Y will rise to a maximum. Do the
observed results support this belief?")
Are you monitoring a process to ensure error/variability does not get
out of hand? (e.g. The machine should normally set the value of Y to
within +/- delta according to the value of X. Do the observed
measurements deviate by significantly more than this?)
Are you trying to estimate a parameter from a known family of
distributions? (e.g. I know it's a step function, where's the step?)
Good statistical techniques exist to address each of these problems.
The linear correlation coefficient may be used in similar tests in the
linear case. You'll need different statistics if your underlying model
is not linear.
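(To make the linear case concrete, here's a rough sketch of a
significance test on the correlation coefficient, written in Python
with SciPy; the numbers are invented.)

# Rough sketch: is the observed linear correlation between X and Y
# significant, or could it plausibly be chance?
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # made-up input values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])  # made-up output values

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.4f}")
# A small p says the linear association is unlikely to be chance alone;
# it says nothing about non-linear relationships such as a step function.
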
If you post or mail me your model and your objective, I can give you
some help and some pointers.
(My wife had to do some hypothesis testing and estimation of parameters
for functions of the form Y = A + B/ln(kX). One thing I discovered in
solving this was that transforming the data to enable a standard linear
test to be used doesn't work. The assumption of normal distribution of
"errors" is completely invalid after the transform.)
Andy.
|
| Well, here's the situation....
We work in the semiconductor fabrication space in Hudson Mass. The
chips we build (including all the Alphas) are built up one step at a
time. The devices (transistors) and other structures are tested and
measured throughout the fabrication process. A typical number of
measurements in this space would be 286. After fabrication, the
finished part may be tested as many as 4 different times where up to
another hundred or so test results are added to the pool. So you've
got around 300-400 different variables.
The bottom line is to make as many good parts as possible. But, life
(and the industry) being what it is, many parts ultimately test out as
being bad. The IC testers tell us the test(s) that failed and give us
measured values for those tests. The reasons why these results are
what they are may be indicated by what went on earlier in the
fabrication process. And that's a first cut at it. The test that
failed may be due to the fact that some other test failed, which in turn
maps back to the fabrication process. And fabrication test results
may be (and often are) related to each other. And finally, modeling
the relationship between 2 variables might be good for today's parts,
differ slightly for tomorrow's and be way off for the next day's.
Why? Because of the effects of some other variable. And then there
are variables which impact all of this which do not even appear in the
set described above!
IOW, there are many variables, their interrelationships are known in
some cases, not so well known in others, and can vary from time to time.
When someone tries to solve a problem, they embark on an investigation
using the total set of data. Experience guides them in certain
directions. Visual inspection of X/Y plots is sometimes used to "see"
if there's a relationship between variables. Other techniques are
used, including, looking at linear correlation coefficients. But low
coefficients do not necessarily mean there's no relationship, so they
have to be used with caution and skepticism.
What I'm looking for is a means to test the relatedness of any two
variables for a subset of the data (one week's data e.g.). Actually,
something like a correlation matrix is nice because you can get a
handle on many relationships at a glance. Transforming the data
doesn't seem like a good possibility because you may not know how to
transform it before you begin. The piecewise approach might work. I'll
consider how to implement that in SAS. Thanks for the suggestion!
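(Roughly what I have in mind, sketched in Python rather than SAS; the
file name and column names are made up.)

# Rough sketch: a correlation matrix over one week's subset of the data.
# The file name and the 'week' column are invented for illustration.
import pandas as pd

df = pd.read_csv("fab_measurements.csv")
week = df[df["week"] == "1993-W07"].drop(columns=["week"])

print(week.corr().round(2))                     # Pearson by default
# print(week.corr(method="spearman").round(2))  # rank-based alternative
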
-dave
|
| Dave,
One method that might work for a lot of what you're trying to capture
is to do a rank correlation. This method essentially assigns a rank
value to all the x-values, then a rank value to all the y-values.
Your scatter plot won't look the same, but your correlation
coefficient might give you a better shot at finding some of the
correlations you're looking for. The rank correlation would have a
couple of advantages over the linear correlation:
- It won't be too sensitive to outliers
- It should do a fair job at finding any monotonic relationships
If you're also looking for really quadratic relationships, e.g., a
quadratic loss function with an upside-down bell, then you may want to
do a rank correlation on the square of the x-variable.
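(Here's a rough sketch of both ideas in Python; the data are invented
so that Y has an upside-down-bell relationship to X.)

# Rough sketch: Spearman rank correlation, plus the same test on X^2 to
# pick up a symmetric, non-monotonic relationship.
import numpy as np
from scipy import stats

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])         # made-up data
y = -x**2 + np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0])  # noisy bell

rho_x, p_x = stats.spearmanr(x, y)       # near zero: Y not monotonic in X
rho_x2, p_x2 = stats.spearmanr(x**2, y)  # strongly negative: monotonic in X^2
print(rho_x, rho_x2)
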
-Peter
|
|
Just curious.
Are good part profiles available? It sounds like testing data
are saved at each step, until a part is finally accepted. What
happens to test data when a part is accepted? Is it used to develop
a "good part" profile database? Is it saved for later reference,
should the part fail in the field? Is factor analysis used to
find the most significant factors leading to failures as well as
interactions between factors that lead to failures?
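(On the factor-analysis question, a rough sketch of what that might
look like over the measurement table, in Python with scikit-learn; the
file name, column layout, and number of factors are all assumptions.)

# Rough sketch: factor analysis to look for a few underlying factors
# behind the 300-400 test variables. Everything here is illustrative.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

df = pd.read_csv("fab_measurements.csv")   # assumed: one numeric row per part
X = (df - df.mean()) / df.std()            # standardize the measurements

fa = FactorAnalysis(n_components=5, random_state=0).fit(X)
loadings = pd.DataFrame(fa.components_, columns=df.columns)
print(loadings.round(2))   # which measurements load on which factor
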
|