Resampling Stats


Cheese

Problem

A food company experimented with different levels of salt and fat in a cheese product (measured from a baseline that we will call "0 salt, 0 fat"). There were 13 trials resulting in 13 observations, 5 of them at the baseline level. The salt and fat levels and the resulting consumer acceptance scores are in the file "cheese.dat" and are listed below:

Cheese Table. Consumer Acceptance of Cheese by Salt and Fat Levels

Salt level    Fat level    Consumer acceptance
  -1             1             4.2
  -1             1             2.8
   1             1             7.4
   1            -1             6.1
  -1.41          0             3.4
   1.41          0             6.6
   0            -1.41          4.6
   0             1.41          7.0
   0             0             5.2
   0             0             5.6
   0             0             5.4
   0             0             6.0
   0             0             5.6

Product acceptance is regressed on fat, salt, the product of fat * salt, the square of salt, and the square of fat. The output from such a regression is a set of six prediction coefficients describing the following model:

Acceptance = B1*salt + B2*fat + B12*salt*fat + B11*salt*salt + B22*fat*fat + constant
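To make the six terms concrete, here is a minimal Python sketch of how one (salt, fat) setting expands into the six regressors of this model, and how a set of coefficients turns them into a predicted acceptance. The function names design_row and predict are illustrative only; they are not part of Resampling Stats.

```python
def design_row(salt, fat):
    """Expand one (salt, fat) setting into the six regressors of the model.

    Order follows the equation above: salt, fat, salt*fat, salt^2, fat^2, constant.
    """
    return [salt, fat, salt * fat, salt * salt, fat * fat, 1.0]


def predict(coefs, salt, fat):
    """Predicted acceptance for coefficients ordered B1, B2, B12, B11, B22, constant."""
    return sum(b * x for b, x in zip(coefs, design_row(salt, fat)))


# One axial point from the table: salt = 1.41, fat = 0.
print(design_row(1.41, 0.0))
```

At the baseline setting (salt = 0, fat = 0) every term except the constant vanishes, so the predicted acceptance there is simply the constant.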

Assess the variability in these regression parameters by resampling the residuals.

Why resample the residuals? Recall that our goal is to know how stable the results are - how much might the variables' coefficients differ were we to repeat the experiment over and over? If we had the time and money, we would actually repeat the experiment and, with the new product acceptance scores, observe how much the coefficients change from one experiment to the next. Clearly, we lack the time and money to do this. We would like to find a source for simulated experiment results that would allow us to repeatedly recalculate the coefficients. At first glance, it might appear reasonable to put the 13 experiment results on cards (three numbers per card, representing salt, fat, and product acceptance), place the cards in a hat, then resample 13 from the hat and recalculate the regression.

However, that approach treats the experimental combinations of salt and fat themselves as random effects. Our initial levels were very carefully picked - they were not a random selection of levels of salt and fat. To minimize experimental costs, most combinations have only one experiment, and salt and fat levels are systematically varied in order to produce maximum information. It will help if we conceive of a product acceptance score (PA) as a function of the levels of salt and fat, plus a random component: PA = f(salt, fat, random error).

The salt and fat levels we regard as fixed, but the fluctuation in the random error term is what lends uncertainty to the coefficient estimates. Therefore, we can resample the error term and create new, resampled data sets with which to make new, resampled coefficient estimates. Of course, we do not know the true errors because we cannot be certain of the true model. Uncertainty surrounding the true model is what brought us to this point! However, we can use the residuals from the fitted equation to estimate the error terms.
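The difference between the two schemes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the Resampling Stats program: the function names are hypothetical, the design is copied from the table as printed, and the fitted values and residuals in the second half are made-up numbers used only to show the mechanics.

```python
import random

# (salt, fat) settings as printed in the Cheese Table.
DESIGN = [(-1, 1), (-1, 1), (1, 1), (1, -1), (-1.41, 0), (1.41, 0),
          (0, -1.41), (0, 1.41), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]


def resample_cases(design, rng):
    """Cards-in-a-hat scheme: whole rows are drawn, so the design itself changes."""
    return [rng.choice(design) for _ in design]


def resample_residuals(fitted, residuals, rng):
    """Fixed-design scheme: keep every row, add a randomly drawn residual to it."""
    return [f + rng.choice(residuals) for f in fitted]


rng = random.Random(1)

# Case resampling: the number of baseline (0, 0) runs is no longer fixed at 5.
shuffled_design = resample_cases(DESIGN, rng)
print("baseline runs, original design:", DESIGN.count((0, 0)))
print("baseline runs, case-resampled: ", shuffled_design.count((0, 0)))

# Residual resampling: made-up fitted values and residuals, for illustration only.
toy_fitted = [4.0, 5.0, 6.0]
toy_resid = [-0.2, 0.0, 0.2]
print(resample_residuals(toy_fitted, toy_resid, rng))
```

With case resampling, some design points can drop out of a resample entirely; with residual resampling, every carefully chosen combination is kept and only the acceptance scores vary.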

Resampling Procedure

Compute the regression equation for acceptance as a function of the five terms listed above (salt, fat, salt*fat, salt squared, and fat squared). For each of the 13 combinations, plug the salt and fat levels into the computed formula to obtain 13 predicted acceptance levels. Subtract the predicted from the actual acceptance levels; these differences (the residuals) represent the discrepancies between predicted and actual results. What if the residual for, say, combination 4 above were to be applied to, say, combination 8? Since the residuals are a kind of uncertainty fuzz around the forecast values, we can explore the effects of rearranging but not enlarging that uncertainty.

  1. Write down the 13 residuals on pieces of paper. Shuffle the papers into a hat.
  2. Sample, with replacement, 13 residuals. Some will be positive, some negative. Keep in order of drawing.
  3. Add these values to the 13 predicted acceptance levels. Using this new set of simulated acceptance values, perform another regression to develop six new coefficients. Record each of these coefficients on separate scoreboards.
  4. Repeat steps (2) and (3) 1,000 times. Examine each of the scoreboards to see how widely the values diverge, that is, find the 5th and 95th percentiles. These results show the degree of uncertainty for each of the six prediction coefficients.
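The procedure above can be sketched in plain Python. This is a minimal illustration, not the Resampling Stats implementation: all function names are hypothetical, the regression is solved through the normal equations with a small Gaussian elimination routine, and the percentile uses linear interpolation, which may differ from the convention Resampling Stats uses. The data are entered as printed in the Cheese Table, and because the draws are random the limits need not match any published figures digit for digit.

```python
import random

# Data as printed in the Cheese Table: (salt, fat, acceptance).
DATA = [(-1, 1, 4.2), (-1, 1, 2.8), (1, 1, 7.4), (1, -1, 6.1),
        (-1.41, 0, 3.4), (1.41, 0, 6.6), (0, -1.41, 4.6), (0, 1.41, 7.0),
        (0, 0, 5.2), (0, 0, 5.6), (0, 0, 5.4), (0, 0, 6.0), (0, 0, 5.6)]


def design_row(salt, fat):
    # Regressors in model order: salt, fat, salt*fat, salt^2, fat^2, constant.
    return [salt, fat, salt * fat, salt * salt, fat * fat, 1.0]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def residual_bootstrap(data, trials=1000, seed=0):
    rng = random.Random(seed)
    rows = [design_row(s, f) for s, f, _ in data]
    y = [a for _, _, a in data]
    coefs = ols(rows, y)
    fitted = [sum(b * x for b, x in zip(coefs, r)) for r in rows]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    boards = [[] for _ in coefs]  # one scoreboard per coefficient
    for _ in range(trials):
        drawn = [rng.choice(resid) for _ in resid]        # step 2: sample with replacement
        y_sim = [fi + e for fi, e in zip(fitted, drawn)]  # step 3: simulated acceptance
        for board, b in zip(boards, ols(rows, y_sim)):
            board.append(b)
    limits = [(percentile(b, 5), percentile(b, 95)) for b in boards]
    return coefs, limits


names = ["B1", "B2", "B12", "B11", "B22", "const"]
coefs, limits = residual_bootstrap(DATA)
for name, b, (lo, hi) in zip(names, coefs, limits):
    print(f"{name:6s} {b:7.3f} {lo:7.3f} {hi:7.3f}")
```

Note that the design rows are built once and reused in every trial; only the acceptance values change from one simulated experiment to the next.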

Computer Implementation in Resampling Stats

READ file "cheese.dat"; salt fat accept

each of these vectors, "salt," "fat," and "accept," will acquire 13 values

MULTIPLY salt fat saltfat

"salt" and "fat" are vectors with 13 values each, so "saltfat" will also acquire 13 values. Resampling Stats treats "saltfat" as the name of the result vector, not as an instruction to multiply "salt" by "fat."

SQUARE salt saltsq
SQUARE fat fatsq
REGRESS accept salt fat saltfat saltsq fatsq model

Work out a regression model to forecast acceptance as a function of salt, fat, salt * fat, salt-squared, and fat-squared. Put the estimated coefficients into vector "model."

TAKE model 1 B1

the first value in vector "model" is the coefficient for the effect of salt on acceptance

MULTIPLY B1 salt B1salt

"salt" has 13 values in it, each of which will be multiplied by the single coefficient in B1 and put into "B1salt"

TAKE model 2 B2

take the second value in vector "model" and copy it into "B2," the coefficient applicable to fat level

MULTIPLY B2 fat B2fat
TAKE model 3 B12

Continue taking individual coefficients. "B12" is the coefficient for variable "saltfat."

MULTIPLY B12 saltfat B12sf
TAKE model 4 B11

"B11" signifies the coefficient applicable to salt-squared, i.e., salt * salt

MULTIPLY B11 saltsq B11ss
TAKE model 5 B22

"B22" is the coefficient for fat * fat

MULTIPLY B22 fatsq B22ff
TAKE model 6 B0

the sixth value in "model" is the constant; now add together for vector of predicted values

ADD B0 B1salt B2fat B12sf B11ss B22ff accepth

Each of the vectors being added holds 13 values, one per trial, giving that term's contribution to the forecast (the constant B0 is added to every row). "Accepth," therefore, also has 13 values, each of which is the predicted acceptance level for that row's salt and fat levels.

SUBTRACT accept accepth resid

What is the difference between the forecast values for each of the 13 cases, and the experimental acceptance? The answer is an array of 13 values, the residuals.

COPY (0) null

prepare to use the "sumabsdev" command

SUMABSDEV resid null sumresid

subtract the "resid" array from an array of zeroes, then add the results disregarding signs

PRINT  model

we should see six coefficients, for salt, fat, saltfat, salt-sqrd, fat-sqrd, and constant

PRINT sumresid

If the study were done again, we would like to know how well the new data fit. The sum of the absolute residuals gives an indication: the better the model fits, the smaller this sum.
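For readers more used to a general-purpose language, the quantity computed here can be sketched in Python as follows. This assumes SUMABSDEV returns the sum of absolute element-wise differences, as the comment above describes; the function name sum_abs_dev is hypothetical, and unlike the single-valued "null" vector in the program, this sketch expects two lists of equal length. The residuals shown are made-up numbers, not the cheese residuals.

```python
def sum_abs_dev(a, b):
    """Sum of absolute element-wise differences between two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# Made-up residuals, for illustration only (not the cheese residuals).
resid = [0.3, -0.1, -0.4, 0.2]
null = [0.0] * len(resid)
print(sum_abs_dev(resid, null))
```

Taking absolute values matters: the signed residuals from a regression with a constant term sum to zero, so a plain sum would say nothing about the quality of the fit.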

'PRINT accepth

Remove the <'> to see the forecast acceptance levels, which can be compared with the actual acceptance levels as shown in the table above

REPEAT 1000

begin a simulation to evaluate what would happen if the acceptance levels had been randomly shifted to the extent of the residuals from our prediction equation

SAMPLE 13 resid resid$

draw 13 residuals at random, with replacement, from the original residuals (the discrepancies between experimental and predicted acceptance values), putting them into vector "resid$" where the <$> indicates a simulation equivalent

ADD resid$ accepth accept$

modify the predicted acceptance figures by adding the resampled residuals

REGRESS noprint accept$ salt fat saltfat saltsq fatsq model$

Perform another regression, but this time with the altered "accept$" figures. The "noprint" option prevents 1,000 detailed reports on the progress of the regression.

SCORE model$ B1$ B2$ B12$ B11$ B22$ constnt$

Since "SCORE" saves only one value into each scoreboard vector, we provide a series of destination vectors. Each of these will receive just one new value for each Repeat loop.

END
PERCENTILE B1$ (5 95) B1-range

Vector "B1$" holds 1,000 simulated values for coefficient B1, one from each resampled set of acceptance levels. The interval between the 5th and 95th percentiles of these values indicates how reliable the original estimate of B1 is.

PERCENTILE B2$ (5 95) B2-range
PERCENTILE B12$ (5 95) B12range
PERCENTILE B11$ (5 95) B11range
PERCENTILE B22$ (5 95) B22range
PERCENTILE constnt$ (5 95) conrange
PRINT B1-range B2-range B12range B11range B22range conrange
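The percentile step can be sketched in Python with the standard library. This is an illustration under assumptions: the function names are hypothetical, statistics.quantiles with the "inclusive" method is one of several percentile conventions and may not match PERCENTILE exactly, and the scoreboard below is filled with synthetic random numbers rather than output from the cheese simulation.

```python
import random
import statistics


def percentile_limits(scoreboard):
    """Return the (5th, 95th) percentiles of a list of simulated coefficients."""
    # n=20 splits the data into twentieths, so the first cut point is the
    # 5th percentile and the last (19th) cut point is the 95th.
    cuts = statistics.quantiles(scoreboard, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def spans_zero(limits):
    """True when zero lies inside the interval, i.e. the sign is uncertain."""
    low, high = limits
    return low <= 0.0 <= high


# Synthetic stand-in for a 1,000-value scoreboard (NOT output from the cheese run).
rng = random.Random(0)
board = [rng.gauss(0.0, 1.0) for _ in range(1000)]
limits = percentile_limits(board)
print(limits, spans_zero(limits))
```

An interval that spans zero is the signal to look for when deciding whether a term can be distinguished from having no effect.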

Results

Coefficient       Model     .05 limit   .95 limit
B1 (salt)         1.371      1.076       1.679
B2 (fat)          0.487      0.221       0.789
B12 (saltfat)     0.525      0.079       0.995
B11 (saltsq)     -0.346     -0.633      -0.064
B22 (fatsq)       0.056     -0.234       0.332
Constant          5.56       5.18        5.88

Conclusion

The coefficient for the fat-squared term is very small, and its interval between the .05 and .95 limits (-0.234 to 0.332) includes zero, so it cannot be distinguished from zero. Perhaps the food company should repeat the regression analysis, omitting the fat-squared term. It looks as though salt is the main ingredient that leads to higher acceptance, with fat somewhat less important. The saltfat term may be nearly as important as fat itself, but the spread in its coefficient (B12) is very large, from 0.079 to 0.995.


© 2003 Resampling Stats, Inc.