Resampling Stats


Cholesterol

Problem

The following are cholesterol reduction scores for nine men who were given the drug cholestyramine: -21.0, 3.25, 10.75, 13.75, 32.5, 39.5, 41.75, 56.75, 62.1. (The negative score indicates that one man's cholesterol increased rather than decreased.) How confidently can we draw a conclusion about the effect of the drug on cholesterol reduction (i.e., its central tendency) from experimental data like these, which may contain extreme values ("outliers")?

One way to limit the destabilizing effect that outliers can have on measures of central tendency is to trim off the highest and the lowest values. Trimming the two highest and the two lowest scores from this set of nine (2/9, or about 22%, from each end) produces a "22% trimmed mean." To judge whether trimming helps for this data set, we compare the standard deviation of the 22% trimmed mean with that of the untrimmed mean, each estimated by resampling (see Efron & Tibshirani, 1991).
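To make the definition concrete, here is a minimal Python sketch (not part of the original Resampling Stats example) that computes the ordinary mean and the 22% trimmed mean of the nine observed scores. The helper name `trimmed_mean` and its parameter `k` are our own, chosen for illustration.

```python
from statistics import mean

# The nine cholesterol reduction scores from the problem statement.
effect = [-21.0, 3.25, 10.75, 13.75, 32.5, 39.5, 41.75, 56.75, 62.1]

def trimmed_mean(values, k):
    """Mean after dropping the k lowest and the k highest values."""
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])

untrimmed = mean(effect)           # mean of all 9 scores
trimmed = trimmed_mean(effect, 2)  # mean of the middle 5 scores

print(round(untrimmed, 2), round(trimmed, 2))  # 26.59 27.65
```

For these data the two estimates of central tendency are close; the question that follows is which of them varies less from sample to sample.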

Resampling Procedure

Find the standard deviation of the mean of the untrimmed scores.

  1. Write the 9 scores onto 9 pieces of paper.
  2. Mix up the papers, select 1 and copy down its number, then replace the paper. Repeat this until you have a list of 9 numbers.
  3. Calculate the mean of all 9 numbers in the list (no trimming), and record it on a scoreboard.
  4. Repeat (2) and (3) 1,000 times.
  5. Calculate the standard deviation of the 1,000 recorded means.
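The paper-and-scoreboard steps above can be sketched in Python as follows. This is an illustrative translation, not the Resampling Stats program itself; the function name, the `seed` argument, and the use of the sample (n - 1) standard deviation are our assumptions. With 1,000 recorded means the choice of divisor makes almost no difference.

```python
import random
from statistics import mean, stdev

effect = [-21.0, 3.25, 10.75, 13.75, 32.5, 39.5, 41.75, 56.75, 62.1]

def bootstrap_sd_of_mean(data, trials=1000, seed=None):
    """Resample with replacement, record each mean, return the SD of the means."""
    rng = random.Random(seed)
    scoreboard = []
    for _ in range(trials):
        # Step 2: draw len(data) scores, replacing each slip after it is read.
        resample = rng.choices(data, k=len(data))
        # Step 3: record the mean of the full resample.
        scoreboard.append(mean(resample))
    # Step 5: spread of the recorded means.
    return stdev(scoreboard)

sd_all = bootstrap_sd_of_mean(effect, seed=1)
print(round(sd_all, 2))
```

The result changes slightly from run to run (or from seed to seed) because it is itself based on random draws.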

Find the standard deviation of the mean of the trimmed scores.

  1. Write the 9 scores onto 9 pieces of paper.
  2. Mix up the papers, select 1 and copy down its number, then replace the paper. Repeat this until you have a list of 9 numbers.
  3. Sort the list from lowest to highest. Eliminate the two lowest and the two highest scores.
  4. Calculate the mean of the remaining 5 scores, and record it on a scoreboard.
  5. Repeat (2) through (4) 1,000 times.
  6. Calculate the standard deviation of the 1,000 recorded means.
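A Python sketch of the trimmed procedure differs from the untrimmed one only in the sort-and-slice step. As before, this is an illustrative translation under our own naming (`bootstrap_sd_of_trimmed_mean`, `k`, `seed`), not the Resampling Stats code.

```python
import random
from statistics import mean, stdev

effect = [-21.0, 3.25, 10.75, 13.75, 32.5, 39.5, 41.75, 56.75, 62.1]

def bootstrap_sd_of_trimmed_mean(data, k=2, trials=1000, seed=None):
    """SD of resampled means after dropping the k lowest and k highest scores."""
    rng = random.Random(seed)
    scoreboard = []
    for _ in range(trials):
        # Step 2: draw a resample with replacement; step 3: sort it.
        resample = sorted(rng.choices(data, k=len(data)))
        # Step 3 (continued): discard k scores from each end.
        kept = resample[k:len(resample) - k]
        # Step 4: record the mean of what remains (5 scores when k=2, n=9).
        scoreboard.append(mean(kept))
    # Step 6: spread of the recorded trimmed means.
    return stdev(scoreboard)

sd_trim = bootstrap_sd_of_trimmed_mean(effect, k=2, seed=1)
print(round(sd_trim, 2))
```

Note that the trimming is applied to each resample, not to the original nine scores, so a resample may still contain repeated copies of an extreme value after trimming.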

Computer Implementation in Resampling Stats

DATA (-21 3.25 10.75 13.75 32.5 39.5 41.75 56.75 62.1) effect

the vector "effect" holds all the experimental data, that is, the 9 scores

REPEAT 1000
  SAMPLE 9 effect effect$

"effect$" is a simulated group of results. We are going to take the mean of this group with and without trimming the two highest and two lowest scores.

  MEAN effect$ untrim$
  SCORE untrim$ untrim$$

record the mean of the untrimmed scores on a scoreboard called "untrim$$"

  SORT effect$ sorted

sort the scores in preparation for trimming the highest and lowest

  TAKE sorted 3,7 middle$

keep positions 3 through 7 of the sorted scores, which eliminates the two lowest and the two highest

  MEAN middle$ trim$

calculate the mean of the trimmed scores, that is, the middle 5 of the 9 (about 56%)

  SCORE trim$ trim$$

save the mean of the trimmed scores on a scoreboard called "trim$$"

END
STDEV untrim$$ sdall

this is the standard deviation of the means of the untrimmed scores

STDEV trim$$ sdtrim

this is the standard deviation of the means of the trimmed scores

PRINT sdall sdtrim

Results

sdall = 8.58

sdtrim = 10.81
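As a rough cross-check on sdall (our addition, not part of the original example): for the plain mean, the standard deviation of the resampled means has a closed form, namely the population standard deviation of the nine scores divided by the square root of 9. No equally simple formula applies to the trimmed mean, which is one reason simulation is convenient there. A short Python sketch:

```python
from math import sqrt
from statistics import pstdev

effect = [-21.0, 3.25, 10.75, 13.75, 32.5, 39.5, 41.75, 56.75, 62.1]

n = len(effect)
# pstdev divides by n, which matches resampling from exactly these 9 values.
expected_sd_of_mean = pstdev(effect) / sqrt(n)

print(round(expected_sd_of_mean, 2))  # 8.48
```

The simulated value of 8.58 is close to this figure; the small gap is what we would expect from using only 1,000 resamples.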

Conclusion

Although we might have expected that removing the most extreme values would reduce the standard deviation, the resampling simulation shows that more precise results are obtained by retaining all the data. The standard deviation of the trimmed means was 10.8, compared with 8.6 for the untrimmed means. In other words, means computed from 9 values were more consistent than means computed from only 5 values, and any stability gained by trimming was more than offset by the stability lost to the smaller sample. In this case, the original data did not contain outliers extreme enough to make a 22% trimmed mean worthwhile. Had we trimmed a smaller portion of the data, or had one or two of the values been more extreme, trimming might have been a useful tool.

Exercise: What happens if you trim only one value from each end (an 11% trim)?
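One way to explore the exercise is to score several trim levels against the same resamples, much as the program above scores the trimmed and untrimmed means from the same "effect$". The Python sketch below is a starting point under our own assumptions: the function name `sd_by_trim_level`, the `trim_levels` tuple, and the seed are all hypothetical, and no outcome is asserted here for the 11% trim.

```python
import random
from statistics import mean, stdev

effect = [-21.0, 3.25, 10.75, 13.75, 32.5, 39.5, 41.75, 56.75, 62.1]

def sd_by_trim_level(data, trim_levels=(0, 1, 2), trials=1000, seed=None):
    """Bootstrap SD of the k-trimmed mean for each k, sharing one set of resamples."""
    rng = random.Random(seed)
    n = len(data)
    scoreboards = {k: [] for k in trim_levels}  # one scoreboard per trim level
    for _ in range(trials):
        resample = sorted(rng.choices(data, k=n))
        for k in trim_levels:
            # Drop k scores from each end of the sorted resample (k=0 keeps all).
            scoreboards[k].append(mean(resample[k:n - k]))
    return {k: stdev(scores) for k, scores in scoreboards.items()}

results = sd_by_trim_level(effect, seed=2003)
for k, sd in results.items():
    print(f"trim {k} from each end: SD of mean = {sd:.2f}")
```

Scoring every trim level on the same resamples keeps the comparison from being blurred by differences between independent sets of random draws.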

References

Efron, B., & Tibshirani, R. J. (1991, July 26). Statistical data analysis in the computer age. Science, 253(5018), 390-395.


© 2003 Resampling Stats, Inc.