Resampling Stats


Newcomb-2

[analysis of the trimmed mean; see NEWCOMB-1 for standard deviation/standard error version]

Problem

In 1882, Simon Newcomb measured how long it took a beam of light to travel across the Potomac River on a path of 3,271 meters (see Koopman, 1987). The data he obtained include several unlikely values, as can be seen from the negative values in the table below. (These data were discussed in NEWCOMB-1 and are in the file "newcomb.dat".) Such extreme values, or outliers, can have a large influence on some statistics. One way of dealing with this is to use statistics that are not sensitive to those values. For example, in measuring central tendency, we could use the mean after removing the two largest and the two smallest values: a trimmed mean. With a data set this large (66 numbers), the loss of four values is well tolerated. In this problem, we obtain a bootstrap estimate of the standard error of the trimmed mean by the same technique used for deriving the standard error of the mean in NEWCOMB-1.

Newcomb-2 Table. Newcomb's Times for Light to Travel 3,271 Meters (in Milliseconds)

28 26 33 -24 34 -44
27 16 40 -2 29 22
24 21 25 30 23 29
31 19 24 20 36 32
36 28 25 21 28 29
37 25 28 26 30 32
36 26 30 22 36 23
27 27 28 27 31 27
26 33 26 32 32 24
39 28 24 25 32 25
29 27 28 29 16 23

Note. Data are from Koopman, 1987, p. 252.
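
The trimmed mean described in the problem statement can be sketched in Python as follows. This is only an illustration: the function name `trimmed_mean`, its parameter `k` (the number of values dropped from each end), and the `toy` sample are assumptions made for this sketch and do not come from Resampling Stats or from Newcomb's data.

```python
def trimmed_mean(values, k=2):
    """Return the mean after dropping the k smallest and k largest values."""
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must be >= 0 and leave at least one value")
    kept = sorted(values)[k:n - k]  # sorted positions k+1 through n-k
    return sum(kept) / len(kept)


# Made-up toy sample (not Newcomb's data): two low outliers among ten values.
toy = [-40, -2, 20, 24, 26, 28, 30, 32, 38, 40]
print("plain mean:  ", trimmed_mean(toy, k=0))
print("trimmed mean:", trimmed_mean(toy, k=2))
```

With 66 values and k = 2, the slice keeps sorted positions 3 through 64, that is, 62 values.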

Resampling Procedure

The mean after trimming the highest two and lowest two values is 27.21 (4 of the 66 values, about a 6% trim in total). The standard error of the trimmed mean will be estimated by repeatedly bootstrapping these data.

  1. Place the 66 values in an urn.
  2. Sample 66 values with replacement from these data (a bootstrap sample).
  3. Trim off the highest 2 and lowest 2 and record the mean of the remaining 62.
  4. Repeat steps 2 and 3, say, 500 times.
  5. Determine the standard deviation of these resampled trimmed means.
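
For readers without Resampling Stats, the five steps above can also be sketched in Python using only the standard library. This is an illustrative sketch, not the program used to produce the results on this page: the function names, the `seed` parameter, and the `demo` values are assumptions. It also uses the sample standard deviation (divisor one less than the number of resamples), which may or may not match the divisor used by the STDEV command.

```python
import random
import statistics


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    kept = sorted(values)[k:len(values) - k]
    return statistics.fmean(kept)


def bootstrap_se_trimmed_mean(data, k=2, n_boot=500, seed=None):
    """Bootstrap estimate of the standard error of the k-trimmed mean."""
    rng = random.Random(seed)  # pass a seed for repeatable runs
    n = len(data)              # step 1: the "urn" is simply the data list
    boot_means = []
    for _ in range(n_boot):    # step 4: repeat n_boot times
        resample = rng.choices(data, k=n)             # step 2: n draws with replacement
        boot_means.append(trimmed_mean(resample, k))  # step 3: trim, then average
    return statistics.stdev(boot_means)               # step 5: SD of the trimmed means


# Made-up stand-in values (not Newcomb's data), just to exercise the function.
demo = [-40, -2, 20, 24, 25, 26, 27, 28, 29, 30, 31, 32, 38, 40]
print(bootstrap_se_trimmed_mean(demo, k=2, n_boot=500, seed=0))
```

Because the resamples are random, the estimate changes slightly from run to run unless a seed is fixed.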

Computer Implementation in Resampling Stats

READ file "newcomb.dat" time

there should be 66 values in that file

REPEAT 500

here we simulate 500 additional experiments, but with trimmed data

  SAMPLE 66 time time$
  SORT time$ sorted$
  TAKE sorted$ 3,64 trim$

trim off the top 2 and bottom 2

  MEAN trim$ mtrim$

find the trimmed mean

  SCORE mtrim$ scrboard
END
HISTOGRAM scrboard
STDEV scrboard sim_se

find the standard deviation of all the bootstrapped trimmed means

PRINT sim_se

Results

sim_se = 0.90999

Conclusion

The standard error of the mean (see NEWCOMB-1) was about 1.3; for the trimmed mean it was only about 0.9. For these data, the trimmed mean appears to be a more reliable measure of central tendency because it varies less from one bootstrap sample to the next. The student is invited to explore the effects of removing, say, the five highest and five lowest values.
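
One possible way to carry out that exploration is sketched below in Python. It compares the bootstrap standard error when 0, 2, and 5 values are trimmed from each end. The data here are a synthetic stand-in generated for illustration (64 values clustered near 27 plus two low outliers), not Newcomb's measurements, and the function name `bootstrap_se` is likewise an assumption. To study the real data, replace `data` with the 66 values from "newcomb.dat".

```python
import random
import statistics


def bootstrap_se(data, k, n_boot=500, seed=0):
    """Bootstrap SE of the mean after trimming k values from each end."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_boot):
        resample = sorted(rng.choices(data, k=n))  # bootstrap sample, sorted
        means.append(statistics.fmean(resample[k:n - k]))
    return statistics.stdev(means)


# Synthetic stand-in for the 66 measurements (NOT Newcomb's values):
# 64 draws clustered near 27 plus two low outliers.
gen = random.Random(1)
data = [round(gen.gauss(27, 5)) for _ in range(64)] + [-44, -2]

for k in (0, 2, 5):
    print(f"trim {k} per end: bootstrap SE = {bootstrap_se(data, k):.2f}")
```

Using the same seed for every trim level means each comparison is made on the same set of bootstrap samples, so differences reflect the trimming rather than resampling noise.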

References

Koopman, L.H. (1987). Introduction to contemporary statistical methods (2nd ed.). Boston: Duxbury Press.


© 2003 statistics.com LLC