Newcomb-2
[analysis of the trimmed mean; see NEWCOMB-1 for standard deviation/standard error version]
Problem
In 1882, Simon Newcomb measured how long it took a beam of light to travel across the Potomac River on a path of 3,271 meters (see Koopman, 1987). The data he obtained include several unlikely values, as can be seen from the negative values in the table below. (These data were discussed in NEWCOMB-1 and are in the file "newcomb.dat".) Such extreme values, or outliers, can have a large influence on some statistics. One way of dealing with them is to use statistics that are not sensitive to those values. For example, to measure central tendency we could use the mean after removing the two largest and the two smallest values, a so-called trimmed mean. With a data set this large (66 numbers), the reduction in sample size is easily tolerated. In this problem, we obtain a bootstrap estimate of the standard error of the trimmed mean. We use the same technique that NEWCOMB-1 used to estimate the standard error of the ordinary mean.
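The following is a minimal Python sketch of a trimmed mean. Python and the function name `trimmed_mean` are our choices and are not part of the original Resampling Stats material. The function sorts the values and averages what remains after dropping `k` values from each end. The toy data are made up for illustration and are not Newcomb's.

```python
from statistics import mean


def trimmed_mean(values, k=2):
    """Mean after dropping the k smallest and k largest values."""
    if 2 * k >= len(values):
        raise ValueError("need more than 2*k values to trim")
    ordered = sorted(values)
    # Keep sorted positions k .. len-k-1 (0-based), i.e. drop k from each tail.
    return mean(ordered[k:len(ordered) - k])


# Hypothetical toy data with two wild low readings.
toy = [28, 26, 33, -44, 34, -2, 27, 29]
print(mean(toy))             # 16.375: the ordinary mean is dragged down
print(trimmed_mean(toy, 2))  # 27.5: -44, -2, 33 and 34 are dropped
```

Two outliers pull the ordinary mean far below the bulk of the toy data. The trimmed mean ignores them.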
Newcomb-2 Table. Newcomb's Times for Light to Travel 3,271 Meters (Coded as Deviations from 24,800 Nanoseconds)
| 28 | 26 | 33 | -24 | 34 | -44 |
| 27 | 16 | 40 | -2 | 29 | 22 |
| 24 | 21 | 25 | 30 | 23 | 29 |
| 31 | 19 | 24 | 20 | 36 | 32 |
| 36 | 28 | 25 | 21 | 28 | 29 |
| 37 | 25 | 28 | 26 | 30 | 32 |
| 36 | 26 | 30 | 22 | 36 | 23 |
| 27 | 27 | 28 | 27 | 31 | 27 |
| 26 | 33 | 26 | 32 | 32 | 24 |
| 39 | 28 | 24 | 25 | 32 | 25 |
| 29 | 27 | 28 | 29 | 16 | 23 |
Note. Data are from Koopman, 1987, p. 252.
Resampling Procedure
The mean after trimming the highest two and lowest two values is 27.21 (a 6% trim). The standard error of the trimmed mean will be estimated by repeatedly bootstrapping these data.
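The trimmed mean can be written as a formula. The notation is ours: $x_{(i)}$ is the $i$-th smallest observation, $n$ is the sample size, and $k$ is the number of values trimmed from each tail. With $n = 66$ and $k = 2$, the trimmed mean reported above is:

```latex
\bar{x}_{T} = \frac{1}{n - 2k} \sum_{i = k+1}^{n-k} x_{(i)}
            = \frac{1}{62} \sum_{i = 3}^{64} x_{(i)},
\qquad \frac{2k}{n} = \frac{4}{66} \approx 6.1\% .
```

The summation limits 3 and 64 are the same sorted positions that the program's `TAKE sorted$ 3,64 trim$` command keeps.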
1. Place the 66 values in an urn.
2. Sample 66 values with replacement from the urn (a bootstrap sample).
3. Trim off the highest 2 and lowest 2 values and record the mean of the remaining 62.
4. Repeat steps 2-3, say, 500 times.
5. Determine the standard deviation of these resampled trimmed means; this is the bootstrap estimate of the standard error of the trimmed mean.
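The steps above can be sketched in Python as follows. This is a hedged translation, not the Resampling Stats program itself. The names `bootstrap_se` and `trimmed_mean` are hypothetical. We assume a sample standard deviation with an n - 1 divisor (`statistics.stdev`), because the text does not say which divisor STDEV uses. With 500 replicates the choice hardly matters.

```python
import random
from statistics import mean, stdev


def trimmed_mean(values, k=2):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


def bootstrap_se(data, statistic, reps=500, seed=None):
    """Bootstrap standard error of `statistic`, following steps 1-5."""
    urn = list(data)                               # step 1: the values in an urn
    rng = random.Random(seed)                      # seeded for reproducibility
    scores = []
    for _ in range(reps):                          # step 4: repeat, say, 500 times
        resample = rng.choices(urn, k=len(urn))    # step 2: draw n values with replacement
        scores.append(statistic(resample))         # step 3: trim and record the mean
    return stdev(scores), scores                   # step 5: SD of the recorded statistics
```

For example, `se, scores = bootstrap_se(times, trimmed_mean, reps=500, seed=1)`, where `times` is a list holding the 66 tabled values. The returned `scores` list plays the role of `scrboard` and can be plotted as a histogram. Because the procedure is random, the result will be close to the reported standard error but will not reproduce it exactly.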
Computer Implementation in Resampling Stats
READ file "newcomb.dat" time
' there should be 66 values in that file
REPEAT 500
   ' draw 500 bootstrap samples and compute the trimmed mean of each
   SAMPLE 66 time time$
   ' sort the bootstrap sample from smallest to largest
   SORT time$ sorted$
   TAKE sorted$ 3,64 trim$
   ' keep sorted positions 3 through 64, trimming off the bottom 2 and top 2
   MEAN trim$ mtrim$
   ' find the trimmed mean
   SCORE mtrim$ scrboard
   ' keep score of each trimmed mean
END
HISTOGRAM scrboard
STDEV scrboard sim_se
' find the standard deviation of all the bootstrapped
' trimmed means
PRINT sim_se
Results
sim_se = 0.90999
Conclusion
The standard error of the mean (see NEWCOMB-1) was about 1.3; for the trimmed mean it is only about 0.9. For these data, the trimmed mean therefore appears to be a more precise (less variable) measure of central tendency. The student is invited to explore the effects of removing, say, the five highest and five lowest values.
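One way to take up the invitation above is the hedged Python sketch below, which is our translation and not a Resampling Stats program. It transcribes the 66 tabled values and compares bootstrap standard errors for trimming 0, 2 and 5 values from each tail. Trimming 0 values gives the ordinary mean. The function names and the fixed seed are illustrative assumptions. The printed figures depend on the seed and will not match `sim_se` exactly.

```python
import random
from statistics import mean, stdev

# The 66 values of the Newcomb-2 table, transcribed row by row.
TIMES = [
    28, 26, 33, -24, 34, -44,
    27, 16, 40, -2, 29, 22,
    24, 21, 25, 30, 23, 29,
    31, 19, 24, 20, 36, 32,
    36, 28, 25, 21, 28, 29,
    37, 25, 28, 26, 30, 32,
    36, 26, 30, 22, 36, 23,
    27, 27, 28, 27, 31, 27,
    26, 33, 26, 32, 32, 24,
    39, 28, 24, 25, 32, 25,
    29, 27, 28, 29, 16, 23,
]


def trimmed_mean(values, trim):
    """Mean after dropping `trim` values from each tail (trim=0 is the plain mean)."""
    ordered = sorted(values)
    return mean(ordered[trim:len(ordered) - trim])


def bootstrap_se(data, trim, reps=500, seed=0):
    """Standard deviation of `reps` bootstrap replicates of the trimmed mean."""
    rng = random.Random(seed)
    replicates = [trimmed_mean(rng.choices(data, k=len(data)), trim)
                  for _ in range(reps)]
    return stdev(replicates)


# Ordinary mean (trim 0), the 2-per-tail trim used above, and the 5-per-tail exercise.
for trim in (0, 2, 5):
    print(f"trim {trim} per tail: bootstrap SE = {bootstrap_se(TIMES, trim):.2f}")
```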
References
Koopman, L.H. (1987). Introduction to contemporary statistical methods (2nd ed.). Boston: Duxbury Press.