Resampling Stats

Newcomb-1

[standard deviation/standard error version; see NEWCOMB-2 for trimmed mean version]

Problem

In 1882, Simon Newcomb measured how long it took a beam of light to travel across the Potomac River on a path of 3,271 meters (see Koopman, 1987). Newcomb's measurements are in the file "newcomb.dat" and are presented below. We will estimate the standard error of the mean of these 66 values in two ways: with the traditional formula and with a resampling simulation. Comparing the two estimates shows whether the methods give consistent results.

Newcomb-1 Table. Newcomb's Times for Light to Travel 3,271 Meters (in Milliseconds)

28 26 33 -24 34 -44
27 16 40 -2 29 22
24 21 25 30 23 29
31 19 24 20 36 32
36 28 25 21 28 29
37 25 28 26 30 32
36 26 30 22 36 23
27 27 28 27 31 27
26 33 26 32 32 24
39 28 24 25 32 25
29 27 28 29 16 23

Note. Data are from Koopman, 1987, p. 252.

Conventional Procedure

  1. Calculate the mean and the standard deviation of the 66 values in the "newcomb.dat" file (by hand, with a calculator, or with a computer). The mean is 25.48 and the standard deviation is 12.40.
  2. Calculate the standard error of the mean with this formula: SE = SD / sqrt (n) = 12.40 / sqrt (66) = 12.40 / 8.12 = 1.53.

Note here the distinction between the standard deviation of the sample values (measuring the variability of individual data points) and the standard error of the sample mean (measuring the variability of the sample mean).
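
The two conventional steps can also be checked by computer. The following is a minimal sketch in Python rather than Resampling Stats; the list name NEWCOMB and the other variable names are our own, and the values are typed in from the Newcomb-1 Table instead of being read from "newcomb.dat". It assumes the standard deviation is the usual sample version, with n - 1 in the denominator.

```python
import math
import statistics

# Newcomb's 66 values, copied row by row from the Newcomb-1 Table
NEWCOMB = [
    28, 26, 33, -24, 34, -44,
    27, 16, 40, -2, 29, 22,
    24, 21, 25, 30, 23, 29,
    31, 19, 24, 20, 36, 32,
    36, 28, 25, 21, 28, 29,
    37, 25, 28, 26, 30, 32,
    36, 26, 30, 22, 36, 23,
    27, 27, 28, 27, 31, 27,
    26, 33, 26, 32, 32, 24,
    39, 28, 24, 25, 32, 25,
    29, 27, 28, 29, 16, 23,
]

n = len(NEWCOMB)
mean = statistics.mean(NEWCOMB)
sd = statistics.stdev(NEWCOMB)   # sample SD (n - 1 denominator): spread of the individual values
se = sd / math.sqrt(n)           # standard error: spread of the sample mean

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
```

Run on the tabled values, this should reproduce the figures quoted in the two steps above.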

Resampling Procedure

The standard error of the mean can be estimated by repeatedly bootstrapping these data, that is, by repeatedly taking samples of 66 values with replacement, calculating the means of these samples, and then determining the standard deviation of these resampled means.

The thinking behind this is as follows: We want to learn how reliable our estimate of the mean is. If we had unlimited time, resources, and access to data, we would take lots of additional samples and see how they behave. Instead, we construct a hypothetical universe (one that contains an unlimited number of replications of our existing sample) that represents our best guess about what the universe that spawned our sample looks like. Then we take resamples from that. Actually, we take a shortcut that amounts to the same thing: we use only our sample, and replace each value after selecting it for a resample.
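
One way to see why the shortcut amounts to the same thing is to compare a single probability under the two schemes: the chance that the second card drawn comes from the same original card as the first. The sketch below is our own illustration in Python, not part of the Resampling Stats program; the function names are hypothetical, and "copies" stands for the number of replications of the sample placed in the hypothetical universe.

```python
from fractions import Fraction

N_CARDS = 66  # size of Newcomb's sample


def p_repeat_with_replacement(n=N_CARDS):
    """Chance the second draw hits the same card as the first when the card is put back."""
    return Fraction(1, n)


def p_repeat_from_universe(copies, n=N_CARDS):
    """Chance the second draw is a copy of the same original card as the first,
    drawing WITHOUT replacement from a universe holding `copies` replications of the sample."""
    # After the first draw, n * copies - 1 cards remain, and copies - 1 of them match it.
    return Fraction(copies - 1, n * copies - 1)


for copies in (1, 10, 1000, 1_000_000):
    print(copies, float(p_repeat_from_universe(copies)))
print("with replacement:", float(p_repeat_with_replacement()))
```

With a single copy a repeat is impossible, and as the number of copies grows the probability approaches 1/66, the value obtained by simply replacing each card in the original sample.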

  1. On 66 cards, write Newcomb's data. Put the cards in an urn.
  2. Draw out one card, write down the value, replace the card in the urn, and shuffle the cards.
  3. Repeat (2) for a total of 66 values. Compute the mean of these 66 values. Record that mean on a scoreboard.
  4. Repeat (2-3), say, 500 times. Calculate the standard deviation of these means.

Note: You can also write steps 2-3 as a single statement, "Draw a sample of 66 with replacement (a bootstrap sample) and record the mean."

Computer Implementation in Resampling Stats

READ file "newcomb.dat" time

the 66 values in this file represent Newcomb's data

REPEAT 500

here we simulate 500 additional experiments by Newcomb

SAMPLE 66 time time$

using Newcomb's data, generate a simulated sample of 66 values

MEAN time$ mtime$

calculate the mean of the simulated sample

SCORE mtime$ scrboard
END
HISTOGRAM scrboard

display the range of means from our sampling

STDEV scrboard sim_se
PRINT sim_se

the standard error of the mean, obtained by resampling, not by calculation
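
For readers without Resampling Stats, the same simulation might be sketched in Python as follows. This is an approximate translation and not the Resampling Stats implementation: the data are typed in from the Newcomb-1 Table instead of being read from "newcomb.dat", the seed value is an arbitrary choice of ours, the histogram step is omitted, and statistics.stdev uses an n - 1 denominator, which may or may not match what STDEV does (with 500 means the difference is negligible).

```python
import random
import statistics

# Newcomb's 66 values, copied row by row from the Newcomb-1 Table
NEWCOMB = [
    28, 26, 33, -24, 34, -44,
    27, 16, 40, -2, 29, 22,
    24, 21, 25, 30, 23, 29,
    31, 19, 24, 20, 36, 32,
    36, 28, 25, 21, 28, 29,
    37, 25, 28, 26, 30, 32,
    36, 26, 30, 22, 36, 23,
    27, 27, 28, 27, 31, 27,
    26, 33, 26, 32, 32, 24,
    39, 28, 24, 25, 32, 25,
    29, 27, 28, 29, 16, 23,
]

random.seed(1882)  # arbitrary seed (our assumption), only to make the run repeatable

scrboard = []
for _ in range(500):                                      # REPEAT 500
    resample = random.choices(NEWCOMB, k=len(NEWCOMB))    # SAMPLE 66 time time$ (with replacement)
    scrboard.append(statistics.mean(resample))            # MEAN, then SCORE onto the scoreboard

sim_se = statistics.stdev(scrboard)                       # STDEV scrboard sim_se
print(f"sim_se = {sim_se:.3f}")
```

Because the resamples are random, the printed value changes with the seed and will not match the Resampling Stats run digit for digit, but it should fall close to 1.5.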

Results

sim_se = 1.498

Conclusion

For the original sample data, the mean was 25.48 milliseconds and the standard deviation was 12.40 milliseconds, and the formula gave a standard error of the mean of 1.53. By bootstrapping these data, we simulated 500 additional sets of data. The standard deviation of those 500 resampled means, which is the resampling estimate of the standard error, was 1.498. The values obtained by the two methods are in satisfactory agreement.
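
How close should the two numbers be? When resampling with replacement from a fixed sample, the standard deviation of the resampled means settles, as the number of resamples grows, on the n-denominator standard deviation of the sample divided by sqrt (n). The sketch below (our own Python illustration; the variable names are hypothetical) computes that limiting value next to the formula value, assuming the 12.40 in the text is the n - 1 version.

```python
import math
import statistics

# Newcomb's 66 values, copied row by row from the Newcomb-1 Table
NEWCOMB = [
    28, 26, 33, -24, 34, -44,
    27, 16, 40, -2, 29, 22,
    24, 21, 25, 30, 23, 29,
    31, 19, 24, 20, 36, 32,
    36, 28, 25, 21, 28, 29,
    37, 25, 28, 26, 30, 32,
    36, 26, 30, 22, 36, 23,
    27, 27, 28, 27, 31, 27,
    26, 33, 26, 32, 32, 24,
    39, 28, 24, 25, 32, 25,
    29, 27, 28, 29, 16, 23,
]

n = len(NEWCOMB)
formula_se = statistics.stdev(NEWCOMB) / math.sqrt(n)     # n - 1 denominator, as in the conventional procedure
limit_boot_se = statistics.pstdev(NEWCOMB) / math.sqrt(n) # n denominator: what endless resampling converges to
sim_se = 1.498                                            # value reported under Results (500 resamples)

print(f"formula SE         = {formula_se:.3f}")
print(f"limiting bootstrap = {limit_boot_se:.3f}")
print(f"simulated (500)    = {sim_se:.3f}")
```

The limiting value is smaller than the formula value by the factor sqrt (65/66), so a small gap remains even with unlimited resamples; the rest of the difference in any one run is chance variation from using only 500 resamples.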

References

Koopman, L.H. (1987). Introduction to contemporary statistical methods (2nd ed.). Boston: Duxbury Press.


© 2003 Resampling Stats, Inc.