Drillhole
Problem
The file "drill100.dat" contains 100 measurements of the diameter of drill holes (see Gunter, 1991). What is a 95% confidence interval for the mean diameter?
Resampling Procedure
We will use a bootstrap method to estimate the confidence interval for the mean. We seek to know how reliable our estimate of the mean is. How much might it differ from one sample to the next? If we had time and money, we could take lots of additional samples and learn how much the mean diameter changes from sample to sample. Lacking time and money, we need a hypothetical universe to draw samples from. What is our best guess about what such a universe might look like? It is the observed sample. So, we can create our hypothetical universe by simply copying our original sample over and over until we have, say, millions of copies of observation #1, millions of copies of observation #2, and so forth.
Now we can draw samples to see how they behave and to learn how variable the mean is from one to the next. Actually, even on a computer, replicating millions of observations is tedious, so we use a shortcut - in drawing each sample, we simply replace each observation after selecting it. This achieves the same effect as replicating an infinitely large universe from our sample. This procedure - drawing a sample with replacement from the original sample - is called the bootstrap.
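The shortcut can be sketched in a few lines of Python. This is only an illustration: the function name `bootstrap_sample` and the five values in `observed` are hypothetical placeholders, not the drill-hole measurements or part of Resampling Stats.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) values from data, replacing each value after it is drawn."""
    return [rng.choice(data) for _ in data]

# Hypothetical placeholder values, NOT the measurements in drill100.dat.
observed = [196.2, 197.1, 197.4, 196.8, 197.9]

rng = random.Random(0)  # fixed seed so the draw is repeatable
resample = bootstrap_sample(observed, rng)
print(resample)
```

Because every value goes back into the pool after it is drawn, a resample can contain some observations several times and omit others, which is what drawing from a very large replicated universe would also produce.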
1. Copy the 100 measurements onto 100 marbles, which are put into an urn.
2. Sample with replacement 100 times, recording the values drawn.
3. Compute the mean of these values, and save the result in a scoreboard.
4. Repeat steps 2-3 1,000 times. Determine the numeric interval that includes 95% of the means recorded in the scoreboard: This is a .95 bootstrap confidence interval (technically it is called a bootstrap percentile interval).
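For readers without Resampling Stats, the urn procedure above might look roughly like this in Python. It is a sketch under stated assumptions: the function names are hypothetical, the percentile rule (linear interpolation) is one of several common conventions and may differ from the one Resampling Stats uses, and the `marbles` data are synthetic stand-ins rather than the contents of "drill100.dat".

```python
import random
import statistics

def percentile(sorted_values, p):
    """Percentile (0 <= p <= 100) of a sorted list, by linear interpolation."""
    k = (len(sorted_values) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def bootstrap_percentile_interval(data, n_resamples=1000, level=0.95, seed=None):
    """Return (low, high) bounds of a bootstrap percentile interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    scoreboard = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=n)            # sample with replacement
        scoreboard.append(statistics.fmean(resample))  # save the mean
    scoreboard.sort()
    tail = (1 - level) / 2 * 100                     # 2.5 for a 95% interval
    return percentile(scoreboard, tail), percentile(scoreboard, 100 - tail)

# Synthetic stand-in data (assumption): 100 values, NOT the real measurements.
data_rng = random.Random(1)
marbles = [data_rng.gauss(197.0, 3.5) for _ in range(100)]

low, high = bootstrap_percentile_interval(marbles, seed=42)
print(f"95% bootstrap percentile interval: {low:.2f} to {high:.2f}")
```

With the real measurements loaded into `marbles`, the printed interval should be close to the one reported below, though it will vary slightly from run to run unless the seed is fixed.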
Computer Implementation In Resampling Stats
READ file "drill100.dat" diam
BOXPLOT diam
this is a useful way of looking at data, especially good for detecting outliers
REPEAT 1000
SAMPLE 100 diam diam$
generate a simulated sample "diam$"
MEAN diam$ mean$
SCORE mean$ scrboard
save the mean of the simulated sample onto the scoreboard
END
PERCENTILE scrboard (2.5 97.5) interval
this range includes 95% of all values
PRINT interval
HISTOGRAM scrboard
vector "scrboard" holds the simulated mean$ values
Results
Frequency histogram of the mean diameter of drill holes
interval = 196.5 - 197.9 (this is the 95% confidence range for the bootstrapped means)
Conclusion
The 95% bootstrap confidence interval for the mean lies between 196.5 and 197.9.
References
Gunter, B. (1991, December). Bootstrapping: How to make something from almost nothing and get statistically valid answers. Quality Progress, pp. 97-103.