Fitness-1
[chi-squared version; see FITNESS-2 for a sum-of-absolute-deviations version]
Problem
At the Cooper Institute for Aerobics Research, the physical fitness of approximately 10,000 men was tested twice with a gap of 4.9 ± 4.1 years between tests (Blair et al., 1995). Fitness was defined as the ability to reach 85% of one's age-adjusted heart rate during a treadmill test. Some men were fit at both tests; others were unfit at one or more of the examinations. Death rates from both heart disease and all causes were monitored for approximately five years after the last examination. Death rates from all causes are shown in the table below. Can we conclude that there is a relationship between fitness and mortality? Look at observed deaths, along with the hypothetical figure "expected deaths." Expected deaths are calculated by distributing the 223 deaths among the four groups in proportion to the number of men in each. Do observed deaths depart from expected deaths to a greater extent than chance might produce? We measure the extent of this departure with the traditional chi-square statistic (departure from expected squared, divided by expected, summed over all groups).
Fitness-1 Table. Death Rates of Fit and Unfit Men
| Fitness at test 1 | Fitness at test 2 | # of men | Deaths, all causes | Expected deaths | Difference | Difference squared | Diff_Sq / Expected |
|---|---|---|---|---|---|---|---|
| Unfit | Unfit | 373 | 32 | 9 | +23 | 529 | 58.8 |
| Unfit | Fit | 650 | 25 | 15 | +10 | 100 | 6.7 |
| Fit | Unfit | 221 | 9 | 5 | +4 | 16 | 3.2 |
| Fit | Fit | 8533 | 157 | 195 | -38 | 1444 | 7.4 |
| Totals | ---- | 9777 | 223 | | | | 76.1 |
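The arithmetic behind the table can be checked with a short Python sketch. It is an illustration only: the function names `expected_deaths` and `chi_square` are hypothetical, and expected deaths are rounded to whole numbers because that is how the table presents them.

```python
# Group order: Unfit/Unfit, Unfit/Fit, Fit/Unfit, Fit/Fit (as in the table).
men = [373, 650, 221, 8533]
deaths = [32, 25, 9, 157]


def expected_deaths(men, total_deaths):
    """Share out the deaths in proportion to group size, rounded as in the table."""
    total_men = sum(men)
    return [round(m * total_deaths / total_men) for m in men]


def chi_square(observed, expected):
    """Sum over groups of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = expected_deaths(men, sum(deaths))
stat = chi_square(deaths, expected)

print(expected)        # [9, 15, 5, 195]
print(round(stat, 2))  # 76.05
```

The exact sum is about 76.05; the table's 76.1 comes from adding the per-row values after each has been rounded to one decimal place. Either way the observed statistic is roughly 76.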
Null hypothesis (H0): All groups share the same mortality rate. Alternative hypothesis (H1): Unfit men tend to have higher mortality.
Resampling Procedure
- Take 9,777 pieces of paper. Write "dead" on 223 of these pieces, "live" on the rest.
- Shuffle the paper. Then draw a sample of 373 pieces, labeling them "U/U" (to represent the unfit/unfit group).
- Count the number of "dead" in this sample.
- Similarly, draw samples of 650 and label them "U/F" (to represent unfit/fit), 221 "F/U," and 8,533 "F/F," and count the number dead in each group.
- Because the samples are drawn without replacement, the four groups together always contain 223 "dead," for an overall death rate of .0228. The expected number of deaths is therefore 9 in a sample of 373, 15 in a sample of 650, and so forth. For each group, square the difference between the observed and expected deaths and divide the square by the expected deaths. Record the sum of these adjusted squares (the resampled chi-square statistic) in the scoreboard.
- Repeat the shuffle, the four draws, and the chi-square calculation 1,000 times.
- Determine how frequently the resampled chi-square statistic is at least as large as 76 (the observed value).
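The procedure above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Resampling Stats program: the names `simulate_deaths`, `chi_square`, and `resample_p_value` are hypothetical, and rather than shuffling all 9,777 slips it draws the positions of the 223 "dead" slips at random without replacement, which should be equivalent to shuffling and then dealing the slips into the four groups in order. Expected deaths are the rounded values 9, 15, 5, and 195 from the table.

```python
import bisect
import itertools
import random

# Group order: U/U, U/F, F/U, F/F.
GROUP_SIZES = [373, 650, 221, 8533]
EXPECTED = [9, 15, 5, 195]   # rounded expected deaths, as in the table
TOTAL_DEATHS = 223
OBSERVED_CHI_SQ = 76

# Exclusive upper index of each group when the slips are laid out in a row.
UPPER_BOUNDS = list(itertools.accumulate(GROUP_SIZES))  # [373, 1023, 1244, 9777]


def simulate_deaths(rng=None):
    """One trial: place 223 'dead' slips at random and count them per group."""
    rng = rng or random
    dead_positions = rng.sample(range(UPPER_BOUNDS[-1]), TOTAL_DEATHS)
    counts = [0] * len(GROUP_SIZES)
    for pos in dead_positions:
        # bisect_right finds the group whose index range contains pos.
        counts[bisect.bisect_right(UPPER_BOUNDS, pos)] += 1
    return counts


def chi_square(deaths, expected=EXPECTED):
    """Sum over groups of (deaths - expected)^2 / expected."""
    return sum((d - e) ** 2 / e for d, e in zip(deaths, expected))


def resample_p_value(trials=1000, seed=None):
    """Share of trials whose chi-square is at least the observed value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if chi_square(simulate_deaths(rng)) >= OBSERVED_CHI_SQ:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(resample_p_value(trials=1000, seed=1))
```

Passing a `seed` makes a run repeatable; without one, the estimated proportion varies from run to run.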
Computer Implementation in Resampling Stats
MAXSIZE allmen 10000 ff$ 10000
DATA (9 15 5 195) expected
set up a vector holding the expected values
URN 223#8 9554#0 allmen
#8 represents deaths ("behind the eight ball"); #0 represents survivors; together they make up the 9,777 men
REPEAT 1000
SHUFFLE allmen allmen
TAKE allmen 1,373 uu$
simulate the group of 373 men who were unfit in both tests
COUNT uu$ =8 uudeath$
determine the number of deaths in this simulated sample group
TAKE allmen 374,1023 uf$
simulate the group of 650 men who were unfit at first, but fit later
COUNT uf$ =8 ufdeath$
TAKE allmen 1024,1244 fu$
simulate the group of 221 men who were fit at first test but unfit at the second test
COUNT fu$ =8 fudeath$
TAKE allmen 1245,9777 ff$
simulate the remaining 8,533 men who were fit in both tests
COUNT ff$ =8 ffdeath$
CONCAT uudeath$ ufdeath$ fudeath$ ffdeath$ deaths$
put the results together in one vector so we can begin chi-square calculations
SUBTRACT deaths$ expected diff$
first find the differences between simulated and expected
SQUARE diff$ sqrdiff$
square each of these differences
DIVIDE sqrdiff$ expected adjust$
correct the squares by the expected values
SUM adjust$ chisq$
add up all the corrected squared differences to obtain a simulated chi-square
SCORE chisq$ scrboard
END
HISTOGRAM scrboard
COUNT scrboard >=76 k
DIVIDE k 1000 prob
PRINT prob
Results
Frequency histogram of the resampled chi-square statistic
prob = 0.014
Conclusion
The observed chi-square statistic for the differences between expected and observed deaths was 76, which was higher than that obtained in all but 14 of 1,000 simulated runs. We reject the null hypothesis of no relationship between fitness and mortality. Can we also conclude that fitness caused the reduction in mortality? Not necessarily. Perhaps healthier men were both more likely to pass the fitness test and more likely to show lower mortality. (In the original study, it was estimated that heredity accounted for 25%-40% of aerobic fitness.)
References
Blair, S.N., Kohl, H.W., Barlow, C.E., Paffenbarger, R.S., Gibbons, L.W., & Macera, C.A. (1995). Changes in physical fitness and all-cause mortality: A prospective study of healthy and unhealthy men. Journal of the American Medical Association, 273(14), 1093-1098.