Hormones
Problem
Testosterone, although a male sex hormone, is also found in women. Hormone levels generally decrease with age. As part of an extensive study of the biochemistry of Pima Indians, testosterone levels in Pima Indian women were measured (Purifoy, Koopman, Tatum, & Mayes, 1981, cited in Koopman, 1987). The data shown below are also in file "hormone.dat". Does this dataset confirm the expected decrease in testosterone with age?
Hormones Table. Testosterone Levels of Pima Indian Women
| Age | Level |
| 43 | 20 |
| 38 | 21 |
| 36 | 19 |
| 35 | 18 |
| 29 | 51 |
| 27 | 37 |
| 27 | 68 |
| 26 | 28 |
| 25 | 52 |
| 58 | 18 |
| 25 | 19 |
| 22 | 50 |
| 19 | 43 |
| 44 | 13 |
| 34 | 19 |
| 30 | 23 |
| 29 | 27 |
| 26 | 31 |
| 25 | 37 |
| 22 | 31 |
Note. Data are from Purifoy et al., 1981, cited in Koopman, 1987.
The correlation between age and testosterone level was -.58.
We begin by computing the conventional correlation coefficient (-.58) between age and testosterone. Could a correlation coefficient this negative have arisen by chance? We would like to have a larger sample, but this is a small tribe, and collecting more values would take many years and a great deal of expense. With so small a dataset, could a correlation coefficient of -.58 arise just by chance even if the testosterone-age relationship does not hold for this tribe?
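The observed coefficient can be checked directly from the Hormones Table. The following is a minimal Python sketch, not part of the original Resampling Stats material; the function name `pearson_r` and the variable names are illustrative choices, and the values are typed in from the table in row order.

```python
from math import sqrt

# Ages and testosterone levels, in the row order of the Hormones Table.
age = [43, 38, 36, 35, 29, 27, 27, 26, 25, 58,
       25, 22, 19, 44, 34, 30, 29, 26, 25, 22]
level = [20, 21, 19, 18, 51, 37, 68, 28, 52, 18,
         19, 50, 43, 13, 19, 23, 27, 31, 37, 31]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


observed = pearson_r(age, level)
print(round(observed, 3))  # -0.579
```

The three-decimal value, -0.579, is the cutoff used later in the COUNT step.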
Null hypothesis (H0): There is no relationship between age and testosterone. Alternative hypothesis (H1): There is a negative relationship between age and testosterone.
Resampling Procedure
One way to test the null hypothesis is to randomly reassign the testosterone levels to the ages, breaking the original pairing, and recalculate the correlation coefficient. If we do this enough times, we will get an estimate of the probability of obtaining a correlation coefficient at least as negative as -.58 by chance.
- Write the testosterone levels onto pieces of paper, and put the papers into a hat.
- Draw the testosterone levels at random, and link them to the age values. Perform a correlation calculation on this randomized data. The result is one correlation coefficient that could have arisen purely by chance. Record this value.
- Repeat the previous step 10,000 times. How frequently was the correlation of the randomly sorted data at least as negative as the observed value of -.58?
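The hat procedure above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original implementation: `pearson_r` and `shuffle_test` are hypothetical helper names, the data are typed in from the Hormones Table, and the seed is fixed only so that a run can be repeated.

```python
import random
from math import sqrt

# Ages and testosterone levels, in the row order of the Hormones Table.
age = [43, 38, 36, 35, 29, 27, 27, 26, 25, 58,
       25, 22, 19, 44, 34, 30, 29, 26, 25, 22]
level = [20, 21, 19, 18, 51, 37, 68, 28, 52, 18,
         19, 50, 43, 13, 19, 23, 27, 31, 37, 31]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def shuffle_test(x, y, trials=10_000, seed=None):
    """Estimate how often shuffled data give r at least as negative as observed."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)            # the "pieces of paper in the hat"
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)     # draw the levels in random order
        # Pair the shuffled levels with the ages and correlate.
        if pearson_r(x, shuffled) <= observed:
            hits += 1             # as negative as, or more negative than, observed
    return observed, hits / trials


observed, prob = shuffle_test(age, level, seed=1)
print(f"observed r = {observed:.3f}, estimated probability = {prob:.4f}")
```

Because the shuffles are random, the estimated probability varies from run to run and with the seed.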
Computer Implementation In Resampling Stats
MAXsize C$$ 10000
make room for many repeats
READ file "hormon1.dat" age test
acquire the dataset tabled above into two vectors called "age" and "test" respectively
REPEAT 10000
SHUFFLE test test$
Randomize the testosterone values. The <$> suffix indicates a simulated group.
CORR age test$ c$
and do a correlation on these randomized values
SCORE c$ scrboard
keep track of the simulated correlation coefficients
END
COUNT scrboard <= -0.579 chance
how often did the simulation produce a value at least as negative as the observed statistic? (Be careful here: we are looking for a more negative result, so the comparison is "less than or equal to".)
DIVIDE chance 10000 prob
PRINT prob
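The direction of the comparison in the COUNT step matters because the alternative hypothesis is one-sided. The short Python sketch below contrasts the one-sided count with a two-sided count. The scoreboard values are made up purely for illustration and are not simulation results.

```python
# Observed correlation from the text, to three decimals.
observed = -0.579

# Made-up simulated coefficients, for illustration only (not real results).
scoreboard = [-0.62, -0.10, 0.30, 0.60, -0.58, 0.05]

# One-sided count, as in COUNT scrboard <= -0.579: only the negative tail.
one_sided = sum(1 for c in scoreboard if c <= observed)

# Two-sided count, for contrast: either tail, judged by distance from zero.
two_sided = sum(1 for c in scoreboard if abs(c) >= abs(observed))

print(one_sided, two_sided)  # 2 3
```

A large positive value such as 0.60 is far from zero but does not support a negative relationship, so only the one-sided count matches the hypotheses stated above.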
Results
Frequency histogram of resampled correlation coefficient
Results of two runs:
prob = 0.0001
prob = 0.0008
Conclusion
A correlation coefficient at least as negative as -.58 was obtained fewer than 10 times out of 10,000. Therefore the probability that the observed value of -.58 could arise by chance if there is no correlation between testosterone and age is less than 0.1%. The null hypothesis can be rejected, and we conclude that the Pima women do show a negative relationship between age and testosterone level.
References
Koopman, L.H. (1987). Introduction to contemporary statistical methods (2nd ed.). Boston: Duxbury Press.
Purifoy, F.E., Koopman, L.H., Tatum, R.W., & Mayes, D.E. (1981). Serum androgens by age in obese Pima Indian females. American Journal of Physical Anthropology, 55, 491-496.