Darwin-1
[difference in means version; see Darwin-2 for the rank sum version]
Problem
In Darwin's classic experiment with cross-fertilized and self-fertilized plants, the following data were obtained for plant heights (in inches):
Darwin-1 Table. Heights of Cross- and Self-Fertilized Plants Studied by Darwin
| Pot | Plants | Heights of individual plants (inches) | Mean | Diff (crossed - selfed) |
|---|---|---|---|---|
| I | Crossed | 23.5, 12.0, 21.0 | 18.83 | -0.44 |
| I | Selfed | 17.4, 20.4, 20.0 | 19.27 | |
| II | Crossed | 22.0, 19.2, 21.5 | 20.90 | 1.90 |
| II | Selfed | 20.0, 18.4, 18.6 | 19.00 | |
| III | Crossed | 22.2, 20.4, 18.3, 21.6, 23.2 | 21.14 | 4.24 |
| III | Selfed | 18.6, 15.2, 16.5, 18.0, 16.2 | 16.90 | |
| IV | Crossed | 21.0, 22.1, 23.0, 12.0 | 19.52 | 3.44 |
| IV | Selfed | 18.0, 12.8, 15.5, 18.0 | 16.08 | |
Note. Data are from Darwin (1900), cited in E. W. Noreen (personal communication, April 21, 1992). Based on these data, the mean height of all 15 crossed plants is 20.2 inches and that of all 15 selfed plants is 17.6 inches, a mean difference of 2.6 inches.
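The per-pot means and the overall 2.6-inch difference can be checked with a short script. This is a Python sketch; the names `pots`, `mean`, and `observed_diff` are illustrative, not from the original. Its per-pot differences come from unrounded means, so they may differ in the last digit from the table, which subtracts rounded means.

```python
# Heights (inches) from the Darwin-1 table, grouped by pot.
pots = {
    "I": {"crossed": [23.5, 12.0, 21.0], "selfed": [17.4, 20.4, 20.0]},
    "II": {"crossed": [22.0, 19.2, 21.5], "selfed": [20.0, 18.4, 18.6]},
    "III": {"crossed": [22.2, 20.4, 18.3, 21.6, 23.2],
            "selfed": [18.6, 15.2, 16.5, 18.0, 16.2]},
    "IV": {"crossed": [21.0, 22.1, 23.0, 12.0], "selfed": [18.0, 12.8, 15.5, 18.0]},
}


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


# Per-pot means and crossed-minus-selfed differences (unrounded means).
for name, pot in pots.items():
    mx, ms = mean(pot["crossed"]), mean(pot["selfed"])
    print(f"Pot {name}: crossed {mx:.2f}, selfed {ms:.2f}, diff {mx - ms:.2f}")

# Pooled means over all 15 crossed and all 15 selfed plants.
all_crossed = [h for pot in pots.values() for h in pot["crossed"]]
all_selfed = [h for pot in pots.values() for h in pot["selfed"]]
observed_diff = mean(all_crossed) - mean(all_selfed)
print(f"All pots: crossed {mean(all_crossed):.1f}, selfed {mean(all_selfed):.1f}, "
      f"diff {observed_diff:.1f}")
```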
There are theoretical reasons to expect better growth when plants are cross-fertilized rather than self-fertilized, but do these data support that assumption?
Null hypothesis (H0): Whether plants are cross- or self-fertilized makes no difference to their growth. Alternative hypothesis (H1): Cross-fertilized plants grow taller than self-fertilized plants.
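In symbols (the labels c_i, s_i, D, D*_b, and B are this sketch's own, not the original's): let c_i and s_i be the heights of the 15 crossed and 15 selfed plants, and let D*_b be the difference obtained from the b-th of B resamples. The test is one-sided, so only resampled differences at least as large as the observed one count against H0.

```latex
D = \frac{1}{15}\sum_{i=1}^{15} c_i \;-\; \frac{1}{15}\sum_{i=1}^{15} s_i ,
\qquad D_{\text{obs}} = 20.20 - 17.57 \approx 2.6

\hat{p} = \frac{\#\{\, b \le B : D^{*}_{b} \ge 2.6 \,\}}{B}, \qquad B = 1000
```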
Resampling Procedure
We should keep the data from each pot separate as long as possible. What if one pot happened to get better growing conditions, so that both its crossed and its selfed plants grew taller? We want to test the possibility that, although different pots may have different populations of plant heights, each pot has a single population of heights, regardless of whether a plant is crossed or selfed, and the crossed/selfed difference arose only by chance. To test this possibility, we form such a single population for each pot by combining its crossed and selfed heights, then draw two resamples from each pot.
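A minimal Python sketch of one such stratified resample, assuming the table's heights and using the standard library's `random.Random.shuffle`; the function name `stratified_resample` is hypothetical. Each pot's heights are pooled and re-split into groups of the original sizes, and no height ever moves to another pot.

```python
import random

# Heights (inches) from the Darwin-1 table: (crossed, selfed) for each pot.
POTS = [
    ([23.5, 12.0, 21.0], [17.4, 20.4, 20.0]),                          # pot I
    ([22.0, 19.2, 21.5], [20.0, 18.4, 18.6]),                          # pot II
    ([22.2, 20.4, 18.3, 21.6, 23.2], [18.6, 15.2, 16.5, 18.0, 16.2]),  # pot III
    ([21.0, 22.1, 23.0, 12.0], [18.0, 12.8, 15.5, 18.0]),              # pot IV
]


def stratified_resample(pots, rng):
    """Shuffle each pot's combined heights separately and re-split them.

    Under H0 each pot is a single population, so the crossed/selfed labels
    are exchangeable only within a pot; heights never move between pots.
    """
    pseudo_crossed, pseudo_selfed = [], []
    for crossed, selfed in pots:
        combined = crossed + selfed   # new list: one population for this pot
        rng.shuffle(combined)         # shuffles the copy in place
        k = len(crossed)              # keep the original group sizes
        pseudo_crossed.extend(combined[:k])
        pseudo_selfed.extend(combined[k:])
    return pseudo_crossed, pseudo_selfed


rng = random.Random(1)  # arbitrary fixed seed, for reproducibility only
x, s = stratified_resample(POTS, rng)
print(len(x), len(s))
```

Pooling all 30 heights into one shuffle instead would let a tall pot's plants land in either group of another pot, which is exactly the confounding the per-pot design avoids.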
1. Write down each height on a separate piece of paper, keeping the data for the different pots separate.
2. Shuffle the pot I papers and draw (without replacement) two samples of size 3 each: a pseudo "crossed" resample and a pseudo "selfed" resample.
3. Repeat step 2 for each of the other pots (the sample sizes are 3 for pot II, 5 for pot III, and 4 for pot IV).
4. Average all the pseudo "crossed" resampled values and all the pseudo "selfed" resampled values, and record the difference (crossed minus selfed).
5. Repeat steps 2-4, say, 1,000 times.
6. How often did the resampled difference recorded in step 4 equal or exceed the observed value of 2.6 (the total mean difference)?
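The whole procedure, from shuffling within each pot through counting how often the difference reaches 2.6, can be sketched in Python as follows. This is an illustration, not the original program; `resampled_difference` and `estimate_p` are hypothetical names, and the seed is arbitrary. Because the trials are random, the estimate varies from run to run and need not match the result reported in this example exactly.

```python
import random

# Heights (inches) from the Darwin-1 table: (crossed, selfed) for each pot.
POTS = [
    ([23.5, 12.0, 21.0], [17.4, 20.4, 20.0]),                          # pot I
    ([22.0, 19.2, 21.5], [20.0, 18.4, 18.6]),                          # pot II
    ([22.2, 20.4, 18.3, 21.6, 23.2], [18.6, 15.2, 16.5, 18.0, 16.2]),  # pot III
    ([21.0, 22.1, 23.0, 12.0], [18.0, 12.8, 15.5, 18.0]),              # pot IV
]


def mean(xs):
    return sum(xs) / len(xs)


def resampled_difference(pots, rng):
    """Shuffle within each pot, re-split into groups of the original
    sizes, and return the pseudo crossed-minus-selfed mean difference."""
    pseudo_crossed, pseudo_selfed = [], []
    for crossed, selfed in pots:
        combined = crossed + selfed   # new list, so the data are not modified
        rng.shuffle(combined)
        k = len(crossed)
        pseudo_crossed.extend(combined[:k])
        pseudo_selfed.extend(combined[k:])
    return mean(pseudo_crossed) - mean(pseudo_selfed)


def estimate_p(pots, trials=1000, threshold=2.6, seed=None):
    """Proportion of trials whose resampled difference is >= threshold."""
    rng = random.Random(seed)
    hits = sum(resampled_difference(pots, rng) >= threshold for _ in range(trials))
    return hits / trials


p = estimate_p(POTS, trials=1000, seed=12345)
print(f"estimated p = {p:.3f}")
```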
Computer Implementation In Resampling Stats
DATA (23.5 12 21 17.4 20.4 20) pot1
' for pot 1, the first 3 values are heights of crossed plants, the rest are selfed plants (the other pots use the same layout)
DATA (22 19.2 21.5 20 18.4 18.6) pot2
DATA (22.2 20.4 18.3 21.6 23.2 18.6 15.2 16.5 18 16.2) pot3
DATA (21 22.1 23 12 18 12.8 15.5 18) pot4
' Now we have four vectors holding the height data from each pot. We will "grow" pseudo crossed and selfed plants in each pot.
REPEAT 1000
   SHUFFLE pot1 pot1$
   ' mix up the data from the first pot into the simulated vector pot1$
   TAKE pot1$ 1,3 x1$
   ' the first 3 shuffled values play the role of the crossed plants in pot 1
   TAKE pot1$ 4,6 s1$
   ' the remaining 3 values play the role of the selfed plants
   SHUFFLE pot2 pot2$
   ' perform the same operations on pots 2, 3, and 4
   TAKE pot2$ 1,3 x2$
   TAKE pot2$ 4,6 s2$
   SHUFFLE pot3 pot3$
   TAKE pot3$ 1,5 x3$
   ' pot 3 has 10 plants, so we make 2 vectors of 5 numbers each
   TAKE pot3$ 6,10 s3$
   SHUFFLE pot4 pot4$
   TAKE pot4$ 1,4 x4$
   TAKE pot4$ 5,8 s4$
   CONCAT x1$ x2$ x3$ x4$ all-x$
   ' put all the resampled "crossed" values in a single vector
   CONCAT s1$ s2$ s3$ s4$ all-s$
   ' put all the resampled "selfed" values in a single vector
   MEAN all-x$ mean-x$
   ' find the mean of the resampled "crossed" values (all pots)
   MEAN all-s$ mean-s$
   ' find the mean of the resampled "selfed" values (all pots)
   SUBTRACT mean-x$ mean-s$ diff$
   ' find the difference in means
   SCORE diff$ scrboard
   ' keep the result in the scoreboard
END
HISTOGRAM scrboard
COUNT scrboard >= 2.6 bigger
' count the trials in which the resampled difference was at least as large as the observed 2.6
DIVIDE bigger 1000 prob
' convert the count to a proportion
PRINT prob
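Because the pots are small, the same within-pot shuffling scheme can also be enumerated exactly rather than sampled, which gives a check on the Monte Carlo estimate. The following Python sketch is an addition, not part of the original Resampling Stats program, and its helper names are hypothetical. It relies on each pot's total height being fixed: with 15 plants in each group, D = (2 S_x - T)/15, where S_x is the sum of the pseudo-crossed heights and T the sum of all 30 heights, so only the distribution of S_x is needed. Heights are converted to integer tenths of an inch to avoid floating-point ties.

```python
from collections import Counter
from itertools import combinations

# Heights (inches) from the Darwin-1 table: (crossed, selfed) for each pot.
POTS = [
    ([23.5, 12.0, 21.0], [17.4, 20.4, 20.0]),                          # pot I
    ([22.0, 19.2, 21.5], [20.0, 18.4, 18.6]),                          # pot II
    ([22.2, 20.4, 18.3, 21.6, 23.2], [18.6, 15.2, 16.5, 18.0, 16.2]),  # pot III
    ([21.0, 22.1, 23.0, 12.0], [18.0, 12.8, 15.5, 18.0]),              # pot IV
]


def crossed_sum_distribution(crossed, selfed):
    """Count, over every way to choose len(crossed) plants from the pot,
    how often each pseudo-crossed sum (in tenths of an inch) occurs."""
    combined = [round(h * 10) for h in crossed + selfed]
    return Counter(sum(c) for c in combinations(combined, len(crossed)))


def convolve(a, b):
    """Distribution of the sum of two independent sums."""
    out = Counter()
    for sa, na in a.items():
        for sb, nb in b.items():
            out[sa + sb] += na * nb
    return out


dist = Counter({0: 1})
for crossed, selfed in POTS:
    dist = convolve(dist, crossed_sum_distribution(crossed, selfed))

total = sum(round(h * 10) for c, s in POTS for h in c + s)  # all 30 heights, tenths
n_crossed = sum(len(c) for c, _ in POTS)
assert n_crossed == sum(len(s) for _, s in POTS)  # formula below needs equal groups

# D >= 2.6 inches (26 tenths)  <=>  (2 * S_x - total) / 15 >= 26
hits = sum(n for sx, n in dist.items() if 2 * sx - total >= n_crossed * 26)
exact_p = hits / sum(dist.values())
print(f"exact p = {exact_p:.4f}")
```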
Results
[Figure: frequency histogram of the resampled differences in means]
prob = 0.016
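The value 0.016 is itself an estimate from B = 1000 random trials. A rough gauge of its precision is the usual binomial standard error of a proportion (the symbols here are this sketch's own):

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{B}}
  = \sqrt{\frac{0.016 \times 0.984}{1000}} \approx 0.004
```

So repeated runs of 1,000 trials would be expected to scatter by a few thousandths around the true proportion; running more trials narrows that scatter.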
Conclusion
Crossed plants were on average 2.6 inches taller than selfed plants. A difference at least this large occurred in only 0.016 (1.6%) of the randomized trials. We can therefore be reasonably confident in ruling out random chance as an explanation for the greater height of crossed plants.
References
Darwin, C. (1900). The effects of cross and self-fertilization in the vegetable kingdom (2nd ed.). London: John Murray.