Resampling Stats


Darwin-1

[difference in means version; see DARWIN-2 for the rank sum version]

Problem

In Darwin's classic experiment with cross-fertilized and self-fertilized plants, the following data were obtained for plant heights (in inches):

Darwin-1 Table. Heights of Cross- and Self-Fertilized Plants Studied by Darwin

Pot       Type      Heights of individual plants (in inches)   Mean     Diff (x-s)
Pot I     Crossed   23.5   12.0   21.0                         18.83    -0.44
          Selfed    17.4   20.4   20.0                         19.27
Pot II    Crossed   22.0   19.2   21.5                         20.9      1.90
          Selfed    20.0   18.4   18.6                         19.0
Pot III   Crossed   22.2   20.4   18.3   21.6   23.2           21.14     4.23
          Selfed    18.6   15.2   16.5   18.0   16.2           16.91
Pot IV    Crossed   21.0   22.1   23.0   12.0                  19.52     3.44
          Selfed    18.0   12.8   15.5   18.0                  16.08

Note. Data are from Darwin, 1900, cited in E.W. Noreen, personal communication, April 21, 1992. Based on these data, the mean height of all crossed plants = 20.2 and of all selfed plants = 17.6, for a mean difference of 2.6.
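
As a quick check on the summary figures in the note, the overall means can be recomputed from the tabled heights. The following is a minimal Python sketch, not part of the original Resampling Stats program; the names POTS and pooled_means are illustrative assumptions.

```python
from statistics import mean

# Heights in inches, copied from the Darwin-1 table: (crossed, selfed) per pot.
POTS = {
    "I":   ([23.5, 12.0, 21.0], [17.4, 20.4, 20.0]),
    "II":  ([22.0, 19.2, 21.5], [20.0, 18.4, 18.6]),
    "III": ([22.2, 20.4, 18.3, 21.6, 23.2], [18.6, 15.2, 16.5, 18.0, 16.2]),
    "IV":  ([21.0, 22.1, 23.0, 12.0], [18.0, 12.8, 15.5, 18.0]),
}

def pooled_means(pots):
    """Return (mean of all crossed heights, mean of all selfed heights)."""
    crossed = [h for x, _ in pots.values() for h in x]
    selfed = [h for _, s in pots.values() for h in s]
    return mean(crossed), mean(selfed)

mean_x, mean_s = pooled_means(POTS)
# Prints: crossed 20.2, selfed 17.6, difference 2.6
print(f"crossed {mean_x:.1f}, selfed {mean_s:.1f}, difference {mean_x - mean_s:.1f}")
```

Unrounded, the difference is about 2.63 inches; the text and the program below work with the rounded value 2.6.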

There are theoretical reasons to expect better growth when plants are cross-fertilized rather than self-fertilized, but do these data support that assumption?

Null hypothesis (H0): Whether plants are cross- or self-fertilized makes no difference with respect to their growth.

Alternative hypothesis (H1): Cross-fertilized plants grow taller than self-fertilized plants.

Resampling Procedure

We should keep the data from each pot separate as long as possible, because one pot may have had better growing conditions, so that both its crossed and its selfed plants grew taller. The possibility we want to test is that, although heights may be distributed differently from pot to pot, each pot has a single population of heights regardless of whether a plant is crossed or selfed, and the crossed/selfed difference arose only by chance. To test this possibility, we form such a single population for each pot by combining its crossed and selfed heights, then draw two resamples from each pot.

  1. Write down each height on a separate piece of paper, keeping the data for the different pots separate.
  2. Shuffle the pot I papers and draw (without replacement) two samples of size 3 each (a pseudo "crossed" resample and a pseudo "selfed" resample).
  3. Repeat step 2 for each of the other pots (with sample sizes of 3 for pot II, 5 for pot III, and 4 for pot IV).
  4. Average all the pseudo crossed resampled values and all the pseudo selfed resampled values, and record the difference (crossed minus selfed).
  5. Repeat steps 2-4, say, 1,000 times.
  6. How often did the resampled difference recorded in step 4 equal or exceed the observed value of 2.6 (the total mean difference)?
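
The paper-and-shuffle steps above can also be sketched in Python. This is an illustrative translation under stated assumptions, not the original program: the function names, the fixed random seed, and the use of the unrounded observed difference (about 2.63, rather than the rounded 2.6) as the threshold are all choices made for this sketch.

```python
import random

# Heights in inches: (crossed, selfed) for pots I-IV, from the Darwin-1 table.
POTS = [
    ([23.5, 12.0, 21.0], [17.4, 20.4, 20.0]),
    ([22.0, 19.2, 21.5], [20.0, 18.4, 18.6]),
    ([22.2, 20.4, 18.3, 21.6, 23.2], [18.6, 15.2, 16.5, 18.0, 16.2]),
    ([21.0, 22.1, 23.0, 12.0], [18.0, 12.8, 15.5, 18.0]),
]

def mean_difference(pots):
    """Mean of all crossed heights minus mean of all selfed heights."""
    crossed = [h for x, _ in pots for h in x]
    selfed = [h for _, s in pots for h in s]
    return sum(crossed) / len(crossed) - sum(selfed) / len(selfed)

def shuffle_within_pots(pots, rng):
    """Steps 2-3: shuffle each pot's heights separately and split them
    into pseudo "crossed" and pseudo "selfed" resamples of the original sizes."""
    resampled = []
    for crossed, selfed in pots:
        pooled = crossed + selfed      # new list, the input is not modified
        rng.shuffle(pooled)
        n = len(crossed)
        resampled.append((pooled[:n], pooled[n:]))
    return resampled

def permutation_p_value(pots, n_trials=1000, seed=0):
    """Steps 4-6: proportion of trials whose difference reaches the observed one."""
    rng = random.Random(seed)
    observed = mean_difference(pots)
    hits = sum(
        # small tolerance so that floating-point ties count as "equal or exceed"
        mean_difference(shuffle_within_pots(pots, rng)) >= observed - 1e-9
        for _ in range(n_trials)
    )
    return hits / n_trials

print(permutation_p_value(POTS))
```

Because the shuffling happens inside each pot, a pot with generally taller plants cannot create a spurious crossed/selfed difference. The printed proportion varies with the seed and the number of trials.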

Computer Implementation In Resampling Stats

DATA (23.5 12 21 17.4 20.4 20)  pot1      

for pot 1, the first 3 values are heights of crossed plants, the rest are selfed plants

DATA (22 19.2 21.5 20 18.4 18.6)  pot2     
DATA (22.2 20.4 18.3 21.6 23.2 18.6 15.2 16.5 18 16.2) pot3   
DATA (21 22.1 23 12 18 12.8 15.5 18) pot4         

Now we have four vectors holding the height data from each pot. We will "grow" crossed and selfed plants in each pot.

REPEAT 1000
  SHUFFLE pot1 pot1$

mix up the data from the first pot into simulated vector pot1$

  TAKE pot1$ 1,3 x1$

take out the first 3 values, representing the height of crossed plants in pot 1

  TAKE pot1$ 4,6 s1$

the remaining 3 values represent the height of selfed plants

  SHUFFLE pot2 pot2$

we perform the same operations on pots 2, 3, and 4

  TAKE pot2$ 1,3 x2$
  TAKE pot2$ 4,6 s2$
  SHUFFLE pot3 pot3$
  TAKE pot3$ 1,5 x3$

there were 10 plants in pot 3, so we make 2 vectors of 5 numbers each

  TAKE pot3$ 6,10 s3$
  SHUFFLE pot4 pot4$
  TAKE pot4$ 1,4 x4$
  TAKE pot4$ 5,8 s4$
  CONCAT x1$ x2$ x3$ x4$ all-x$

put all the resampled "crossed" values in a single vector

  CONCAT s1$ s2$ s3$ s4$ all-s$

put all the resampled "selfed" values in a single vector

  MEAN all-x$ mean-x$

find the mean of the resampled "crossed" (all pots)

  MEAN all-s$ mean-s$

find the mean of the resampled "selfed" (all pots)

  SUBTRACT mean-x$ mean-s$ diff$

find the difference in means

  SCORE diff$ scrboard

keep the result in the scoreboard

END
HISTOGRAM scrboard
COUNT scrboard >= 2.6 bigger

compute the number of runs when the difference was at least as great as for the original data

DIVIDE bigger 1000 prob          

convert to a proportion

PRINT prob                     

Results

[Figure: frequency histogram of the resampled differences in means]

prob = 0.016
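
Because prob is itself an estimate from 1,000 random trials, it carries some sampling error. A rough gauge, assuming the trials are independent, is the binomial standard error of a proportion; the helper name below is a hypothetical one chosen for this sketch.

```python
import math

def proportion_standard_error(p, n_trials):
    """Binomial standard error of a proportion estimated from n_trials independent trials."""
    return math.sqrt(p * (1 - p) / n_trials)

se = proportion_standard_error(0.016, 1000)
print(f"standard error = {se:.4f}")  # about 0.0040
```

So a rerun of the 1,000-trial program could plausibly report anything from roughly 0.008 to 0.024 (two standard errors either side), which does not change the conclusion.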

Conclusion

Crossed plants were on average 2.6 inches taller than selfed plants. A difference at least that large occurred with the shuffled data in only 0.016 of the 1,000 trials (1.6%). We can therefore be relatively confident in ruling out random chance as an explanation for the greater height of the crossed plants.
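
The random shuffling can in principle be replaced by counting every possible within-pot relabeling, which removes the trial-to-trial variation. The Python sketch below is one way to do that and is an assumption of this note, not something the Resampling Stats program does. It stores heights in tenths of an inch so that ties are compared exactly. It also relies on the fact that, with the group sizes fixed, the difference in means rises and falls with the sum of the "crossed" heights. All names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Heights in tenths of an inch (integers): (crossed, selfed) for pots I-IV.
POTS_TENTHS = [
    ([235, 120, 210], [174, 204, 200]),
    ([220, 192, 215], [200, 184, 186]),
    ([222, 204, 183, 216, 232], [186, 152, 165, 180, 162]),
    ([210, 221, 230, 120], [180, 128, 155, 180]),
]

def crossed_sum_distribution(crossed, selfed):
    """Count, for one pot, how many relabelings give each possible "crossed" sum."""
    pooled = crossed + selfed
    return Counter(sum(c) for c in combinations(pooled, len(crossed)))

def exact_p_value(pots):
    """Return (proportion of relabelings at least as extreme as observed,
    total number of relabelings)."""
    total = Counter({0: 1})
    for crossed, selfed in pots:
        pot_dist = crossed_sum_distribution(crossed, selfed)
        combined = Counter()
        # Combine this pot's sums with the sums accumulated so far.
        for s1, c1 in total.items():
            for s2, c2 in pot_dist.items():
                combined[s1 + s2] += c1 * c2
        total = combined
    observed = sum(sum(crossed) for crossed, _ in pots)
    n_relabelings = sum(total.values())
    n_extreme = sum(c for s, c in total.items() if s >= observed)
    return n_extreme / n_relabelings, n_relabelings

p_exact, n_relabelings = exact_p_value(POTS_TENTHS)
print(n_relabelings, p_exact)
```

There are 20 x 20 x 252 x 70 = 7,056,000 equally likely relabelings in all. The exact proportion uses the unrounded observed difference, so it need not match the simulated 0.016 digit for digit.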

References

Darwin, C. (1900). The effects of cross and self-fertilization in the vegetable kingdom (2nd ed.). London: John Murray.


© 2003 Resampling Stats, Inc.