Tallmen
Problem
The following table shows the frequencies with which 43 short men and 52 tall men were classified as "followers," "unclassifiable," or "leaders" (see Siegel & Castellan, 1988, p. 113). Is leadership independent of height? Are tall men more likely to get classified as leaders? More generally, is there a relationship between height and leadership classification? Our test measures the extent to which the observed results depart from what we would expect if height were independent of classification. We will use the chi-square statistic (the sum of the squared deviations from expected, each divided by its expected value).
Tallmen Table. Relationship Between Height and Leadership Classification
| Leadership Classification | Short | Tall | Combined |
| --- | --- | --- | --- |
| Followers | 22 | 14 | 36 |
| Unclassifiable | 9 | 6 | 15 |
| Leaders | 12 | 32 | 44 |
| Total | 43 | 52 | 95 |
Note. Data are adapted from Siegel & Castellan, 1988, p. 113.
Null hypothesis (H0): There is no relationship between height and leadership classification. Alternative hypothesis (H1): Height and leadership classification are not independent of one another.
Resampling Procedure
First we calculate the chi-square value as the test statistic. Generate a table of observed values and also expected values (if leadership classification is independent of height) as shown below:
| Leadership Classification | Short | Tall | Expected Short | Expected Tall | Deviation Short | Deviation Tall | Totals |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Followers | 22 | 14 | 16 | 20 | 6 | 6 | 36 |
| Unclassifiable | 9 | 6 | 7 | 8 | 2 | 2 | 15 |
| Leaders | 12 | 32 | 20 | 24 | 8 | 8 | 44 |
| Total | 43 | 52 | | | | | 95 |
Note: How do we derive the expected counts, say the 16 expected short followers? Forty-five percent (43 out of 95) of the men are short. Overall, there are 36 followers. If height has nothing to do with classification (i.e., is independent), we would expect 45% of those 36 followers, or about 16, to be short.
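The rule in the note (row total times the column's share of all men) can be applied to every cell at once. The following is a minimal Python sketch, not part of the original Resampling Stats material; the names `observed` and `expected_counts` are illustrative assumptions. It produces unrounded expected counts, which round to the whole numbers shown in the table.

```python
# Observed counts from the Tallmen table: each row is (short, tall).
observed = {
    "Followers": (22, 14),
    "Unclassifiable": (9, 6),
    "Leaders": (12, 32),
}


def expected_counts(table):
    """Return expected (short, tall) counts per row if height is independent."""
    # Column totals: 43 short men and 52 tall men.
    col_totals = [sum(row[j] for row in table.values()) for j in range(2)]
    grand_total = sum(col_totals)
    # Expected cell count = row total * column total / grand total.
    return {
        label: tuple(sum(row) * col / grand_total for col in col_totals)
        for label, row in table.items()
    }


expected = expected_counts(observed)
for label, (short, tall) in expected.items():
    print(f"{label:15s} short={short:5.2f}  tall={tall:5.2f}")
```

Rounding each printed value to the nearest whole number gives the Expected Short and Expected Tall columns used in the rest of the example.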
The chi-square is the sum of the squared deviations from expected, divided by the expected:
(6*6/16) + (6*6/20) + (2*2/7) + (2*2/8) + (8*8/20) + (8*8/24) ≈ 10.99, taken as 10.9 below
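As a cross-check on the arithmetic, here is a short Python sketch (an illustration, not the original Resampling Stats code; `chi_square` is a hypothetical helper name). It computes the statistic twice: once with the whole-number expected counts from the table, and once with unrounded expected counts derived from the row and column totals. The two differ slightly because of the rounding.

```python
short = [22, 9, 12]   # followers, unclassifiable, leaders among short men
tall = [14, 6, 32]    # the same categories among tall men


def chi_square(observed, expected):
    """Sum of squared deviations from expected, each divided by expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Whole-number expected counts as tabulated (short cells first, then tall).
rounded_expected = [16, 7, 20, 20, 8, 24]
chi_rounded = chi_square(short + tall, rounded_expected)

# Unrounded expected counts: row total * column total / grand total.
n_short, n_tall = sum(short), sum(tall)
n = n_short + n_tall
row_totals = [s + t for s, t in zip(short, tall)]
exact_expected = ([r * n_short / n for r in row_totals]
                  + [r * n_tall / n for r in row_totals])
chi_exact = chi_square(short + tall, exact_expected)

print(round(chi_rounded, 2), round(chi_exact, 2))  # prints 10.99 10.71
```

The rounded expected counts are the ones the example carries forward, so the simulated chi-squares are compared against the value obtained from them.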
Next we repeatedly simulate what happens with a population in which leadership is independent of height. Is a chi-square of 10.9 significant? Might it occur by chance with a random association of height with classification?
1. Take 95 marbles and write "followers" on 36, "unclassifiable" on 15, and "leaders" on 44, and put these into an urn.
2. Take without replacement 43 marbles to represent short men and the remaining 52 marbles to represent tall men.
3. Count the number of followers, unclassifiable, and leaders in the "short" and "tall" groups. Calculate a simulated chi-square from these data. Record that value.
4. Repeat (2) and (3) 999 times to obtain the distribution of chi-squares from a single population. How often did the simulated chi-square equal or exceed the experimental value of 10.9?
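The marble procedure above can be sketched in Python using only the standard library. This is an illustrative translation under stated assumptions, not the original program: the function name `simulate`, the fixed `seed`, and the string labels are all hypothetical choices, and the expected counts are the whole numbers from the table.

```python
import random


def chi_square(observed, expected):
    """Sum of squared deviations from expected, each divided by expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate(n_trials=999, observed_chi=10.9, seed=1):
    """Shuffle the urn n_trials times; return (proportion >= observed, scores)."""
    rng = random.Random(seed)  # seed is an assumption, fixed for repeatability
    labels = ["follower", "unclassifiable", "leader"]
    # Step 1: 95 marbles labelled by classification.
    urn = ["follower"] * 36 + ["unclassifiable"] * 15 + ["leader"] * 44
    # Expected counts from the table: short cells first, then tall cells.
    expected = [16, 7, 20, 20, 8, 24]
    scores = []
    for _ in range(n_trials):
        # Step 2: draw 43 "short" marbles without replacement; the rest are "tall".
        rng.shuffle(urn)
        short, tall = urn[:43], urn[43:]
        # Step 3: count each classification in each group and score the draw.
        counts = ([short.count(lab) for lab in labels]
                  + [tall.count(lab) for lab in labels])
        scores.append(chi_square(counts, expected))
    # Step 4: how often did a simulated chi-square reach the observed value?
    hits = sum(score >= observed_chi for score in scores)
    return hits / n_trials, scores


prob, scores = simulate()
print(f"proportion of simulated chi-squares >= 10.9: {prob:.4f}")
```

The exact proportion depends on the seed and the number of trials, so a single run is only an estimate; raising `n_trials` narrows the run-to-run variation.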
Computer Implementation in Resampling Stats
DATA (22 9 12) short
' the numbers of short men in the three categories of follower, unclassifiable, and leader
DATA (14 6 32) tall
' vector "tall" holds comparable information for the tall men
DATA (16 7 20) exshort
' these are expected numbers if height and leadership are independent
DATA (20 8 24) extall
CONCAT exshort extall expected
' put both the expected vectors together in a single list
CONCAT short tall allmen
' and do the same for the numbers in the real-world groups
SUBTRACT allmen expected diff
SQUARE diff diffsq
DIVIDE diffsq expected chi
SUM chi chi_sq
' the operations above result in the chi-square for the real-world data
PRINT chi_sq
' we should have calculated this in advance, but this is a cross-check
URN 36#7 15#8 44#9 men
' #7 signifies a follower, #8 unclassifiable, #9 a leader
REPEAT 999
SHUFFLE men men$
' randomizing destroys whatever link there was between height and leadership
TAKE men$ 1,43 short$
' form a simulated group "short$"
TAKE men$ 44,95 tall$
' the rest of the values go into simulated group "tall$"
COUNT short$=7 sf$
' how many "short" men were short followers?
COUNT short$=8 su$
' how many were short-unclassifiable?
COUNT short$=9 sl$
' and how many short leaders?
COUNT tall$=7 tf$
' tall followers
COUNT tall$=8 tu$
' tall unclassifiable
COUNT tall$=9 tl$
' tall leaders
CONCAT sf$ su$ sl$ tf$ tu$ tl$ all$
' put these six simulated counts back into a single vector "all$"
SUBTRACT all$ expected diff$
' and proceed to compute a chi-square for these data
SQUARE diff$ diffsq$
DIVIDE diffsq$ expected adjsq$
SUM adjsq$ chisq$
' this time "chisq$" has <$> attached to signify a simulated value
SCORE chisq$ scrboard
END
'HISTOGRAM scrboard
'BOXPLOT scrboard
' remove the leading apostrophe (') from HISTOGRAM or BOXPLOT above to activate that command if you want to see the distribution of results
COUNT scrboard >= 10.9 more
' how often did a simulation chi-square at least equal the experimental value?
DIVIDE more 999 prob
' convert that value into a proportion of the number of repeat runs
PRINT prob
Results
Figure. Frequency histogram of resampled chi-square values.
prob = 0.0043 after 10,000 runs
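As a rough cross-check that is not part of the original analysis, the resampled proportion can be compared with the large-sample chi-square approximation. A 3 x 2 table has (3 - 1)(2 - 1) = 2 degrees of freedom, and for exactly 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2). The helper name `chi2_sf_df2` below is a hypothetical one, and the shortcut does not apply to other degrees of freedom.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variate with 2 degrees of freedom."""
    # Valid only for df = 2, where the chi-square is exponential with mean 2.
    return math.exp(-x / 2)


rows, cols = 3, 2
df = (rows - 1) * (cols - 1)  # 2 for the Tallmen table

p_approx = chi2_sf_df2(10.9)
print(f"df = {df}, approximate P(chi-square >= 10.9) = {p_approx:.4f}")
```

Under these assumptions the approximation is close to the resampled proportion reported above, which suggests the two approaches agree for these data.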
Conclusion
The null hypothesis is rejected: height and leadership classification are not independent of one another. By inspection of the data, we can see that tall men are overrepresented in the leadership category, and short men are overrepresented in the follower category.
References
Siegel, S., & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill.