Quality-1
[Pearson correlation coefficient version; see Quality-2 for Spearman correlation coefficient version]
Problem
A group of skin moisturizers was evaluated by a test panel and ranked by estimated quality, "1" being the highest quality and "48" the lowest (Consumers Union, 1986, as cited in Noreen, 1989, p. 26). The price per ounce was also calculated from retail prices. One would expect higher-priced cosmetics to be of higher quality. (Because "1" represents the highest quality, we expect a negative correlation between quality rank and price; that is, cheaper cosmetics should tend to appear at the bottom of the quality list.) Was there a significant relationship between price and quality?
Quality-1 Table. Price per Ounce of Skin Moisturizers in Order of Descending Estimated Quality

| Rank | Price/oz. ($) | Rank | Price/oz. ($) | Rank | Price/oz. ($) | Rank | Price/oz. ($) |
|---|---|---|---|---|---|---|---|
| 1 | 1.83 | 13 | 0.28 | 25 | 1.65 | 37 | 3.89 |
| 2 | 0.23 | 14 | 0.11 | 26 | 3.43 | 38 | 0.17 |
| 3 | 1.52 | 15 | 0.12 | 27 | 0.59 | 39 | 1.65 |
| 4 | 1.91 | 16 | 0.12 | 28 | 0.42 | 40 | 0.38 |
| 5 | 0.25 | 17 | 0.30 | 29 | 0.40 | 41 | 0.45 |
| 6 | 0.10 | 18 | 0.45 | 30 | 1.56 | 42 | 1.30 |
| 7 | 0.12 | 19 | 0.24 | 31 | 0.24 | 43 | 3.07 |
| 8 | 0.24 | 20 | 0.22 | 32 | 0.26 | 44 | 1.42 |
| 9 | 0.33 | 21 | 0.11 | 33 | 1.69 | 45 | 2.11 |
| 10 | 0.19 | 22 | 0.25 | 34 | 0.10 | 46 | 6.10 |
| 11 | 0.26 | 23 | 3.33 | 35 | 0.62 | 47 | 4.29 |
| 12 | 0.26 | 24 | 1.31 | 36 | 0.25 | 48 | 0.25 |
Note. Data are from Consumers Union, 1986, as cited in Noreen, 1989, Table 28, p. 26.
If there is a relationship between quality and price, the correlation is expected to be negative. (The best cosmetics, ranked early in the list, should have the highest prices.) However, the correlation coefficient for these data is +.43. Is this positive correlation statistically significant?
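The correlation coefficient used here is Pearson's r. Writing $x_i$ for the quality rank of the $i$-th moisturizer, $y_i$ for its price per ounce, and $n = 48$, it is defined as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad \bar{x} = 24.5,\quad \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n(n^2 - 1)}{12} = 9212
```

Because the ranks are simply 1 through 48, the denominator does not change when the prices are shuffled among the ranks; only the cross-product sum in the numerator does. A positive r means that higher rank numbers (lower quality) tend to go with higher prices.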
Null hypothesis (H0): There is no correlation between quality and price.
Alternative hypothesis (H1): There is a correlation between price and quality.
Resampling Procedure
We test that hypothesis by estimating how often random rearrangements of the data produce a correlation coefficient as large as .43.
1. Create two sets of paper cards: 48 numbered sequentially from "1" to "48" (quality ranks) and another 48 bearing the prices.
2. Shuffle one of the sets (say, the price cards), pair the cards with the other set, and compute the correlation for the simulated data.
3. Repeat step 2 1,000 times. Compute the proportion of the trials in which the correlation was .43 or larger.
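In symbols, the three steps estimate a one-tailed p-value. Writing $r^{*}_{b}$ for the correlation obtained on the $b$-th shuffle of the prices and $N$ for the number of shuffles (1,000 here), the estimate is:

```latex
\hat{p} = \frac{1}{N} \sum_{b=1}^{N} \mathbf{1}\!\left[\, r^{*}_{b} \ge 0.43 \,\right],
\qquad N = 1000
```

Under H0 every one of the 48! pairings of prices with ranks is equally likely, so the exact p-value would be the fraction of all 48! pairings whose correlation reaches .43; the simulation approximates that fraction with a random sample of pairings. An estimate of 0 therefore means that no shuffle reached .43 in that run, not that the probability is exactly zero.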
Computer Implementation in Resampling Stats
```
READ file "quality.dat" price      ' price per ounce, ordered from highest to lowest quality
DATA 1,48 quality                  ' put the numbers 1 through 48 into vector "quality"
REPEAT 1000                        ' do 1,000 simulated correlations with randomized data
  SHUFFLE price price$             ' randomize the order of the price data
  CORR price$ quality corr$        ' the "$" in "corr$" marks it as a simulated value
  SCORE corr$ scrboard             ' keep this simulated correlation on a scoreboard
END
HISTOGRAM scrboard
COUNT scrboard >= 0.43 hits        ' how often did random rearrangement equal or exceed the observed correlation?
DIVIDE hits 1000 prob              ' convert the count of hits to a proportion of the 1,000 trials
PRINT prob
```
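For readers without Resampling Stats, the program can be sketched in Python, assuming NumPy is available. This is an illustrative translation, not part of Resampling Stats: the prices are typed in from the table instead of being read from "quality.dat", and the names `corr`, `permutation_test`, and the `seed` parameter are hypothetical. The threshold stays at .43 to mirror the COUNT line; a coefficient computed directly from the tabulated prices may not match the quoted .43 exactly.

```python
import numpy as np

# Price per ounce listed by quality rank, 1 (best) through 48 (worst),
# typed in from the Quality-1 table (stands in for READ file "quality.dat").
price = np.array([
    1.83, 0.23, 1.52, 1.91, 0.25, 0.10, 0.12, 0.24, 0.33, 0.19, 0.26, 0.26,
    0.28, 0.11, 0.12, 0.12, 0.30, 0.45, 0.24, 0.22, 0.11, 0.25, 3.33, 1.31,
    1.65, 3.43, 0.59, 0.42, 0.40, 1.56, 0.24, 0.26, 1.69, 0.10, 0.62, 0.25,
    3.89, 0.17, 1.65, 0.38, 0.45, 1.30, 3.07, 1.42, 2.11, 6.10, 4.29, 0.25,
])
quality = np.arange(1, 49)  # DATA 1,48 quality


def corr(x, y):
    """Pearson correlation coefficient (plays the role of CORR)."""
    return np.corrcoef(x, y)[0, 1]


def permutation_test(price, quality, threshold=0.43, trials=1000, seed=None):
    """Estimate how often shuffled prices give a correlation >= threshold."""
    rng = np.random.default_rng(seed)
    scoreboard = np.empty(trials)                      # the SCORE vector
    for t in range(trials):                            # REPEAT ... END
        shuffled = rng.permutation(price)              # SHUFFLE price price$
        scoreboard[t] = corr(shuffled, quality)        # CORR price$ quality corr$
    hits = np.count_nonzero(scoreboard >= threshold)   # COUNT scrboard >= 0.43 hits
    return hits / trials, scoreboard                   # DIVIDE hits 1000 prob


if __name__ == "__main__":
    prob, scoreboard = permutation_test(price, quality, seed=1)
    print("observed r:", round(corr(price, quality), 3))
    print("prob:", prob)
```

Each Resampling Stats command is noted in a comment beside its Python counterpart. The HISTOGRAM step is left out; `matplotlib.pyplot.hist(scoreboard)` would draw a comparable plot of the simulated correlations.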
Results
[Figure: histogram of resampled correlations between price and quality]
Results of 5 resampling experiments of 1,000 independent trials each:
prob = 0
prob = 0
prob = 0.001
prob = 0
prob = 0.001
Conclusion
The correlation between quality rank and price was positive (.43), not negative as expected. Could this correlation be due to chance? The resampling simulation of the null hypothesis (that price and quality are not related) produced correlation coefficients as large as .43 in at most 1 of every 1,000 trials, so such a value is highly unlikely to occur by chance, and we reject the null hypothesis. We can be confident that the correlation did not arise from random factors. However, the relationship runs in the unexpected direction: cheaper cosmetics tended to be of better quality.
Note: These data are also analyzed in Quality-2, where the prices are converted to ranks and the Spearman correlation coefficient (the same calculation as the Pearson correlation coefficient, applied to ranks) is used.
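The point that Spearman's coefficient is Pearson's formula applied to ranks can be sketched in a few lines of Python, assuming NumPy. Several prices in the table are tied (for example, 0.25 appears more than once), so this sketch gives tied values the average of the ranks they occupy; that tie rule is a common convention and an assumption here, and Quality-2 may handle ties differently. The helper names `average_ranks` and `spearman` are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")  # stable sort keeps ties adjacent
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        # Extend j across a run of equal values in sorted order.
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; assign their mean.
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: the Pearson coefficient of the two rank vectors."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


# Usage with a small made-up example (not the full data set):
print(average_ranks([0.25, 0.10, 0.25, 1.83]))
print(spearman([1, 2, 3, 4, 5], [0.25, 0.10, 0.25, 1.83, 6.10]))
```

Because only the order of the values matters, any monotonic change of scale (for example, taking logarithms of the prices) leaves this coefficient unchanged. SciPy's `scipy.stats.spearmanr` computes the same coefficient with average ranks for ties.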
References
Noreen, E. W. (1989). Computer intensive methods for testing hypotheses: An introduction. New York: Wiley.