Quality-2
[Spearman correlation coefficient version; see QUALITY-1 for Pearson coefficient version]
Problem
A group of skin moisturizers was evaluated by a test panel and ranked by estimated quality, "1" being the highest quality and "48" the lowest (Consumers Union, 1986, as cited in Noreen, 1989, p. 26). The price per ounce was also calculated from retail prices. One would expect higher-priced cosmetics to be of higher quality. (Since "1" represents the highest quality, we expect a negative correlation, that is, cheaper cosmetics tending to be at the bottom of the quality list.) In QUALITY-1, we examined whether there was a significant relationship between price and quality. A highly significant positive correlation of .43 indicated that cheaper cosmetics tended to be of higher quality. One possible explanation was that a few high-priced but poor-quality items might have unduly influenced the correlation coefficient. In this version of the problem, we use a rank-order approach to eliminate the influence of extreme values.
Quality-2 Table. Price Per Ounce Of Skin Moisturizers In Order Of Descending Estimated Quality
| Rank | Price/oz. | Rank | Price/oz. | Rank | Price/oz. | Rank | Price/oz. |
|---|---|---|---|---|---|---|---|
| 1. | $1.83 | 13. | $0.28 | 25. | $1.65 | 37. | $3.89 |
| 2. | 0.23 | 14. | 0.11 | 26. | 3.43 | 38. | 0.17 |
| 3. | 1.52 | 15. | 0.12 | 27. | 0.59 | 39. | 1.65 |
| 4. | 1.91 | 16. | 0.12 | 28. | 0.42 | 40. | 0.38 |
| 5. | 0.25 | 17. | 0.30 | 29. | 0.40 | 41. | 0.45 |
| 6. | 0.10 | 18. | 0.45 | 30. | 1.56 | 42. | 1.30 |
| 7. | 0.12 | 19. | 0.24 | 31. | 0.24 | 43. | 3.07 |
| 8. | 0.24 | 20. | 0.22 | 32. | 0.26 | 44. | 1.42 |
| 9. | 0.33 | 21. | 0.11 | 33. | 1.69 | 45. | 2.11 |
| 10. | 0.19 | 22. | 0.25 | 34. | 0.10 | 46. | 6.10 |
| 11. | 0.26 | 23. | 3.33 | 35. | 0.62 | 47. | 4.29 |
| 12. | 0.26 | 24. | 1.31 | 36. | 0.25 | 48. | 0.25 |
Note. Data are from Consumers Union, 1986, as cited in Noreen, 1989, Table 28, p. 26.
We will measure correlation with the Spearman rank correlation coefficient. Convert the original price data into ranks, "1" being the lowest price and "48" the highest. Then correlate the price rank with the quality rank, using the Pearson correlation formula. The answer is .41. Could a value this high have resulted from random factors if there were no underlying relationship between price and quality (the null hypothesis)?
Null hypothesis (H0): There is no correlation between price and quality.
Alternative hypothesis (H1): There is a correlation between price and quality.
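The rank conversion and correlation described above can be sketched in Python as follows. This is a minimal illustration, not the original computation. The prices are copied from the Quality-2 table, and the helper names `average_ranks` and `pearson` are hypothetical. The source does not say how tied prices were ranked; this sketch assumes tied prices share their average rank, so the result may differ slightly from the reported .41.

```python
from math import sqrt

# Price per ounce, listed in order of quality rank 1..48 (Quality-2 table).
prices = [
    1.83, 0.23, 1.52, 1.91, 0.25, 0.10, 0.12, 0.24, 0.33, 0.19, 0.26, 0.26,
    0.28, 0.11, 0.12, 0.12, 0.30, 0.45, 0.24, 0.22, 0.11, 0.25, 3.33, 1.31,
    1.65, 3.43, 0.59, 0.42, 0.40, 1.56, 0.24, 0.26, 1.69, 0.10, 0.62, 0.25,
    3.89, 0.17, 1.65, 0.38, 0.45, 1.30, 3.07, 1.42, 2.11, 6.10, 4.29, 0.25,
]


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank.

    Assumption: the source does not state its tie-handling rule.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


quality_rank = list(range(1, 49))   # 1 = highest quality
price_rank = average_ranks(prices)  # 1 = lowest price
rho = pearson(price_rank, quality_rank)
print(round(rho, 2))
```

Applying the Pearson formula to ranks, rather than using the shortcut formula based on squared rank differences, keeps the calculation valid when there are ties.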
Resampling Procedure
We will simulate random correlations with 48 pairs of rank data, to determine how often a correlation coefficient of .41 could arise by chance.
1. Write the numbers "1" through "48" on 48 playing cards.
2. Shuffle the cards and deal them out in one long column.
3. Correlate the number on each card with its position in the column, that is, with the strict sequence from "1" to "48."
4. Record the simulated correlation coefficient.
5. Repeat steps 2-4 1,000 times.
6. Compute the proportion of trials in which the simulated data showed a correlation coefficient as large as or larger than the observed value.
Computer Implementation in Resampling Stats
DATA 1,48 costrank
' we do not need to worry about the actual costs, so simply set up a set of cards with all 48 rank values
DATA 1,48 quality
' put numbers "1" through "48" into the vector "quality"
REPEAT 1000
  ' do a simulated correlation with randomized data
  SHUFFLE costrank rank$
  ' randomize the price data into a simulated vector "rank$"
  CORR rank$ quality corr$
  ' this is one possible correlation coefficient using randomized data
  SCORE corr$ scrboard
  ' retain a copy of this simulated correlation on a scoreboard
END
HISTOGRAM scrboard
COUNT scrboard >= 0.41 hits
' how often did random shuffling reach the target value?
DIVIDE hits 1000 prob
' adjust for the number of repeat loops
PRINT prob
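For readers without Resampling Stats, the same shuffle test can be sketched in Python using only the standard library. This is an illustrative translation under stated assumptions, not the original program. The function name `shuffle_test` and its parameters (`observed`, `repeats`, `seed`, `two_sided`) are hypothetical. The `two_sided` option is an addition: the program above counts only correlations at or above .41, while setting `two_sided=True` also counts correlations at or below -.41.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def shuffle_test(observed=0.41, n_items=48, repeats=1000, seed=None,
                 two_sided=False):
    """Estimate how often shuffled ranks correlate at least as strongly as observed."""
    rng = random.Random(seed)
    quality = list(range(1, n_items + 1))    # fixed sequence 1..48
    cost_rank = list(range(1, n_items + 1))  # the "cards" to shuffle
    hits = 0
    for _ in range(repeats):
        shuffled = cost_rank[:]              # leave the original deck intact
        rng.shuffle(shuffled)
        r = pearson(shuffled, quality)
        if two_sided:
            r = abs(r)
        if r >= observed:
            hits += 1
    return hits / repeats


prob = shuffle_test(seed=1)  # seed chosen arbitrarily, for repeatability
print(prob)
```

Because the shuffled deck always holds the numbers 1 through 48 with no ties, this simulates the null distribution for untied ranks, as the Resampling Stats program does.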
Results
[Figure: histogram of resampled correlations between price and quality]
Outcome of five runs of 1,000 repeats:
prob = 0
prob = 0.001
prob = 0.004
prob = 0.001
prob = 0.002
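The five estimates differ because each comes from only 1,000 random shuffles. As a rough sketch of that run-to-run variability, the runs can be pooled and a binomial standard error attached to the pooled proportion. This assumes the five runs were independent; the variable names are hypothetical.

```python
from math import sqrt

run_probs = [0, 0.001, 0.004, 0.001, 0.002]  # the five reported results
repeats_per_run = 1000

# Recover the hit count of each run, then pool all 5,000 shuffles.
hits = sum(round(p * repeats_per_run) for p in run_probs)
total = repeats_per_run * len(run_probs)
pooled = hits / total

# Binomial standard error of the pooled proportion.
std_error = sqrt(pooled * (1 - pooled) / total)
print(pooled, std_error)
```

More repeats per run would narrow the spread between runs, at the cost of longer computation.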
Conclusion
After converting the quantitative price information into rankings, the correlation between price rank and quality rank was .41, which is very close to the Pearson correlation between price and quality found in QUALITY-1 (.43). This suggests that the price-quality statistic was not unduly influenced by a few extreme observations. The RS program emulated the null hypothesis of no relationship between price and quality. In five runs of 1,000 repeats each, the estimated probability of getting a correlation at least as large as .41 ranged from 0 to .4%, always below .5%. We therefore reject the null hypothesis. The correlation was significant, with more expensive cosmetics tending to be of poorer quality.
References
Noreen, E.W. (1989). Computer intensive methods for testing hypotheses: An introduction. New York: Wiley.