CHAPTER I-6

THE RELATIONSHIP BETWEEN PROBABILITY-MATHEMATICS AND STATISTICAL INFERENCE

This chapter distinguishes between a) the theory of probability and its practical estimation on the one hand, and b) inferential statistics on the other. Here is a brief statement of the relationship between the two bodies of knowledge: inferential statistics equals probability models and calculations, plus a set of rules for which probability model to fit to particular situations, plus a set of principles for interpreting the results of manipulating the probability model.

The term "probability theory" refers to situations in which you know the nature of the system you are working with, and you wish to estimate the probability that the system will produce one or more particular events. For example, you can assume you know from the start the nature of a deck of bridge cards, and you want to estimate, say, the probability that such a deck with 13 spades among 52 cards will produce ten spades in the first thirteen cards dealt.

In contrast, the term "inferential statistics" refers to situations in which you do not know the nature of the system you are dealing with, and you want to infer the nature of the system from the evidence in hand. For example, someone may deal 10 spades to you in the first 13 cards, and you -- not knowing what kind of deck it is -- want to estimate how likely it is that the deck has only 13 spades among the 52 cards, or that it in fact has a larger proportion of spades.

To put it another way, in an inferential-statistics situation we want to characterize aspects of an unknown system; the mean and the median are examples of parameters that we wish to infer about an unknown system. In contrast, probability theory tells us about the probability of particular occurrences within systems whose parameters we already know.
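The spades question above can be answered exactly with the hypergeometric formula, but it also yields to a few lines of Monte Carlo simulation. Here is a minimal sketch in Python (the names and the trial count are my own choices for illustration):

```python
import random

def spades_in_first_13():
    """Shuffle a 52-card deck containing 13 spades and count
    the spades among the first 13 cards dealt."""
    deck = ["spade"] * 13 + ["other"] * 39
    random.shuffle(deck)
    return deck[:13].count("spade")

def estimate_probability(n_trials=1_000_000):
    """Estimate P(ten or more spades among the first 13 cards dealt)."""
    hits = sum(spades_in_first_13() >= 10 for _ in range(n_trials))
    return hits / n_trials
```

The event is so rare (on the order of a few in a million) that even a million trials may yield only a handful of hits; the formula gives the exact answer, but the simulation requires no formula at all.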
Probability theory clearly is relevant to situations such as gambling with cards or dice, where the physical nature of the system is known. It is also relevant to such business situations as life insurance, where the overall probabilities of dying at each age are well known from a great deal of prior experience. (Business situations in which one does not know the structure of the situation but is prepared to assume a structure can similarly be dealt with using probability theory.)

Inferential statistical thinking is particularly relevant for scientific investigations. In much of science the researcher tries to determine the nature of an unknown system from the evidence that s/he collects about it. It also is relevant to most decision-making. We may therefore define inferential statistics as the quantification of uncertainty. Whenever we are uncertain, and are willing to bring data to bear or otherwise quantify the simple probabilities, inferential statistics is relevant.

Some writers distinguish between the role of chance in a) measurement and estimation, and b) theory. The end of the spectrum called "probability" tends to apply to theoretical uses - for example, in genetics, and in oligopoly theory in economics. The end of the spectrum we call "statistics" tends to apply to measurement and estimation - for example, discriminating between hypotheses, and putting reliability bounds on estimates. Quantum theory seems to be a mix of the two.

A probability question asks: With system A, what might happen? A statistics question asks: What caused the results Z which did happen? You answer the statistics question by matching the results produced by known systems A, B, C... to the results Z, to see which come closer and which are less close. Though in statistics we want to know whether results Z were produced by system A, the operational question asks whether system A often does produce results like Z.
The distinction between "was produced" and "does produce" is often very subtle.

Problems referred to as "probability" in standard texts have these two steps in common:

1. Stipulate one or more universes (populations), which may be a generating mechanism such as a die or a population such as the residents of the United States.

2. Describe possible samples from the stipulated universe(s) in terms of their likelihoods.

All problems in statistical inference contain kernels of probability problems which include the above two steps. In addition, problems in statistics include a third step:

3a. If a test of a hypothesis: compare the observed data against the results of step 2 to see how frequently the observed sample, or one that is even more "surprising", arises.

3b. If an investigation of confidence limits: find the boundaries that partition the results of step 2, at one or two tails of chosen size (say five percent), into the most surprising results and those which are less surprising.

In addition, in problems of statistical inference the decision about which universe to stipulate in step 1 can be very complex, because it is likely to be influenced by the purpose for which the work is being done as well as the scientific styles and tastes of the statistician and researcher. This implies that the choice of universe can seem arbitrary and hence is often controversial. Also, the decision about which comparisons to make between the observed sample and the probabilistic results in step 2 can be both complex and controversial in problems of inference. This is another way in which inferential work is less "objective" and "mechanical" than are problems in probability; that is, the calculations are more straightforward in problems that are only probabilistic rather than also inferential.
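The steps above can be collected into one schematic routine. The sketch below is my own arrangement, not a fixed recipe: step 1 stipulates the universe (here a fair coin), step 2 generates many samples from it, and step 3a counts how often a sample is at least as "surprising" as the one observed:

```python
import random

def simulated_p_value(draw_sample, statistic, observed, n_trials=10_000):
    """Steps 2 and 3a: describe the universe's samples by generating
    them, then count how often the statistic is at least as extreme
    as the observed value."""
    extreme = sum(statistic(draw_sample()) >= observed
                  for _ in range(n_trials))
    return extreme / n_trials

# Step 1: stipulate the universe - say, a fair coin flipped 50 times.
def draw_sample():
    return [random.choice("HT") for _ in range(50)]

def heads(sample):
    return sample.count("H")

# How often does a fair coin give 33 or more heads in 50 flips?
p = simulated_p_value(draw_sample, heads, 33)
```

Step 3b would use the same generated samples, but sort them and read off the boundary values at the chosen tail size instead of counting extremes.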
The close connection between the two sorts of problems can be seen in the fact that the statistics question - What is the likelihood that this sample Z comes from a universe that has properties X and Y? - is answered with the same computation as the probability question: What is the likelihood that a universe with properties X and Y will produce a sample like Z?

As an example of the relationship between probability and statistics, consider the first published case of statistical inference, by John Arbuthnot. As we shall see at greater length in Chapter 00, Arbuthnot in 1712 observed that year after year the number of boys born in London was larger than the number of girls, and he wanted to know whether he could properly infer from the sample that this is a natural law. He proceeded sensibly by considering how likely it would be to see a larger proportion of boys 82 years in a row if the probability were "really" .5 for each sex.

The terms "probability" and "statistical inference" are used as labels in a great variety of ways and do not correspond to any neat differences.
For example, the field known as "statistical mechanics" in physics has nothing to do with statistical inference or any other topic usually known as "statistics", but is exactly the sort of problem usually treated under the label "probability". And it is an open question whether the first use of the Normal distribution - in astronomy, to decide whether certain observations of stars should be considered flukes or not - should be considered probability or statistics, as it is also unclear whether forecasts based on statistical data - say, weather forecasts - are exercises in one or the other discipline.

Here is an example of a case that is hard to classify: The Bureau of Standards sends out small quantities of test reagents to laboratories, along with a statement of the likelihood that the reagent is within certain boundaries of some property. The boundaries are derived from a sample of the main reservoir of the reagent. This plus-or-minus statement can be seen as a question asked about the probability that a known universe (but known from perhaps only 10 observations) will produce a specimen with a given mean and standard deviation. Or one can see the prediction as inferred from the sample evidence of the ten observations of the reservoir.

Perhaps we should think of statistics as meta-probability, though some probability questions are not statistics. Or we can say that all statistics questions are probabilistic, but not all probability questions are statistical.

Probability is a very easy subject compared to statistics. Probability is almost purely mathematical, in the sense that the probabilist usually need not worry about the purpose of the work, or the design of the study, but need simply answer the question as posed: the probability that the spaceship will hit the moon, or that the factory will leak pollution into Bhopal, or that three of four machines will cease to function today.
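For instance, the three-of-four-machines question can be answered by simulation as soon as a failure probability is stipulated. The sketch below assumes each machine fails independently with probability 0.1 on a given day - a made-up figure for illustration:

```python
import random

def prob_three_or_more_fail(p_fail=0.1, n_machines=4, n_trials=100_000):
    """Estimate P(at least 3 of the 4 machines fail today),
    with each machine failing independently with probability p_fail."""
    hits = 0
    for _ in range(n_trials):
        failures = sum(random.random() < p_fail for _ in range(n_machines))
        if failures >= 3:
            hits += 1
    return hits / n_trials
```

With these assumed figures the exact binomial answer is 4(.1)^3(.9) + (.1)^4 = .0037, and the simulation converges on it.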
Furthermore, the mathematics of probability is easy to teach, if one agrees that simulation is an admissible technique. This may be seen in how supposedly-challenging problems in probability are easy to do correctly with simulation. (See Chapter 00.)

Statistics is the opposite of probability in those respects. Even more important, the issue of purpose is at the root of every problem in statistics. Should one do a one-tail test or a two-tail test? The answer must depend upon the purpose of the investigation.

An example: Four black FBI agents were sitting together at a table in the restaurant, as were four white FBI agents, and the white agents were served in reasonable time but the black agents were not. The court sought to answer the question: Was there probably a pattern here, or would it likely happen by chance? It is easy to consider these data a standard probability problem turned around. But is it possible to say anything sensible about this event without knowing anything more about the context - how many seating patterns have been observed, and so on? I think not. In this respect, statistics is quite different from probability. And teaching or publishing only the simple mathematics of this problem profoundly misleads the reader, the court (if it is involved), or the student.

Unlike probability, statistics is inevitably intertwined with - or is a handmaiden to - research methods, and issues like choice of design. For example, should one use a paired-comparisons setup or not?

Some writers have considered probability problems to be deduction, whereas statistical inference they call induction (see Chapter I-2 on this concept), on the grounds that the postulated universe is known in the former but inferred in the latter. For example, the likely behavior of a sample of incomes drawn from the U.S. population can be considered a deduction, and an estimate of the population's incomes from a sample considered induction.
However, I consider both to be exercises in induction, because they arrive at conclusions on the basis of insufficient knowledge; a probabilistic statement is made without the micro-knowledge of the random-selection process that would permit perfect prediction. Only where all necessary information is available to draw a conclusion with certainty can one consider the activity to be deduction, in this view.

For a non-statistical example, consider a prisoner who escapes through cell bars that have been bent. Did the prisoner bend the bars with his arms? Sherlock Holmes might put together a set of clues that would allow him to draw a conclusion for sure - an act of deduction. But if we have imperfect knowledge of the prisoner's strength and of the strength necessary to bend the bars, we can only induce an imperfect conclusion. If we draw a sample of prisoners, and test their abilities to bend bars of the thickness of those in the escapee's cell, and if we have no reason to believe that the escapee was of other than average strength, we might say something about the probability that he bent the bars with his arms. This would seem to differ from Holmes' "deduction" only in its supposed certainty. (Incidentally, whether such a statement should be considered part of the study of "probability" or of "statistics" is unclear, illustrating the lack of clarity in the boundary between them.)

The term "inverse probability" has caused so much confusion and controversy over the past two centuries that it may be wisest to forgo using it. (For enlightening discussion of the topic, see Stigler, 1986.)

Just as every problem in statistics contains a kernel of a problem in probability (as in the Arbuthnot example earlier, which will be discussed at greater length in Chapter 00), just about every problem in probability can be imagined to have a dual problem in statistics.
For example, a problem in probability may ask: What is the probability that if a firm reassigns managers in 30 cities by lottery (allowing managers to draw the cities they are now in), two or more managers will draw the cities that they are now in? One can turn this situation around and ask: The firm conducted its lottery and observed 7 matches out of 30. Is there any reason to believe that the lottery is fixed? The latter is a statistics problem, which is handled by computing how likely such a result is to occur by chance - though, as with all statistical problems, more is involved in the interpretation than the probability calculation.

Returning to the black and white FBI agents: one can see how this is a standard probability problem turned around. One would explore the question by asking: If there are four agents of each color in the universe and four are chosen randomly, what is the probability that the result will be four blacks (and therefore also four whites)? That is a pure question in probability. But as noted above, nothing sensible can be said about the event itself without knowing more about the context.

THE RELATIONSHIP OF PROBABILITY TO THE CONCEPT OF RESAMPLING

There is no all-agreed definition of the concept of the resampling method in statistics. Unlike some other writers, I prefer to apply the term to problems both in pure probability and in statistics. This set of examples may illustrate:

1. Consider asking about the number of hits one would expect from a .250 (25 percent) batter in a 400 at-bat season. One would call this a problem in "probability".
The answer can be calculated by formula or produced by Monte Carlo simulation.

2. Now consider examining the number of hits in a given batter's season, and asking how likely that number (or fewer) is to occur by chance if the batter's long-run batting average is .250. One would call this a problem in "statistics". But just as in example (1) above, the answer can be calculated by formula or produced by Monte Carlo simulation. And the calculation or simulation is exactly the same as used in (1). Here the term "resampling" might be applied to the simulation with considerable agreement among people familiar with the term, but perhaps not by all such persons.

3. Next consider an observed distribution of distances that a batter's hits travel in a season with 100 hits, with an observed mean of 150 feet per hit. One might ask how likely it is that a sample of 10 hits drawn with replacement from the observed distribution of hit lengths (with a mean of 150 feet) would have a mean greater than 160 feet, and one could easily produce an answer with repeated Monte Carlo samples. Traditionally this would be called a problem in probability.

4. Next consider that a batter gets 10 hits with a mean of 160 feet, and one wishes to estimate the probability that the sample would be produced by a distribution as specified in (3). This is a problem in statistics, and by 1996, common statistical practice would treat it with a bootstrap technique - called "resampling" by all. The actual bootstrap simulation would, however, be identical to the work described in (3).

Because the work in (4) and (2) differs only in (4) involving measured data and (2) involving counted data, there seems no reason to discriminate between the two cases with respect to the term "resampling". With respect to the pairs of cases (1) and (2), and (3) and (4), there is no difference in the actual work performed, though there is a difference in the way the question is framed.
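Examples (3) and (4) can both be run with one and the same bootstrap loop. The sketch below uses a made-up list standing in for the observed 100 hit distances (forced to a mean of 150 feet); a real analysis would start from the actual distances:

```python
import random

random.seed(1)
# Hypothetical stand-in for the observed distribution of 100 hit distances.
distances = [random.gauss(150, 40) for _ in range(100)]
shift = 150 - sum(distances) / len(distances)
distances = [d + shift for d in distances]  # force the observed mean to 150 feet

def prob_mean_above(data, n_hits=10, cutoff=160, n_trials=10_000):
    """Resample n_hits distances with replacement, many times over,
    and estimate how often the resampled mean exceeds the cutoff."""
    hits = 0
    for _ in range(n_trials):
        sample = [random.choice(data) for _ in range(n_hits)]
        if sum(sample) / n_hits > cutoff:
            hits += 1
    return hits / n_trials
```

The same run answers both questions: read as (3), the result is a probability statement about samples from a known distribution; read as (4), it is a statistical judgment about the observed 160-foot sample.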
I would therefore urge that the label "resampling" be applied to (1) and (3) as well as to (2) and (4), to bring out the important fact that the procedure is the same as in resampling questions in statistics.

One could easily produce examples like (1) and (2) for cases that are similar except that the drawing is without replacement, as in the sampling version of Fisher's permutation test - for example, a tea taster. And one could adduce the example of prices in different state liquor control systems (see Chapter III-1), which is similar to cases (3) and (4) except that sampling without replacement seems appropriate. Again, the analogs to cases (2) and (4) would generally be called "resampling".

The concept of resampling is defined in a more precise way in Chapter 00. Fuller discussion may be found in Chapter 00.
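The without-replacement case mentioned above - the tea taster of Fisher's design - can be sketched in the same style. Assume eight cups, four prepared milk-first, and a taster who is told there are four of each and guesses at random (the setup is the classic one; the particular numbers are assumptions for illustration):

```python
import random

def prob_all_correct_by_guessing(n_trials=100_000):
    """Estimate the chance a purely guessing taster labels all eight
    cups correctly, given that exactly four of the eight are milk-first.
    Guessing amounts to choosing four positions without replacement."""
    truth = ["milk"] * 4 + ["tea"] * 4
    hits = 0
    for _ in range(n_trials):
        guess = truth[:]
        random.shuffle(guess)  # a random assignment of the four "milk" labels
        if guess == truth:
            hits += 1
    return hits / n_trials
```

The exact answer is 1/C(8,4) = 1/70, about .014; as in the with-replacement cases, the simulation and the formula answer the same question.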