CHAPTER II-5 THE PLACE AND ROLE OF BAYES' RULE AND BAYESIAN ANALYSIS

Heated argument has raged about Bayes' rule for two centuries, involving fundamental questions about the possibilities and practices of statistics, such as the disputed concept of inverse probability. This note attempts to bring some reconciliation and clarity to the disputes.

No statistician doubts that Bayes' formula is the appropriate procedure for a variety of actual decision-making situations, including (1) business decisions such as whether to buy a used car in light of a judgment of the car's mechanical soundness by a skilled mechanic, and (2) analysis of medical screening and clinical decisions where there are both an a priori probability of a disease and a test for the individual patient. Disputes concern (1) the ability of Bayesian thinking to establish the cause of a set of observations (induction), and (2) the role of judgment in assessing the validity of scientific hypotheses; these two latter issues are almost completely separate from each other and from the decision-making uses of Bayes' rule.

The argument centers on whether Bayes' rule should be used mainly for such problems as those mentioned above, or whether it should underlie all statistical work. Savage calls for its use in all cases. He writes:

[O]ne of the most striking symptoms of the inadequacy of statistical theory without subjective probability is the lack of unity that such theory has had (1962, p. 13)...Fisher does not present us with a unified method (p. 14)...It has sometimes been contended that there are two different kinds of statistical theory, one appropriate to economic contexts and another to pure science; see for example, Fisher. In my own opinion, this dualistic view is incorrect (p. 15).

Savage arrives at this judgment apparently on the grounds that such consistency is inherently better than a piecemeal approach of using one method for general scientific work and another in specific decision situations; he does not, however, give a utilitarian argument for this judgment, despite his own emphasis on bringing expected utility into decisions.[1]

Bayes' rule stands at the nexus of probability theory and statistics, and points both ways. This explains why it has been such a challenging idea for so long. I will start by discussing the basic mechanism of Bayes' rule as an intellectual tool. Then we will pass to the difficult issues.

Bayes' Rule and Complex Problems in Conditional Probability

To leading mathematicians writing about probability theory, Bayes' rule is just one particular mode of writing and using the idea of conditional probability (see, e.g., Feller, 1950; Goldberg, 1960). Indeed, Feller directs people away from using Bayes' formula and toward the simple formula for conditional probability, p(A|B) = p(A,B)/p(B): "The beginner is advised always to [use the simple formula] and not to memorize [Bayes'] formula." He goes on to say that Bayes' formula "is a special way of writing [the simple conditional probability formula] and nothing more" (1950, p. 85). Feller is quite wary of Bayes' rule, perhaps because of what he considers unwarranted uses to which it has been applied, as we will see later. Feller's point can be made concrete with a computation directly on the sample space, as in the sketch below.
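A minimal sketch of Feller's advice, assuming a standard 52-card deck built in code (the construction details are my own, not Feller's): the conditional probability that a dealt card is an ace, given that it is a spade, falls out of simple counting on the sample space, with no special machinery.

```python
# Compute a conditional probability directly from the sample space with
# p(A|B) = p(A,B)/p(B), for a standard 52-card deck.
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))          # the full sample space: 52 equally likely cards

B = [c for c in deck if c[1] == "spades"]   # event B: the card is a spade
AB = [c for c in B if c[0] == "A"]          # event A-and-B: the ace of spades

p_B = len(B) / len(deck)
p_AB = len(AB) / len(deck)
print("p(A|B) =", p_AB / p_B)               # 1/13, by simple counting
```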
All but the simplest problems in conditional probability are confusing to the intuition even if not difficult mathematically. Indeed, a large proportion of mathematical-probabilistic puzzles are of this nature. To make clear the nature of the rule, I shall start with the simplest sort of problem that we can handle with first principles, and proceed gradually from there.

Consider the probability of dealing the ace of spades from a poker deck of 52 cards. Clearly this "unconditional probability" is 1/52, deduced from our full knowledge of the mechanism that may be described by the sample space. We assume away the fact that there may be slight differences in probabilities for different cards due to different weights, and so on - an assumption which looms small here, but which is logically crucial in enabling us to create the closed system necessary for the application of probabilistic reasoning. Now assume that we seek the probability of dealing an ace if the ace of hearts has already been dealt. The deck now contains 51 cards, and clearly the probability is 3/51; we are deducing a conditional probability from our full knowledge of the sample space. All would agree that this is a problem in pure probability rather than statistical inference, even though the estimate depends upon sample data - the observation of the ace of hearts - and hence the estimate may be considered an a posteriori probability.

Now consider this question: You make a deck of one ace and one king, shuffle, deal a card, record whether ace or king, then replace and repeat. What is the probability that if one of the cards you pick is an ace, the other is an ace as well? I suggest you write down your answer before proceeding further. This problem was published in the Sunday newspaper supplement Parade in the following guise:

A shopkeeper says she has two new baby beagles to show you, but she doesn't know whether they're male, female, or a pair. You tell her that you want only a male, and she telephones the fellow who's giving them a bath. "Is at least one a male?" she asks him. "Yes!" she informs you with a smile. What is the probability that the other one is [also] a male? (vos Savant)

The Parade columnist gave the answer as 1 in 3. The problem is so confusing that PhDs wrote her to say - with great confidence - that she was all wrong. The following simulation not only shows how to handle the problem expeditiously, but also shows why this sort of problem is so puzzling: Consider a two-digit column of random numbers in Table II-5-1, using odd numbers for females and even for males. The first forty lines are sufficient to suggest the correct probability, and also to make clear the mechanism: Two-female pairs, a fourth of the cases, are excluded from the sample. And mixed pairs - which give a "no" answer - are two-thirds of the remaining pairs, whereas the only pairs that give "yes" answers - two males - are only a third of the remaining pairs. Simulation gets it right very quickly and easily, whereas the deductive method of mathematical logic results in much confusion.

Table II-5-1
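In code, the Table II-5-1 procedure looks like the sketch below (the trial count and random seed are my own choices, and coin-flip draws stand in for the odd/even random digits): generate random pairs, discard the pairs that would make the bather answer "no", and count how often the remaining pairs are two males.

```python
# Simulate the beagle puzzle: condition on "at least one male" and
# estimate the probability that both are male.
import random

random.seed(1)        # fixed seed so the run is reproducible
trials = 100_000
yes_pairs = 0         # pairs with at least one male
both_male = 0

for _ in range(trials):
    pair = [random.choice(["male", "female"]) for _ in range(2)]
    if "male" in pair:                  # two-female pairs drop out of the sample
        yes_pairs += 1
        if pair == ["male", "male"]:
            both_male += 1

print(both_male / yes_pairs)            # converges to about 1/3, not 1/2
```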
Now let us consider still another problem, a famous puzzler that comes from Bertrand. You face three chests with two drawers each. The drawers in chest A each contain a gold coin, one drawer in chest B contains a gold coin and one a silver coin, and the drawers in chest C each contain a silver coin. You choose a chest and a drawer randomly, and your probability of finding a gold coin is 1/2. You then find a silver coin. What is the probability that the coin in the other drawer of that same chest is gold? (Adapted from Schuh, 1968, p. 166.)

This situation is similar to the card-drawing case above in that we begin with a known sample space, make one observation without replacement, and ask about the probability of another observation. The only obvious difference is that in this case the subsequent sample space is a bit more complex - one must exclude chest A from the analysis because it does not contain a drawer with a silver coin. And this problem is solved with a process akin to Bayes' rule. (If you want to know the answer and how to do it with simulation, see Simon, 1994. Other puzzles are found in Chapter 00, including the famous "Monty Hall" or "three doors" problem.)

Bayesian enthusiast Arnold Zellner writes: "Note that Bayes' inverse problem is fundamentally different from those encountered in games of chance, for example coin flipping, in which the probabilities of outcomes are known and the probabilities of various outcomes must be calculated" (1987, p. 208). Contrary to Zellner's view, it seems to me that Bertrand's problem has exactly the characteristics Zellner mentions, and also is a quintessential Bayesian problem (as Schuh notes, pp. 176-177); yet it is a pure problem in probability. Zellner goes on to say that gambling games are "problems in direct probability", to which he contrasts "Bayes' inverse probability problem, [wherein for example] five heads in six flips of a coin are observed and what must [be] calculated or inferred is the chance that the probability of a head on a single flip lies in a given interval, say 0.4 to 0.7" (p. 208, italics in original). But the beagles problem and Bertrand's problem both seem to me quite in the tradition of direct probability, and both are traditionally treated in texts on probability.

A common and exceedingly valuable contemporary practical use of Bayes' rule is probability analysis of medical screening and diagnosis, commonly described in texts on biostatistics. These classic applications of Bayesian analysis can be modeled with urns in a standard probability context. For non-mathematicians the analysis is understood most easily with Venn (box) diagrams and probability trees; the formula is the least revealing mode for non-mathematicians. Let us consider for specificity an analysis of widespread screening for a type of cancer which we shall call C (example adapted from Wonnacott and Wonnacott). Assume C occurs in 3 of 1000 Americans. An early-warning screening test has been found to produce the following results: In patients without the cancer, the test shows 5 percent positive, 95 percent negative; in patients with the cancer, the test shows 98 percent positive, 2 percent negative. The central question is: What proportion of the positive results will indicate that the patient actually has cancer? Applying the formula above, we get

p(cancer | positive test) = (proportion of the population that has cancer and shows a positive result) / (proportion of the population that shows a positive result) = (.98 * .003) / [(.98 * .003) + (.05 * .997)] = .0029 / (.0029 + .0499) = .056

Bayes' rule shows its value dramatically here because of the counter-intuitive nature of the results. Most laypersons guess that the answer is about 90 percent, and the 5.6 percent answer astonishes them. In many other such cases the probabilities that are calculated also are wildly different than one expects beforehand. The calculation may be laid out as in the sketch below.
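In this sketch each branch of the probability tree is a named quantity; only the numbers given in the text are used.

```python
# The screening arithmetic written out so each branch of the tree is visible.
p_cancer = 0.003                 # prior: 3 in 1000 Americans
p_pos_given_cancer = 0.98        # test is positive in 98% of cancer cases
p_pos_given_healthy = 0.05       # false-positive rate in healthy patients

# Proportion of the whole population in each "positive" branch of the tree.
true_pos = p_pos_given_cancer * p_cancer            # 0.00294
false_pos = p_pos_given_healthy * (1 - p_cancer)    # 0.04985

p_cancer_given_pos = true_pos / (true_pos + false_pos)
print(round(p_cancer_given_pos, 3))                  # about 0.056, not 0.9
```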
Here is an example of the diagnosis of an individual patient (adapted from Roberts, p. 490): Based on preliminary examination, a doctor estimates the probability to be .005 that Patient F, who suffered a blow on the head, has a fractured skull. Experience suggests that an x-ray will give positive results in .96 of cases where there is a fracture, and in .1 of the cases where there is no fracture. If the doctor takes an x-ray and finds a positive result, what is the chance that the patient really has a fracture? The surprising answer is .046.

Now an example in the diagnosis of whether a used car has a faulty transmission or not (adapted from Wonnacott and Wonnacott, p. 93): Prior data show that 30 percent of such cars have a fault. A mechanic labels 90 percent of faulty cars as "faulty" and 10 percent as "okay"; he labels 20 percent of the OK cars "faulty", and 80 percent "okay". Bayesian analysis shows that the probability that the car will be faulty if he says "okay" is .05. So with the additional information one's surety of getting an okay car is raised from .7 to .95. Both of these two-hypothesis calculations follow the same pattern, as the sketch below shows.
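Both diagnoses are two-hypothesis problems with a closed sample space, so a single small function covers them. The function and its name are my own illustration; the arithmetic is the same Bayes' rule used in the screening example.

```python
# A helper for two-hypothesis Bayes problems with a closed sample space.
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """P(hypothesis | evidence) when only two hypotheses are possible."""
    joint_true = prior * p_evidence_if_true
    joint_false = (1 - prior) * p_evidence_if_false
    return joint_true / (joint_true + joint_false)

# Skull fracture: prior .005; x-ray positive in .96 of fractures, .1 of non-fractures.
print(round(posterior(0.005, 0.96, 0.10), 3))    # about .046

# Used car: prior .3 faulty; mechanic says "okay" for .1 of faulty, .8 of sound cars.
p_faulty_given_okay = posterior(0.30, 0.10, 0.80)
print(round(p_faulty_given_okay, 2))              # about .05, so .95 surety of a sound car
```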
A typical business example is a decision about whether to produce a new product, with the key factor being the demand for the product. The firm typically has a prior rough intuitive assessment of demand. Market research (which is costly) can provide improved estimates. One first calculates (using Bayesian analysis and backward induction) whether the expected benefit of the research is worth its cost, and then, if the firm goes ahead with the research, Bayesian analysis is used to assess whether to go ahead with the new production.

These everyday uses of Bayes' rule, and the modes of presentation of the concept in such basic texts as those of Feller and Goldberg, should make clear that Bayes' rule is a standard mathematical shortcut for investigating sample spaces and clarifying problems that challenge the intuition. This everyday quality should demystify it and make it seem unlikely as a device for entering into the misty unknowns of ascertaining causes that cannot be identified in other ways.

Elsewhere [Simon, 1996] I discuss why problems in Bayesian analysis are so hard to master intuitively. The difficulty lies in shifting from an analysis of the entire sample space to some part of it - from the entire population to one or more subgroups. This requires a long chain of reasoning containing several switches in intuitive direction. This may also be seen in the examples that are worked later in this chapter.

INVERSE PROBABILITY AND THE SEARCH FOR CAUSES

The use of Bayes' rule as a device for identifying causes of observed phenomena was the original motive for Bayes' work, and later for Laplace's. Bayes' rule is still believed by some to be successful in that quest. For example, Dennis Lindley says that Bayes' "theorem must stand with Einstein's E = mc² as one of the great, simple truths" (1987, p. 208), and Stephen Stigler calls it a "truly Copernican revolution in statistical concept" (1986, p. 122). Zellner (1987, p. 208) joins in giving Bayes "credit for solving the famous `inverse probability problem'". Other statisticians such as Ronald Fisher, and probabilists such as Feller (see above), believe - and I share their view - that Bayes' rule fails entirely in what it purports to do, namely ascertain causes and assign "inverse probability", though it certainly is a great and valuable idea for some applied purposes, as discussed above.

The underlying notion of inverse probability is that if one can examine a mechanism that generates outcomes of some sort and logically evaluate the probabilities of the various outcomes, it should be possible to examine a set of outcomes and logically assess the probabilities of various causes. Everyone understands that as (a) your sample grows larger, or (b) the dispersion grows smaller, you have less uncertainty about specified parameters. But this does not imply that one can by calculation identify causes among an unspecified set of possibilities and assign to them probabilities of being causes.

Bayes' rule certainly seems to point to causes in some situations. The case of medical diagnosis is particularly clearcut: the combination of a prior probability of disease plus the result of a test on a particular patient (the tests have a high but not perfect record of being correct, whether negative or positive) seems to be a probabilistic identification of a cause. But one is only choosing between two specified hypotheses - presence or absence of the disease - which enables one to have a closed sample space within which the problem is one of pure "direct" probability; this is quite different from what Bayes and Laplace had in mind, about which more later.

The basic quest for a method of doing what Bayes and Laplace sought to do, other than in restricted situations, seems impossible. I will first offer a few brief remarks pertaining to this conclusion, and then I will explore the matter at greater length with examples.

Fisher rejected the concept of inverse probability - which for him is synonymous with induction - on a variety of grounds: (1) "apparent mathematical contradictions"; (2) "Its truth should be apparent to any rational mind", and it is not (certainly not to Fisher's mind); and (3) it "has been only very rarely used" in science (1966/1990, pp. 6-7). He notes that a "well-defined distribution" is necessary a priori (1973/1990, p. 54), and when this does not exist, it commonly is brought in by "mathematical sleight-of-hand" (1966/1990, p. 198).

Concerning Fisher's objection that Bayes' rule is "rarely used" in science, it is relevant that such well-known books specifically devoted to Bayesian analysis as those of Press (1989) and Hartigan (1983) do not contain any real or even realistic examples of the use of the analysis for the most common sorts of statistical analysis in social science and biostatistics: hypothesis testing and confidence intervals. Geweke (1988) labors mightily to show some actual uses in econometrics, the field that has expressed most interest in such use. But the difficulty of Geweke's examples (which are far from the everyday sorts of tests), the need for complex computer algorithms, and the fact that the examples are not the sort of use of Bayes' rule that one ordinarily imagines - a reasonably sharp prior distribution based on prior knowledge that is then modified in light of the sample evidence - all combine to make the case against common adoption of Bayes' rule in science rather than for it.

There are two other reasons why no statistical procedure can automatically identify a cause: (1) There always remains the possibility of a "hidden" or "third" variable which is related to the measured variables but is not considered explicitly in the research. (See Simon and Burstein, 1985, for examples and discussion of this phenomenon.) (2) The larger scientific-statistical context, which must affect the interpretation of any given set of data, is exceedingly difficult (and perhaps impossible) to capture within a Bayesian or other procedure. An important example is the number of sets of data that were reviewed before the particular data set in question was analyzed, and the number of variables that were examined before seizing on the variables presently under investigation. This is the phenomenon known to statisticians as "data dredging". For example, if a hundred substances chosen at random are tested for an effect on a given disease, five may be expected to show a "significant" effect at the "95 percent significance level", even though none of the five will be likely to have an effect if tested again; it is for the same reason that so many random phenomena in our lives, such as "premonitions" and coincidences, seem meaningful to us. The sketch below makes the arithmetic of data dredging concrete.
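A sketch of that arithmetic: simulate a hundred substances that in fact do nothing, test each at the 5 percent level, and count the false "discoveries". The sample size and the normal measurement model are my own assumptions for illustration.

```python
# Data dredging in miniature: 100 inert substances, each tested at the 5% level.
import random

random.seed(2)
significant = 0
for _ in range(100):
    # Each "substance" yields a mean effect estimated from 100 null observations.
    sample = [random.gauss(0, 1) for _ in range(100)]
    mean = sum(sample) / len(sample)
    # The cutoff of 1.96 standard errors is the usual 5% two-sided criterion;
    # the standard error of the mean here is 1/sqrt(100).
    if abs(mean) > 1.96 * (1 / 100 ** 0.5):
        significant += 1

print(significant)   # about 5 false "discoveries" out of 100 inert substances
```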
Feller points out that the manipulation of Bayes' rule has led into many peculiar byways. He tells us that "Plato used this type of argument to prove the existence of Atlantis, and philosophers used it to prove the absurdity of Newton's mechanics" (p. 85).

The operational reason for rejecting the notion of inverse probability that I find most persuasive, however, is as follows: One can program a computer to start with a causal device such as a die or a pack of cards or a machine with given parameters that produces ball-bearings - all assumptions being explicit and not subject to argument - and it will produce the pattern of the expected results in such fashion that all will agree that the result is correct. And of course one can program a Bayesian analysis of a medical diagnosis. But computer modeling of inverse probability is not possible. One cannot begin with some pattern of observations and - with only computations that are made directly from the observations - produce a set of alternative causes of the observations and find probabilities for those alternatives. The point is not that this cannot be done in practice; rather, it cannot be done in principle. And if one cannot even imagine a computer program of this sort, then any set of formulae that purports to accomplish this goal must contain some hidden assumptions that represent a kind of mumbo-jumbo.
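To make the contrast concrete, here is the "direct" direction in a few lines (the die and the trial count are my own choices): given an explicit causal mechanism, a program produces the expected pattern of results, and everyone can agree the output is correct. No analogous program can run from outcomes alone back to a probability distribution over unspecified causes.

```python
# Forward ("direct") simulation of a fully specified causal device: a fair die.
import random
from collections import Counter

random.seed(3)
rolls = Counter(random.randint(1, 6) for _ in range(60_000))
for face in sorted(rolls):
    # Each relative frequency lands near 1/6, exactly as deduced from the mechanism.
    print(face, round(rolls[face] / 60_000, 3))
```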
The argument above expresses in a different way a basic truth about statistical inference that is repeated throughout this book: Every question that is originally framed as a question in statistics or inverse probability eventually is answered by manipulating some known universe and analyzing the results in a manner which clearly is an exercise in direct probability.<1> Indeed, Stigler emphasizes that both Bayes and Laplace make implicit assumptions about the prior distribution that are crucial to their analyses. But the implicit assumptions that they and others make are often so complex that their presence and operation are obscured.

Why, then, is it possible to use Bayes' rule to put probabilities on causes in the case of medical screening? Why is Bayes' rule useful in sorting out Bertrand's puzzle of the chests, though the reasoning seems to work backwards (or inversely) from the observed evidence? The explanation (as suggested earlier) is that those cases constitute closed sample spaces. Whenever one has such a closed space - and any open sample space may be closed with judiciously chosen assumptions - then the analysis is simply an exercise in probability theory; one way or another, one asks about the behavior of a specified causal mechanism, rather than discovering the causes. This is not inverse probability but rather direct probability.

Let us now consider in more detail, and with examples, the place of Bayes' rule in statistical inference. The context is the relationship of probability to testing hypotheses and estimating reliability. Consider the extreme case of drawing from an urn ten red balls in a row and nothing else, or ten slips of paper on which are written "13". Obviously the probability is high that the urn contains only reds or only "13"s. But just how high is that probability? No one has discovered a sound way to assess this probability in the abstract, without additional information, and no one is likely to. The probability certainly is not 1.0 that the urn contains only reds or "13"s, because after pulling only one or two reds from the urn, one would certainly not assign a probability of 1.0 to the next ball being red. But no probability other than 1.0 makes sense, either. One popular formula has been p = n/(n+1), where n is the number of successful trials (assuming there have been no unsuccessful trials). But one would not assign a probability of .5 to a second ball being red after just one has been taken and found red. Feller tells us that this formula has led to many historical cases of nonsense conclusions. And the formula has no provision to allow for theoretical understanding, which is why the probability that the sun will come up the next day would have been lower a thousand years ago than now - an implication which Feller properly considers nonsensical. Another example in which the formula is obviously defective: Does the probability of your being alive the next day continue to increase indefinitely the longer you go on living? The formula applied in a theoretical vacuum implies that it does continue to increase, but I doubt that any person of 80 believes this to be true. This is one more case where a mathematical formula can be so bewitching as to override the obvious need for judgment to countervail the formula. If it is not possible to put a probability on the parameter in this case of no variation in the data (on the basis of the data alone), why should one hope to do better when there is dispersion in the results?

Consider also this related example: You are told that you are to draw from one of two urns whose identities you do not know - urn A with 100 red balls, and urn B with 100 black balls. You draw and get a red ball. You will now predict with certainty that the next ball you draw from the same urn will be red. Furthermore, we say that you deduce the color of the next ball. Now change the facts and have urn A contain 99 reds and one black, and urn B contain 99 blacks and one red. Again you draw from a randomly-chosen urn and obtain a red ball. You predict red for the next ball with almost as high a probability as when each urn had only red or only black balls. And perhaps it is proper to say that you are deducing your probability estimate, unlike the case of an urn whose total contents you do not know. A key difference is that in this case the sample space can be considered closed, and we can therefore calculate a probability automatically by formula, as in the sketch below.
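A sketch of that automatic calculation for the 99/1 version of the problem; the layout is mine, but the numbers are the ones in the text.

```python
# The 99/1 two-urn example worked out by enumeration. Because the sample
# space is closed, the probability comes out of the formula automatically.
p_red_from_A = 99 / 100      # urn A: 99 red, 1 black
p_red_from_B = 1 / 100       # urn B: 1 red, 99 black

# Posterior that we hold urn A, given that the first draw was red
# (each urn chosen with prior probability 1/2).
p_A = (0.5 * p_red_from_A) / (0.5 * p_red_from_A + 0.5 * p_red_from_B)

# Predictive probability that the next draw (without replacement) is also red.
p_next_red = p_A * (98 / 99) + (1 - p_A) * (0 / 99)
print(round(p_A, 2), round(p_next_red, 2))   # 0.99 and 0.98: almost certainty
```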
But this case is not an analog to the search for causes, wherein it is never possible to specify a closed sample space. (Indeed, if the sample space could be considered closed, there would be no place for the discovery of new knowledge, which is the essence of science!)

Let us next consider the special case in the testing of hypotheses in which there are only two hypotheses, each held with equal probability. This situation can be analyzed with the Neyman-Pearson framework. But there does not seem to be any conceptual difference between the Neyman-Pearson approach and the Bayesian outlook with the two p's = .5. Indeed, the Neyman-Pearson framework can be seen as an analysis of the likelihoods of the two hypothesized universes conditional upon the sample evidence.

When we move to the more conventional hypothesis-testing situation - that which John Arbuthnott analyzed in 1718 (see Chapter 00) - we find a situation in which only the null hypothesis is fully specified (p of a male = .5, for Arbuthnott). Unlike the Neyman-Pearson case, the alternative hypothesis is not specified in this case. True, we may say that observing the evidence to be inconsistent with the null hypothesis, and consequently rejecting it, is tantamount to accepting the alternative hypothesis. But this does not mean that we are committed to the particular parameter that was observed (or any other) as being the "alternative" value, nor do we put a probability on an alternative magnitude; rather, we are simply committed to concluding that the parameter in which we are interested is not the null hypothetical value. It is this absence of fully-specified alternatives that makes it impossible to construct a closed sample space, and that distinguishes the core of statistical inference from problems in pure probability - at least in my view. Of course one may refer to the Neyman-Pearson analysis as statistical inference if one chooses, on the grounds that the conclusion is conditional upon the sample evidence (as is the case with the puzzles above). But using the term "statistical inference" in this context is a matter of personal taste in terminology, and should not obscure the fact that the grand aim of Bayes and Laplace cannot be realized.

Consider the example of two sets of cholesterol measurements, one set for a group that received treatment T and the other for a group that received placebo P. One can infer whether it is likely that the T set came from the same universe as the P measurements by investigating the probability that the same universe would produce two samples as different as those observed, and one might even boldly speak about the probability that the two came from the same universe. But if one finds that the chance that the same universe would produce both sets is very small, one cannot assign a probability to the proposition that the observed sample parameter or any other parameter is the parameter of the T universe; the result of the hypothesis test - the probability that a single universe would produce both sets of data - casts no light on this proposition. One can speak about the probability (likelihood) that any particular universe would produce the observed T data, and even state that a universe with the sample parameter is the universe with the highest probability (maximum likelihood). But this relative statement is not at all the absolute probability that Bayes and Laplace sought. The sketch below works through the direct-probability part of such a comparison.
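The cholesterol measurements below are invented solely for illustration; the logic is the resampling one: pool the data, shuffle the labels, and see how often a single universe produces a gap as large as the one observed.

```python
# Permutation test: how often would ONE universe produce two samples as
# different as the (hypothetical) T and P samples observed?
import random

random.seed(4)
treatment = [210, 195, 188, 202, 179, 185, 198, 190]   # hypothetical T group
placebo = [225, 218, 230, 208, 215, 221, 205, 227]     # hypothetical P group
observed = sum(placebo) / len(placebo) - sum(treatment) / len(treatment)

pooled = treatment + placebo
n = len(treatment)
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)                   # one universe: the labels are arbitrary
    diff = sum(pooled[n:]) / n - sum(pooled[:n]) / n
    if abs(diff) >= abs(observed):
        extreme += 1

print(extreme / trials)   # a small value: the same-universe hypothesis is implausible
# Note what this does NOT give us: a probability that any particular
# parameter is the true parameter of the T universe.
```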
Exploring the diagram in Figure II-5-1 may help one understand why the original aim of Bayes and Laplace to calculate inverse probabilities in science usually is impossible. Imagine urns A, B, and C with 600, 300, and 100 tiny balls respectively, corresponding to initial Markov state probabilities dj of .6, .3, and .1. Each urn has pipes to urns W, X, Y, and Z. The transition probabilities pjk that a ball from (say) A will go through its pipes to W, X, Y, and Z are .1, .4, .2, and .3 respectively. Assume that the balls are marked secretly with their origins. You examine a ball in W. What is the probability that it came from A?

Figure II-5-1

Given the quantities of balls in A, B, and C (which correspond to the probabilities of being in the initial Markovian states), and the transition probabilities from A to W, X, Y, and Z (both sets of items given above), and using Bayes' rule, one can compute the probability sought. The explanation is far from intuitively easy, however. Let us first notice that we can deduce the quantities of balls that will have the outcomes of being in W, X, Y, and Z from the initial states and the transition probabilities. For example, if p(W|A) = pAW = .1, the proportion of balls going from A to W equals .6 * .1 = .06. For each outcome urn the total is simply the sum of the transition probabilities to it from each initial-state urn times the quantity of balls in that urn. The outcome-state probabilities dk are the ratios of these quantities to the total, summing to 1.0.

Now let us ask about the "inverse probability" (a term that even Feller uses in this context; p. 341) that a given ball in W came from A; call it q(A|W) or qWA, in Feller's notation. This clearly is not the same as the probability that a ball from A will go to W; such a probability depends only on the transition probability p(W|A), which we can also write pAW. Nor is it the same as the probability p(W, A) that a ball (among all the balls in A, B, and C) will be first in A and then in W. The inverse probability is q(A|W) = qWA = (dA/dW) * p(W|A). From this equation we see that calculating the inverse probability requires knowing the probabilities of the initial states as well as the probabilities of the outcome states - or, to put it differently, the probabilities of the initial states and all the transition probabilities (the probabilities of the outcome states being known if we know the initial-state probabilities and all the transition probabilities). If dW = .3, then qWA = (.6/.3) * .1 = .2, which differs both from pAW, which equals .1, and from the joint probability p(W, A), which equals .06. If W were larger relative to X, Y, and Z - which could only be the case if pAW were larger than the assumed .1 (assuming A remains the same) - then qWA would be larger as well. So we see that for inverse probabilities to be calculated, the sizes of all initial and outcome states and the transition probabilities must be known - that is, we require a completely closed system. To repeat, without knowing the probabilities of the initial states, we cannot calculate inverse probabilities. But the probabilities of the initial states are seldom known in science, as I have already discussed; science seldom deals with closed systems except perhaps sometimes (maybe) in physics, chemistry, and genetics. Hence the dream of calculating the probability of a given possible origin of a set of data from the data themselves is impossible or unreasonable in most situations. The computation for the fully specified system is sketched below.
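A sketch of the computation follows. The text gives only the transition row for urn A, so the rows for B and C are my own invented numbers, chosen so that the outcome probability dW works out to the .3 used in the text.

```python
# The urn-and-pipes system of Figure II-5-1, fully specified and therefore computable.
initial = {"A": 0.6, "B": 0.3, "C": 0.1}
transition = {                        # p(outcome | origin); each row sums to 1
    "A": {"W": 0.1, "X": 0.4, "Y": 0.2, "Z": 0.3},
    "B": {"W": 0.5, "X": 0.2, "Y": 0.2, "Z": 0.1},     # assumed for illustration
    "C": {"W": 0.9, "X": 0.05, "Y": 0.03, "Z": 0.02},  # assumed for illustration
}

# Outcome-state probability: d_W = sum over origins j of d_j * p(W|j).
d_W = sum(initial[j] * transition[j]["W"] for j in initial)

# Inverse probability by Bayes' rule: q(A|W) = d_A * p(W|A) / d_W.
q_A_given_W = initial["A"] * transition["A"]["W"] / d_W
print(round(d_W, 2), round(q_A_given_W, 2))   # 0.3 and 0.2, as in the text
```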
The absence of a closed sample space in an inferential situation also leads to the difficulties in the interpretation of the probabilistic results that constitute the hallmark of statistical inference. Again, in the absence of specified prior alternatives, it is not possible to put a probability on the likelihood that the observed sample came from a particular universe - the very aim of Bayes and Laplace, which has been the holy grail for statisticians and a source of frustration for beginning students of statistics. We must recognize that the search for a way to identify causes from data alone, without the introduction of judgments other than probability distributions, is just a mirage - though a mirage that we may continue to pursue because it involves some deep-seated psychological need. (The human yearning for a calculus that will assign probabilities to causes probably is so strong a desire that no arguments can eradicate it completely. Fisher speaks about "the desire felt for such probability statements" [1973/1990, pp. 60-61].) The argument in this section dovetails with the discussion of causality in Chapter 00, which emphasizes that a judgment of causality in any research situation cannot be made automatically from the observed correlations but must be made in the light of what is known theoretically.

Now I switch directions and argue the great importance of Bayes' rule in statistical inference. Reasons have been adduced above - convincingly, I think - that Bayes' rule does not provide an automatic, judgment-free device to ascertain causes or to assign "inverse probability"; such a device is impossible in principle. But Bayes' and Laplace's work led to the corpus of our modern knowledge about statistical inference. While we cannot do quite what Bayes and Laplace hoped to do, statisticians have developed more modest concepts and devices that - when used with wisely-chosen auxiliary assumptions - provide quantitative assessments of the reliability of parameters and enable us to reach conclusions about whether hypotheses about sameness and difference of various treatments are reasonable. In every case these are conclusions built upon "direct" probabilities from ordinary probability models, though they are embedded in the structure of an analysis which constitutes statistical inference. (Discussion of these concepts and devices may be found in Chapters 00.) In this sense it may be reasonable to apply terms such as "Copernican revolution".

CONFIDENCE INTERVALS AND BAYESIAN ANALYSIS

Unlike the search for the probabilities of causes, it seems to me that Bayesian thinking can often be valuable in constructing confidence intervals. If one states one's prior beliefs about the distribution of the parameter in question, and then combines that distribution with the observed data, there is nothing mysterious or ambiguous about stating the posterior distribution of belief, which can then be considered as the stuff of a confidence interval. Therefore, Bayesian analysis can serve well to shine clear sunlight on this murky concept. And even if one wishes to state an extremely "uninformative" prior distribution - that is, a state of affairs in which one asserts close to no knowledge at all - the Bayesian procedure is admirably clear and consistent, pulling no rabbits from a hat. An illustration (using data from Box and Tiao) may be found in Chapter 00. One need not even do anything differently than standard confidence-interval calculations to get the benefit of Bayesian analysis. One may simply interpret the results in the Bayesian fashion so as to obtain meaningful statements, as in the sketch below.
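A sketch of that Bayesian reading for the simplest case - a normal process with known sigma and a flat prior on its mean; the data and the sigma are invented for illustration.

```python
# With a flat prior on the mean of a normal process with known sigma, the
# Bayesian posterior interval reproduces the standard confidence interval.
data = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.4, 11.9]   # invented measurements
sigma = 0.4                                   # assumed known, for simplicity
n = len(data)
mean = sum(data) / n
se = sigma / n ** 0.5

# Flat prior: the posterior for the mean is normal(mean, se), so the central
# 95 percent of posterior belief is mean plus or minus 1.96 se, which is
# exactly the familiar confidence interval.
low, high = mean - 1.96 * se, mean + 1.96 * se
print(round(low, 2), round(high, 2))
# Read in the Bayesian fashion: 95 percent posterior probability that the
# parameter lies in this interval, given the flat prior.
```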
BAYESIAN JUDGMENT IN SCIENCE

This section intertwines with the comments in Chapter I-5 about Bayesian analysis and the use of judgment in statistical inference. It is in scientific research, rather than in decision-making analyses, that the inclusion of one's prior probabilities is open to argument - and not on grounds of the theory of statistics, but on grounds of wise scientific practice. The issue is not the appropriateness of subjective judgment in the statistical process; I believe that the arguments in Chapter I-5 are overwhelming that the use of judgment is unavoidable and ever-present. Rather, in my view the issue is whether there are some situations where it is most wise to conduct the analysis as if you are entirely agnostic about the likely outcome, on the grounds that proceeding with as open a mind as possible is the most efficient way to make scientific progress as well as to achieve fruitful interaction among researchers working in the general field.

The assignment of prior probabilities is exactly the introduction of theoretical knowledge whose absence was seen earlier to be fatal to the use of the n/(n+1) formula in the case where the trials have all had the same outcome. And no sensible person would argue that when a decision is made one should not use all the relevant knowledge. But this does not imply that theoretical knowledge should always be allowed to affect the conclusion drawn from a particular set of empirical data.

One economist passes along a joking anecdote about a colleague who models microeconomic agents in his research as employing Bayesian analysis, but who himself forswears the technique in analyzing his results: "He models all his agents as knowing Bayesian decision theory even though he [himself] doesn't" (Rust, 1988, p. 146). The writer refers to this as "hypocrisy" (though not of a simple kind), and the supposed inconsistency is given as an argument for the use of Bayesian analysis. It is not the lack of a sense of humor, I think, that causes me to see no inconsistency in that sort of behavior. In the case of a person making an economic decision about the purchase of a used car, or a physician deciding how to treat a patient or whether to screen for a disease, Bayesian thinking may be quite appropriate; but for the scientist modeling or analyzing exactly the same behavior, Bayesian analysis may not be appropriate.

The common argument against the use of Bayesian priors is that the scientist, consciously or unconsciously, may choose them in a fashion that will bolster the desired result - and it is most unlikely that the researcher does not have a desire to see one result rather than another. Additionally, once one has seen the empirical analysis apart from the Bayesian analysis, one needs a great deal of self-discipline to ensure that the priors are not affected by the data, no matter how conscientious the researcher is in making "priors" prior to the analysis. Ingenuity may be able to surmount these difficulties at least in part - particularly through the use of multiple priors in a sort of sensitivity analysis, and through placing the priors into some sort of a "bank" prior to the empirical work. At the least, there are some situations where such safeguards could make Bayesian analysis worth doing in science.
Yet I believe that proceeding without explicit priors will usually be the best practice, because it separates the various parts of the research process in a useful way. An example from my own field may best make my argument: Consider the question of whether population growth has a negative effect upon economic development. Just about all empirical studies fail to find a significant correlation between those two variables, under a wide variety of conditions and specifications. A Bayesian might do explicitly what many enthusiasts for population control do implicitly - find so much theoretical reason to believe ex ante in the causal relationship that one will finally conclude a posteriori that a correlation exists and that it fails to show up empirically because of hidden variables working in the other direction or muddying the matter. But if one's theoretical belief is so strong ex ante, why bother to do the empirical research at all? Indeed, the greatest value of empirical research arises when it contradicts strong beliefs based on a priori theorizing. And if the reported results of such work are influenced by the prior probabilities, that influence diminishes the power of the empirical research to cause reexamination of the theoretical framework. Attempting to keep prior knowledge of earlier work, and other sources of speculation about the results, as far as possible from the empirical work is in the same spirit as the double-blind experiment in biostatistics, though there are also significant differences between the two ideas.

The example at hand is of particular interest because some researchers and activists have argued that the finders of a zero correlation have an obligation to prove that their finding is not an artifact. But this is exactly the opposite of the basic thrust of science; the usual null hypothesis is to presume that there is no effect unless convincing evidence is shown in its favor - because between two randomly-chosen phenomena there is not likely to be a significant relationship. The better approach would be to consider the empirical findings meaningful and call for an examination of the prior beliefs, instead of letting the prior beliefs dominate the conclusions.

The standard scientific and statistical practice of considering no effect as the point of departure and the target for disproof - the null hypothesis - is sound doctrine, in my view. Society employs biomedical scientists to find treatments that have an effect rather than treatments that do not have an effect (though there are unusual cases where the aim is to find a substance that will be neutral and have no effects); it is easy to find treatments that will not work. We pay chemists to find compounds that work, psychologists to find teaching methods that are effective, and so on. We want this causal knowledge because it helps us control our world. And finding such knowledge of an effect is difficult; it is tough to find important, valid, causal relationships, and there are relatively few of them, whereas finding examples of nothing happening is easy. And it is relatively easy to disprove the claim that something works, because one knows exactly which variable to examine; in contrast, it is difficult if not impossible to prove that there is no effect, because there are an infinite number of possible combinations of variables that one would have to examine.
So it is reasonable that the burden of proof is on the claim that there is an effect, just as in Anglo-Saxon courts the burden of proof is on the claim that the defendant is guilty, for much the same technical reasons (though the moral situations are different). However, the basic presumption of a zero correlation is at odds with Bayesian practice, as I understand it. In some branches of statistics - perhaps preeminently, econometrics - there is a large collection of safeguards against spurious correlations, and against invalidly interpreting a correlation as causal. But no one feels the need to provide safeguards against spurious non-correlations or against interpreting a non-correlation as non-causal. It would seem entirely in accord with the underlying judgmental Bayesian spirit to assert that there are some situations where one should make the judgment that it is sound practice to ignore prior theoretical and empirical knowledge.

It may be illuminating to contrast the Bayesian search for "pure" scientific knowledge with the application of Bayes' rule in a business situation. A standard textbook example mentioned earlier is a decision to produce a new product where there is a prior probability distribution for (say) the demand for the product, and where market research can add information and increase knowledge about that demand; Bayes' rule produces a sharper posterior probability distribution than the prior distribution, and thereby presumably improves the quality of the decision-making. But such a decision-making situation is concerned with only the situation at hand, whereas a scientific study is concerned with the entire corpus of knowledge; this is perhaps as good a way as any to distinguish between "pure" and "applied" research.

CONCLUSION

The origin of Bayes' rule was in the search for causes - the process called inverse probability. With the passage of time, probability theory has made clear that Bayes' rule is appropriate for situations where the sample space can be entirely specified - that is, situations where one considers that such full specification is warranted. But this is not the case in most scientific research; usually there are many possible values of the population parameter of interest. Hence the search for a tool to identify causes automatically must necessarily fail, except in situations where the alternatives can be specified (such as the situations Neyman-Pearson analysis deals with) and in decision-making situations, in which Bayes' rule is extremely helpful in finding counter-intuitive answers.

The most controversial contemporary issue involving Bayes' rule concerns its applicability to the inferential tasks of hypothesis testing and the estimation of reliability by combining theoretical knowledge with sample results. There is no doubt that research can never be wholly objective. But whether it is most wise to examine and analyze the data as separately as possible from the theoretical background is best thought of as a matter of research taste, style, and wisdom. I believe that in most cases such combining would be unwise. This matter is discussed in a slightly different context in Chapter 00 on causality, wherein it is argued that auxiliary theory is needed to distinguish a causal relationship from a mere correlation that might otherwise be called causal. Procedures for Bayesian estimation in a variety of situations are presented in Chapter 00.
FOOTNOTES

[1]: Throughout this book I have wherever possible left aside issues of valuation, partly because I think that I can get on with much of the business at hand here without such consideration, and partly because the complexity of the subject has increased greatly in past decades (which is much to the good) and the issues are now at a high point of unsettledness; perhaps this will long continue to be so, and for good reason. For background see Bell, Raiffa, and Tversky (1988), and especially the work there of Shafer. Shafer's point of view is in the same spirit as this paper, because he suggests different procedures for different sorts of situations.

ENDNOTES

<1>: I would greatly enjoy debating this proposition with anyone who chooses to dispute it.