CHAPTER I-1

WHERE STATISTICAL INFERENCE FITS INTO THE GETTING OF KNOWLEDGE

Let's define statistical inference as: the process of selecting a probabilistic model to resemble the process you wish to investigate, investigating that model's behavior, and interpreting the results.

Probabilistic statistical inference is a crucial part of the process of informing ourselves about the world around us. It helps us understand our world, and make sound decisions about how to act in it.

Until the 18th century, humanity's extensive knowledge of nature and technology was not based on formal probabilistic statistical inference. And animals survive without statistics. But now that we humans have already dealt with many of the big questions that are easy to answer without probabilistic statistics, and now that we live in a more ramified world than in earlier centuries, the methods of inferential statistics become ever more important. Furthermore, statistical inference will surely become ever more important in the future as we voyage into realms that are increasingly difficult to comprehend. The development of an accurate chronometer to tell time on sea voyages became a crucial need when Europeans sought to travel to the New World. Similarly, probability and statistical inference become crucial as we voyage out into space and down into the depths of the ocean and the earth, as well as probing into the secrets of the microcosm and of the human mind and soul.

Where probabilistic statistical inference is employed, the inferential procedures may well not be the crucial element. For example, the wording of the questions asked in a public-opinion poll may be more critical than the inferential procedures used to discern the reliability of the poll results<1>. Yet we dare not disregard the role of the statistical procedures.

SOME CONCEPTS AND ASSUMPTIONS

Definition of "Knowledge"

As used here the term "knowledge" excludes mystic and religious knowledge - that which is apprehended by meditation or other non-everyday sorts of thinking. (Whether "knowledge" is the appropriate term for this and subsequent excluded categories need not be discussed here.) I also exclude emotional understanding and other purely personal knowledge - that which you know about your own feelings but which another person cannot know except by your telling her/him about it. The term "knowledge" as used here refers only to material content in the "objective" realm - that is, statements about the world that two or more persons can discuss, concerning topics about which they can compare observations and beliefs. This includes such statements as whether or not it is raining today, whether HIV causes AIDS, and whether Mike Mussina's curve ball is more effective than Ben McDonald's.

It is natural to consider defining knowledge for purposes here as that which pertains to decision-making. But such a definition has the defect that emotional reactions which are not objective knowledge - such as potential regrets at various outcomes of a decision - affect decisions, and properly so, and the definition needed here excludes those reactions.

Though some writers would confine the definition and subsequent discussion to (say) prediction or explanation, the concept of knowledge in this book includes prediction, control, understanding, causal explanation, scientific law, and perhaps other types of interpretations of associations (correlations), too. This diverseness is crucial for a sound concept of knowledge both inside and outside of science, in my view.
The concept of knowledge used here goes beyond scientific knowledge; it also includes knowledge used in business, government, family and personal life, and other human activities.

There will be no discussion of how a person receives individual bits of information. That is, we shall not be concerned with the issues of sensation and perception which have received much attention from philosophers such as David Hume (1738; 1758), Bertrand Russell (1948)<2>, and Karl Popper (1979)<3>. Rather, we begin discussion at the moment that the person already has in her/his possession information which is objective and public in the sense that it either is already known to others or could be communicated to others.

An Overall Intellectual System?

This discussion of inference is not founded on any set of philosophical "first principles" from which everything else purports to be deduced. But the discussion is not incompatible with most such world views. If your thinking rests on some particular structure of first principles, it is likely that most or all of what is said here will co-exist easily with your conceptual system. That is because in every practical craft one generally begins with the immediate facts at hand and proceeds with little recourse to more general principles. And this is good practice; one needs to make recourse to more general knowledge only for the more difficult and confusing problems that one faces only occasionally. (Inquiry into the subtleties of whether an observed relationship should be considered causal is a good example of such infrequent difficult cases.) And it is a happy circumstance that it is so, because it enables people with very different world views to cooperate successfully and to make use of the products of each other's work.

Of course one is best equipped for any occupation if one is acquainted with more general knowledge than most daily situations demand; you then are prepared to handle the atypical puzzling situations. But even the more general knowledge one may need usually is not a great many layers up the hierarchy of knowledge. And in my experience and reading, a decision about statistical practice almost never depends on anything that might be close to "first principles".

This viewpoint is connected in a fundamental way to the Kant-Einstein-Bohr view of theories as being a product of the human imagination rather than being a property of nature, and to the Bohr-Gödel-Heisenberg view of knowledge as being necessarily open and incomplete.<4>

Asserting that there are no "natural" first principles dispensed or dictated to us from on high, but rather that the assumptions for any given system must be chosen by us - as in this view - does not imply that the choice of basic principles for a given activity is a matter of indifference, or that it does not matter where within the structure of knowledge you begin. Just the opposite: a wise choice of assumptions for a given inquiry or activity is all-important. But the choice must be a matter of judgment, and it can be implicit. The judgment must depend on your aims, the data, the existing knowledge, and your own nature. Making wise judgments about assumed underlying principles and other working assumptions is one of the highest skills a researcher and a decision-maker can possess. (See Chapter 00 on the role of judgment.)
In my view the practical procedures of science, together with necessary general discussion of the hard cases, constitute a body of operational philosophy that is sufficient in itself, and may be preferable to any philosophical discussion of science that begins with a set of first principles.

To avoid confusion, the term "operational philosophy" as used in the above paragraph has no connection to operationalism as a way of thinking; it simply means that what I write is intended to help people get their work done, rather than be merely speculative. And my frequent use of the crucial concept of the operational definition, though it is a part of operationalism, does not at all mean that I subscribe to the more general prescriptions of operationalism<5>.

KNOWLEDGE WITHOUT PROBABILISTIC STATISTICAL INFERENCE

Let us distinguish two kinds of knowledge with which inference at large (that is, not just probabilistic statistical inference) is mainly concerned: a) one or more absolute measurements on one or more dimensions of a collection of one or more items - for example, your income, or the mean income of the people in your country; and b) comparative measurements and evaluations of two or more collections of items, and especially, deciding whether they are equal or unequal - for example, the mean income in Brazil compared to the mean income in Argentina. Types (a) and (b) both include asking whether there has been a change between one observation and another.

What is the conceptual basis of our getting these sorts of knowledge about the world? I believe that our rock-bottom conceptual tool (used without even our thinking about using it) is the assumption that we may call sameness, or continuity, or constancy, or repetition, or equality, or persistence; "constancy" and "continuity" will be the terms used most frequently here, and I shall use them interchangeably. In physics this concept is called structure (Wigner, 1979, p. 29ff), and it is the precondition for the existence of laws of nature. (Structure in the laws of nature themselves is called invariance.)<6>

Continuity is a non-statistical concept. It is a best guess about the next point beyond the known observations, without any idea of the accuracy of the estimate. It is like testing the ground ahead when walking in a marsh. It is local rather than global. We'll talk a bit later about why continuity seems to be present in much of the world that we encounter.

The other great concept in statistical inference, and perhaps in all inference taken together, is representative (usually random) sampling, to be discussed in Chapter II-OO. Representative sampling - which depends upon the assumption of sameness (homogeneity) throughout the universe to be investigated - is quite different from continuity; representative sampling assumes that there is no greater chance of a connection between any two elements that might be drawn into the sample than between any other two elements; the order of drawing is immaterial. In contrast, continuity assumes that there is a greater chance of connection between two contiguous elements than between either one of the elements and any of the many other elements that are not contiguous to either. Indeed, the process of randomizing is a device for doing away with continuity and autocorrelation within some bounded closed system - the sample "frame". It is an attempt to map (describe) the entire area ahead using the device of the systematic survey.
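The contrast between continuity and randomization can be made concrete with a small simulation. The sketch below is mine, not anything prescribed by the text, and it is written in Python purely for illustration; the lag-1 correlation used as a yardstick is my own choice. It builds a series in which each value is tied to its neighbor, then shuffles the order: the connection between contiguous elements - the connection that continuity relies upon - is large before the shuffle and near zero after it.

    import numpy as np

    rng = np.random.default_rng(0)

    # A "continuous" series: each value is the last value plus a small step,
    # so contiguous elements are strongly connected (autocorrelated).
    steps = rng.normal(loc=0.0, scale=1.0, size=1000)
    series = np.cumsum(steps)

    def lag1_autocorr(x):
        """Correlation between each element and its immediate neighbor."""
        return np.corrcoef(x[:-1], x[1:])[0, 1]

    print("Lag-1 autocorrelation, original series:", round(lag1_autocorr(series), 3))

    # Randomizing the order destroys the local connection: after shuffling,
    # contiguous elements are no more alike than any two elements chosen at random.
    shuffled = rng.permutation(series)
    print("Lag-1 autocorrelation, shuffled series:", round(lag1_autocorr(shuffled), 3))

In the shuffled - randomized - ordering, knowing one element tells you nothing special about its neighbor; that is precisely the property that random sampling relies upon and that continuity denies.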
Random representative sampling enables us to make probabilistic inferences about a population based on the evidence of a sample.

To return now to the concept of sameness: Examples of the principle are that we assume:

a) Our house will be in the same place tomorrow as today.

b) A hammer will break an egg every time you hit the latter with the former (or even the former with the latter).

c) If you observe that the first fifteen persons you see walking out of a door at the airport are male, the sixteenth probably will be male also.

d) Paths in the village stay much the same through a person's life.

e) Religious ritual changes little through the decades.

f) Your best guess about tomorrow's temperature or stock price is that it will be the same as today's.

This principle of constancy is related to David Hume's concept of constant conjunction.

The principle of sameness does not reject Heraclitus' observation (about 500 BC) that one never steps into the same river twice. True, the molecules flow by. But the river stays the same on the map, and in the minds of animals, and in contracts for water rights. And the homeowner on the river's banks had better treat the river's presence as continuing or court disaster. It is this practical sameness with respect to some decision or another that matters, Heraclitus notwithstanding.

When my children were young, I would point to a tree on our lawn and ask: "Do you think that tree will be there tomorrow?" And when they would answer "Yes", I'd ask, "Why doesn't the tree fall?" That's a tough question to answer.

There are two reasonable bases for predicting that the tree will be standing tomorrow. First and most compelling for most of us is that almost all trees continue standing from day to day, and this particular one has never fallen; hence, what has been in the past is likely to continue. This assessment requires no scientific knowledge of trees, yet it is a very functional way to approach most questions concerning the tree - such as whether to hang a clothesline from it, or whether to worry that it will fall on the house tonight. That is, we can predict the outcome in this case with very high likelihood of being correct even though we do not utilize anything that would be called either science or statistical inference. (But what do you reply when your child says: "Why should I wear a seat belt? I've never been in an accident"?)

A second possible basis for predicting that the tree will be standing is scientific analysis of the tree's roots - how the tree's weight is distributed, its sickness or health, and so on. Let's put aside this sort of scientific-engineering analysis for now.

The first basis for predicting that the tree will be standing tomorrow - sameness - is the most important heuristic device in all of knowledge-getting. It is often a weak heuristic; certainly the prediction about the tree would be better grounded (!) after a skilled forester examines the tree. But persistence alone might be a better heuristic in a particular case than an engineering-scientific analysis alone.

This heuristic appears more obvious if the child - or the adult - were to respond to the question about the tree with another question: Why should I expect it to fall? In the absence of some reason to expect change, it is quite reasonable to expect no change.
And the child's new question does not duck the central question we have asked about the tree, any more than one ducks a probability estimate by estimating the complementary probability (that is, unity minus the probability sought); indeed, this is a very sound strategy in many situations.

Constancy can refer to location, time, relationship to another variable, or yet another dimension. Constancy may also be cyclical. Some cyclical changes can be charted or mapped with relative certainty - for example the life-cycles of persons, plants, and animals; the diurnal cycle of dark and light; and the yearly cycle of seasons. The courses of some diseases can also be charted. Hence these kinds of knowledge have long been known well.

Consider driving along a road. One can predict that the price at the next gasoline station will be within a few cents of the price at the station that you just passed. But as you drive further and further, the dispersion increases as you cross state lines and taxes differ. This illustrates continuity.

Constancy can also be transformational. Some transformations have sufficiently little uncertainty that people have understood them for ages - for example, cooking, brewing, smelting, some medical practices such as setting bones and delivering babies, and various crafts.

The attention to constancy can focus on a single event, such as leaves of similar shape appearing on the same plant. Or attention can focus on single sequences of "production", as in the process by which a seed produces a tree. For example, let's say you see two puppies - one that looks like a low-slung dachshund, and the other a huge mastiff. You also see two grown male dogs, also apparently dachshund and mastiff. If asked about the parentage of the small ones, you are likely - using the principle of sameness - to point - quickly and with surety - to the big dogs of the same breed. (Here it is important to notice that this answer implicitly assumes that the fathers of the puppies are among these dogs. But the fathers might be somewhere else entirely; it is in these ways that the principle of sameness can lead you astray.)

When applying the concept of sameness, the object of interest may be collections of data, as in Semmelweis's data on the consistent differences in rates of maternal deaths from childbed fever in two clinics with different conditions (see Table 11-1), or the similarities in sex ratios from year to year in Graunt's data on London births (Table 11-2), or the stark effect in John Snow's data on the numbers of cholera cases associated with two London wells (Table 11-3), or the reduction in beriberi among Japanese sailors as a result of a change in diet (Table 11-4). These data seem so overwhelmingly clear cut that our naive statistical sense makes the relationships seem deterministic, and the conclusions seem straightforward.<7> (But the same statistical sense frequently misleads us when considering sports and stock market data.)

Table 1 [Semmelweis Table 1 p. 64]
Table 2 [see in Hald or Graunt]
Table 3 [Winslow p. 276]
Table 4 [Table 1-1 in Kornberg, 1989]

Constancy and sameness can be seen in macro structures; consider, for example, the constant location of your house. Constancy can also be seen in micro aggregations - for example, the raindrops and rain that account for the predictably fluctuating height of the Nile, or the ratio of boys to girls born in London, cases in which we can average to see the "statistical" sameness.
The total sum of the raindrops produces the level of a reservoir or a river from year to year, and the sum of the behaviors of collections of persons causes the birth rates in the various years. This micro view of constancy points toward the notion of the Central Limit Theorem that will be discussed later.

How does a person discover that there is a pattern of sameness? This is a psychological issue - the question of concept formation (which I take to be the functional equivalent of the philosopher's concept of induction).<8> Statistical inference is only needed when a person thinks that s/he might have found a pattern but the pattern is not completely obvious to all. Probabilistic inference works to test - either to confirm or disconfirm - the belief in the pattern's existence. We will see such cases in the following chapter.

Please notice that no justification has yet been given for predicting on the basis of observed constancy, or for acting in reliance on that prediction. Nor have I referred to this process with the label "induction", a label which connotes to most philosophers a more general process than I am describing here. (Finding a logical justification for induction has been one of the great searches in the history of philosophy. But it has been entirely unsuccessful. And as one of its greatest practitioners - Bertrand Russell - eventually concluded, the search must be unsuccessful<9>. Even trying to explain logically why I think that the search must fail is not worth the time, I think. And the search for justification usually refers to induction, and that in turn usually refers to producing theories, according to Popper (1979, Chapter 1). This is quite different from the assumption of sameness.)

Lack of logical justification does not imply that the process of predicting or other generalization on the basis of observed constancy is without foundation, or is somehow illegitimate, or should not be done. The most satisfactory argument for acting on sameness - if any such argument is needed, and I think that it is not - is the thorough-going success of the process. We may call this a pragmatic argument, in the tradition of Charles Peirce and William James. That is, anyone who decided not to proceed on the basis of past experience, and actually tried to so act, would find life either bewildering or impossible or both.<10>

One can wonder why proceeding on the assumption of continuity succeeds, of course. I suggest these two answers to the question:

1) If there were not considerable continuity, we would not be here day after day; we live in a world of continuity by the "anthropic principle" (Barrow and Tipler, 1988).

2) Here is a cosmological argument by analogy: Imagine a ball of homogeneous yet chemically and physically malleable material entering a big incinerator. Various things happen. Perhaps one side is flattened like a penny on a railroad track. Other materials are pressed into it. The ball is heated, and then heat escapes faster from the surface than from the inside. It is jostled by other objects of different kinds. And so forth. The result? Something like our earth, created not by design or equation but by "chance" and evolution. It starts homogeneous, and ends up non-homogeneous.
Just as it applies to cases of perfect historical constancy and certainty (for example, that the sun will come up tomorrow), this second argument also applies to predicting on the basis of slightly incomplete continuity and minor uncertainty (for example, that you will recover from a cold after a week or so). Actually, there are many cloudy days when you cannot verify that the sun has come up except indirectly, but let us ignore that distracting matter as we will ignore many matters that seem to be loose ends in the discussion. This is in keeping with the attitude that logical completeness is not our goal because it cannot be attained except in discussions of logic itself - and not always there, either, as Gödel taught us.<11>

People have always been forced to think about and act in situations that have not been constant - that is, situations where the amount of variability in the phenomenon makes it impossible to draw clear cut, sensible conclusions. For example, the appearance of game animals in given places and at given times has always been uncertain to hunters, and therefore it has always been difficult to know which target to hunt in which place at what time. And of course variability of the weather has always made it a very uncertain element. The behavior of one's enemies and friends has always been uncertain, too, though uncertain in a manner different from the behavior of wild animals; there often is a gaming element in interactions with other humans<12>. But in earlier times, data and techniques did not exist to enable us to bring statistical inference to bear.

It should be noted that though human behavior may be difficult to predict in some circumstances, much human behavior is extraordinarily predictable. Movies start exactly when the newspaper says they will. If the price of gasoline is $1.09 at one station, it will be within cents of that price at a station across the street. If you put a $5 bill on the busy sidewalk, it will be gone in minutes, as David Hume advised us: "A man who at noon leaves his purse full of gold on the pavement at Charing-Cross, may as well expect that it will fly away like a feather, as that he will find it untouched an hour after" (Essays, p. 99, Open Court ed.).

The hoary comment that social science is somehow "softer" than is "hard" physical and biological science because human behavior is less predictable is simply a prejudice, arising most probably from a) people not noticing the huge amount of highly predictable knowledge of human behavior that we all possess, and classifying it outside social science, and b) the particular difficulty of the challenging problems in social science arising from their variability rather than (as in the physical and biological sciences) from being unable to observe the phenomena that we wish to learn about.<13>

In medicine and healing (and in all the biological and physical sciences), the challenging research problems involve phenomena one cannot see with the naked eye - bacteria, viruses, how fleas behave and the killer germs they carry; this invisibility is the source of the uncertainty in these sciences. These hard problems are unlike the medical problems of broken bones, bleeding, and giving birth, about which healers have had extensive knowledge for millennia. In social science, by contrast, the difficulty of making major discoveries stems from our having long ago learned the easy-to-establish knowledge with our millennia of extensive day-to-day social experience.
Thus we have skimmed off the easy research problems in the social sciences.

THE TREATMENT OF UNCERTAINTY

The purpose of statistical inference is to help us peer through the veil of variability when it obscures the main thrust of the data, so as to improve the decisions we make. Statistical inference (or in most cases, simply probabilistic estimation) can help:

a) a gambler deciding on the appropriate odds in a betting game when there seems to be little or no difference between two or more outcomes;

b) an astronomer deciding upon one or another value as the central estimate for the location of a star when there is considerable variation in the observations s/he has made of the star;

c) a basketball coach pondering whether to take from the game her best shooter, who has heretofore done poorly tonight;

d) an oil-drilling firm debating whether to follow up a test-well drilling with a full-bore drilling when the probability of success is not overwhelming but the payoff from a gusher could be large.

Until the 18th or even 19th century the canon of science did not embody uncertainty, and it proceeded on the basis of constancy alone. For example, J. S. Mill's famous experimental procedures to establish causality do not introduce uncertainty. [cite] The key idea underlying his experimental canon was to hold everything else the same, with certainty.

Before we proceed to consider situations where we must grapple with uncertainty, let us repeat: In most walks of life one seldom needs statistical inference. Most matters, even in science, are well understood without using statistical inference to grapple with the uncertainty because (as noted above) most matters are highly stable and predictable - the positions of your desk, rug, and house; the arrival of your newspaper in the morning; the availability of your favorite foods in the store; the fate of a $5 bill left on the sidewalk.

Returning to the tree near the Simon house: Let's change the facts. Assume now that one major part of the tree is mostly dead, and we expect a big winter storm tonight. What is the danger that the tree will fall on the house? Should we spend $1500 to have the mostly-dead third of it cut down? We know that last year a good many trees fell on houses in the neighborhood during such a storm. We can gather some data on the proportion of old trees this size that fell on houses - about 5 in 100, so far as we can tell. Now it is no longer an open-and-shut case about whether the tree will be standing tomorrow, and we are using statistical inference to help us with our thinking.

We proceed to find a set of trees that we consider like this one, and study the variation in the outcomes of such trees. So far we have estimated that the average for this group of trees - the mean (proportion) that fell in the last big storm - is 5 percent. Averages are much more "stable" - that is, more similar to each other - than are individual cases. The tree populations illustrate this. Notice how we use the crucial concept of sameness: We assume that our tree is like the others we observed, or at least that it is not systematically different from most of them and it is more-or-less average.

How would our thinking be different if our data were that one tree in 10 had fallen instead of 5 in 100? This is a question in statistical inference. How about if we investigate further and find that 4 of 40 elms fell, but only one of 60 oaks - and ours is an oak tree? Should we consider that oaks and elms have different chances of falling?
Proceeding a bit further, we can think of the question as: Should we or should we not consider oaks and elms as different? This is the type of statistical inference called "hypothesis testing": we apply statistical procedures to help us decide whether to treat the two classes of trees as the same or different[1]. If we should consider them the same, our worries about the tree falling are greater than if we consider them different with respect to the chance of damage.

Notice that statistical inference was not necessary for accurate prediction when I asked the kids about the likelihood of a live tree falling on a day when there would be no storm. So it is with most situations we encounter. But when the assumption of constancy becomes shaky for one reason or another, as with the sick tree falling in a storm, we need a more refined form of thinking. We collect data on a large number of instances, inquire into whether the instances in which we are interested (our tree and the chance of it falling) are representative - that is, whether they resemble what we would get if we drew a sample randomly - and we then investigate the behavior of this large class of instances to see what light it throws on the instance(s) in which we are interested.

The procedure in this case - which we shall discuss in greater detail later on - is to ask: If oaks and elms are not different, how likely is it that only one of 60 oaks would fall whereas 4 of 40 elms would fall? Again, notice the assumption that our tree is "representative" of the other trees about which we have information - that it is not systematically different from most of them, but rather that it is more-or-less average. Our tree certainly was not chosen randomly from the set of trees we are considering. But for purposes of our analysis, we proceed as if it had been chosen randomly - because we deem it "representative". This is the first of two roles that the concept of randomness plays in statistical thinking.

Here is an example of the second use of the concept of randomness: We conduct an experiment - plant elm and oak trees at randomly-selected locations on a plot of land, and then try to blow them down with a wind-making machine. (The random selection of planting spots is important because some locations on a plot of ground have different growing characteristics than do others.) Some purists object that only this sort of experimental sampling is a valid subject of statistical inference; it can never be appropriate, they say, simply to assume on the basis of other knowledge that the tree is representative. I regard that purist view as a helpful discipline on our thinking. But accepting its conclusion - that one should not apply statistical inference except to randomly-drawn or randomly-constituted samples - would take from us a tool that has proven useful in a variety of activities.
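To make the oak-and-elm question concrete, here is a minimal sketch of how one might put it to a computer by resampling. The sketch is mine, written in Python for illustration only; the number of trials and the way the outcome is scored (4 or more of the 5 fallen trees landing in the elm group) are my own choices, not a prescribed procedure. The idea is simply to suppose that oaks and elms are the same, shuffle the 5 fallen and 95 standing trees into groups of 40 and 60 over and over, and see how often the split comes out as lopsided as the one we observed.

    import random

    random.seed(1)

    # Observed data: 4 of 40 elms fell, 1 of 60 oaks fell - 5 fallen trees among 100.
    trees = [1] * 5 + [0] * 95        # 1 = fell, 0 = still standing

    trials = 10_000
    lopsided = 0
    for _ in range(trials):
        random.shuffle(trees)
        elm_group = trees[:40]        # pretend the first 40 after shuffling are the "elms"
        if sum(elm_group) >= 4:       # 4 or all 5 of the fallen trees among the elms
            lopsided += 1

    print("Share of shuffles as lopsided as the observed 4-to-1 split:",
          lopsided / trials)

If splits as lopsided as the observed one turn up often in the shuffles, we have little warrant for treating oaks and elms as different; if they turn up only rarely, the difference begins to look like more than an accident of which trees happened to fall.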
As discussed earlier in this chapter, the data in some (probably most) scientific situations are so overwhelming that one can proceed without probabilistic inference. Historical examples include those shown above of Semmelweis and puerperal fever, and John Snow and cholera. But where there was a lack of overwhelming evidence, the causation of many diseases long remained unclear for lack of statistical procedures. This led to superstitious beliefs and to counter-productive behavior, as quarantines against plague often were. Some effective practices also arose despite the lack of sound theory, however - the waxed costumes of doctors, and the burning of mattresses, despite the wrong theory about the causation of plague (see Cipolla, 1981).

So far I have spoken only of predictability and not of other elements of statistical knowledge such as understanding and control. This is simply because statistical correlation is the bedrock of most scientific understanding, and of predictability. Later we will expand the discussion beyond predictability; it holds no sacred place here.

WHERE STATISTICAL INFERENCE BECOMES CRUCIAL

There was little role for statistical inference to play up until about three centuries ago because there existed very few scientific data. For example, as late as the 1700s there were so few data about population size that there was a major controversy between David Hume and Charles Montesquieu (Hume 1741-1777/1985, pp. 377-464) about whether the population of the world had increased or decreased since ancient times.<14> When scientific data began to appear, the need emerged for statistical inference to improve the interpretation of the data.

As we saw, statistical inference is not needed when the evidence is overwhelming. A thousand cholera cases at one well and zero at another obviously does not require a statistical test. Neither would 999 cases to one, or even 700 cases to 300, because our inbred and learned statistical senses can detect that the two situations are different. But probabilistic inference comes to be needed when the number of cases is relatively small or where for other reasons the data are somewhat ambiguous.

For example, when working with the 17th century data on births and deaths, John Graunt - great statistician though he was - drew wrong conclusions about some matters because he lacked modern knowledge of statistical inference. He found that in the rural parish of Romsey "there were born 15 Females for 16 Males, whereas in London there were 13 for 14, which shews, that London is somewhat more apt to produce Males, then the country" (p. 71). He suggests that the "curious" inquire into the causes of this phenomenon, apparently not recognizing - and at that time he had no way to test - that the difference might be due solely to chance. He also noticed (p. 94) that the variations in deaths among years in Romsey were greater than in London, and he attempted to explain this apparent fact (which is just a statistical artifact) rather than understanding that such variation is almost inevitable because Romsey is so much smaller than London. Because we have available to us the modern understanding of variability, we can now reach sound conclusions on these matters<15>.

Another example in Graunt's work: He hypothesized that fertility was lower than otherwise during the more "sickly" years in 17th century England. He tried to test this hypothesis on the London Bills of Mortality. (At this point the reader might examine whether the data in Table 11-5 permit a clear-cut judgment about this hypothesis.) Graunt arrived at a wrong conclusion - that the data supported his hypothesis - because he did not have available a general device for handling all the data at once, but instead had to resort to examining local maxima and minima (see Hald, 1990, pp. 93 ff.), and therefore he was unable to reach a sound general conclusion.
But with modern ideas of correlation and the test of a correlation's statistical significance - which enable one to treat a collection of data all at once with summarizing statistics - he could easily have arrived at a sound answer. More generally, summary statistics - such as the simple mean - are devices for reducing a large mass of data (inevitably confusing unless they are absolutely clear cut) to something one can manage to understand. And probabilistic inference is a device for determining whether patterns should be considered as facts or artifacts.<16>

Table 11-5

Here is another example that illustrates the state of early quantitative research in medicine: Exploring the effect of a common medicinal substance, Boecker examined the effect of sarsaparilla on the nitrogenous and other constituents of the urine. An individual receiving a controlled diet was given a decoction of sarsaparilla for a period of twelve days, and the volume of urine passed daily was carefully measured. For a further twelve days that same individual, on the same diet, was given only distilled water, and the daily quantity of urine was again determined. The first series of researches gave the following figures (in cubic centimeters): 1,467, 1,744, 1,665, 1,220, 1,161, 1,369, 1,675, 2,199, 887, 1,634, 943, and 2,093 (mean = 1,499); the second series: 1,263, 1,740, 1,538, 1,526, 1,387, 1,422, 1,754, 1,320, 1,809, 2,139, 1,574, and 1,114 (mean = 1,549). Much uncertainty surrounded the exactitude of these measurements, but this played little role in the ensuing discussion. The fundamental issue was not the quality of the experimental data but how inferences were drawn from those data (Coleman in Kruger, 1987, p. 207). The experimenter Boecker had no reliable way of judging whether the data for the two groups were or were not meaningfully different, and therefore he arrived at the unsound conclusion that there was indeed a difference. (Gustav Radicke used this example as the basis for early work on statistical significance.)

Another example: Joseph Lister convinced the scientific world of the germ theory of infection, and of the possibility of preventing death with a disinfectant, with these data: prior to the use of antiseptics - 16 post-operative deaths in 35 amputations; subsequent to the use of antiseptics - 6 deaths in 40 amputations (Winslow, 1943, p. 303). But how sure could one be that a difference of that size might not occur just by chance? No one then could say, nor did anyone inquire, apparently.

Here's another example of great scientists falling into error because of a too-primitive approach to data (Feller, 3rd ed., 1968, pp. 69-70): Charles Darwin wanted to compare two sets of measured data, each containing 13 observations. At Darwin's request, Francis Galton compared the two sets of data by ranking each, and then comparing them pairwise. "The a's were ahead 13 times. Without knowledge of the actual probabilities Galton concluded that the treatment was effective. But, assuming perfect randomness, the probability that the a's beat [the others] 13 times or more equals 3/16. This means that in three out of sixteen cases a perfectly ineffectual treatment would appear as good or better than the treatment classified as effective by Galton." That is, Galton and Darwin reached an unsound conclusion. As Feller says, "This shows that a quantitative analysis may be a valuable supplement to our rather shaky intuition" (p. 70).
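Today Boecker's question can be answered directly with a permutation (resampling) test: if the sarsaparilla made no difference, then the labels "sarsaparilla days" and "water days" are arbitrary, and shuffling the twenty-four measurements between the two labels should often produce differences between the group means as large as the observed one. The sketch below is mine - Python is used only for illustration, and the choice of the absolute difference between means as the yardstick is an assumption of this sketch, not anything Radicke or Coleman prescribes.

    import random

    random.seed(1)

    sarsaparilla_days = [1467, 1744, 1665, 1220, 1161, 1369,
                         1675, 2199, 887, 1634, 943, 2093]
    water_days        = [1263, 1740, 1538, 1526, 1387, 1422,
                         1754, 1320, 1809, 2139, 1574, 1114]

    def mean(values):
        return sum(values) / len(values)

    observed_gap = abs(mean(water_days) - mean(sarsaparilla_days))

    # If the treatment is irrelevant, any 12 of the 24 days could have been
    # the "sarsaparilla days"; shuffle the pooled measurements and re-split.
    pooled = sarsaparilla_days + water_days
    trials = 10_000
    gaps_at_least_as_large = 0
    for _ in range(trials):
        random.shuffle(pooled)
        shuffled_gap = abs(mean(pooled[:12]) - mean(pooled[12:]))
        if shuffled_gap >= observed_gap:
            gaps_at_least_as_large += 1

    print("Observed gap between the two means:", round(observed_gap, 1))
    print("Share of shuffles with a gap at least that large:",
          gaps_at_least_as_large / trials)

Gaps as large as the observed one turn up in a great many of the shuffles, which is exactly why Boecker's conclusion that there was indeed a difference was unsound. The same pooling-and-shuffling logic, applied to counts of deaths rather than to volumes of urine, would answer the question nobody then asked about Lister's amputation figures.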
Looking ahead, the key tool in situations like Graunt's and Boecker's and Lister's is creating ceteris paribus - making "everything else the same" - with random selection in experiments, or at least with statistical controls in non-experimental situations.

The insurance industry early on made large strides toward statistical inference. Insurance itself enables us to deal with the uncertainties of shipwreck, fire, life and death. But the central principle in insurance analysis - equality among given persons, voyages, and fire risks (an average length of life, for example) - could simply be assumed, without needing to test for equality. So insurance could do without modern probabilistic inference.

There is a long and colorful history of dealing with uncertainty in gambling; examples of dice-like devices date back to Egyptian times. But probabilistic knowledge about gambling odds was more lore than scientific understanding. And the problems were not mainly those of statistical inference.

In astronomy, over the millennia scientists were able to make great progress without statistical inference because the outcomes of most studies are well-determined; there was little doubt that Saturn was Saturn to all observers, and the sun appears every clear day in most latitudes. But when there are differences among astronomers in observations of the location of planets and stars, and outliers among the observations made by a single person, the treatment of the ensuing uncertainty requires modern statistical inference; this need was one of the major sources of the development of statistical inference. At first astronomers did not even use the concept of the mean (Hald, 1990). Then need led to their inventing that concept. Later a new question arose about whether variation is random. All this was major progress toward the statistical understanding we now possess.

In medical research, there was pressing need to know how to decide whether data collections should be considered the same or different when the results of comparisons are not as overwhelming as in Semmelweis's study of puerperal fever or Takaki's study of beriberi (Table 4). This can be considered a question about continuity - whether two collections of data on health outcomes of people who have had different environments or therapies should be considered as continuous with each other, either in the sense of being the same or in the sense of there not being a sharp break between them. When there is considerable overlap and non-dominance in results among groups, one needs statistical inference to reach valid conclusions. One compares samples with the help of the concept of random sampling. This concept enables us to compare collections that are not manifestly equal, as in the simple physical experiments illustrated by Boecker's sarsaparilla data and Lister's disinfectant data above.

The concept of comparison - that is, whether things should be considered as equal or not (even before quantitative comparison) - is fundamental in the above paragraph, as David Hume told us it is fundamental in all knowledge: "All kinds of reasoning consist in nothing but a comparison" (Treatise, p. 199, Open Court edition). This concept is at the base of correlation and regression as well as of hypothesis testing.
The concept of comparison is used in non-experimental non-random situations, too - for example, in the study of price in different state structures of liquor retailing (see Chapter 00) - with the device of likening the situation to random selection. The researcher attempts to make the situation resemble random selection by controlling extraneous variables, and by testing the data for being random-like. But some strict critics object to the use of statistical inference in any situations that are not actual experiments. Chapter 00 delves further into the crucial role of random sampling.

The statistical study of the process of estimation and its accuracy - including confidence intervals - is a sort of measurement of equality - in particular, the likely equality between sample and population.

Though the main (and early) use of statistical inference was with respect to supposedly-randomly-drawn samples, inference has come to be used for questions about continuity; this includes all time-series investigations, even when the samples clearly were not drawn randomly. Again, the tactic used is to liken the data to random selection on a crucial dimension, or at least to test for it. Again some strict critics object.

CONCLUSIONS

In all knowledge-seeking and decision-making, our aim is to peer into the unknown and reduce our uncertainty a bit. The two main concepts that we use - the two great concepts in all of scientific knowledge-seeking, and perhaps in all practical thinking and decision-making - are a) continuity (or non-randomness) and the extent to which it applies in a given situation, and b) random sampling, and the extent to which we can assume that our observations are indeed chosen by a random process.

The assumption of constancy-sameness-continuity-persistence-whatever-you-call-it is the most basic and most important heuristic in obtaining knowledge about the world. There is no logical justification for it. But no one argues against its use, because it is the foundation of our lives; it works, and that's more than enough reason to accept it without further discussion.

Statistical inference is not needed when the evidence is overwhelming. A thousand cholera cases at one well and zero at another obviously does not require a statistical test. Neither does 999 to one, or even 700 to 300, because our inbred and learned statistical senses can detect that the two situations are different. But probabilistic inference comes to be needed when the number of cases is relatively small or where for other reasons the data are somewhat ambiguous.

**FOOTNOTES**

[1]: It is because hypothesis testing focuses on this most basic of inferential processes - deciding "same" or "different" - that I believe it to be a more basic technique than estimating confidence intervals, which focus on the accuracy of estimates.

Table 1
Florence Nightingale's Statistics of Mortality at Different Periods during the Crimean War (a)

Month               Year    Deaths per 1000 (Living) per Annum
----------------------------------------------------------------
January             1855        1173 1/2
                    1856          21 1/2
May                 1855         203
                    1856           8
January-May         1855         628
                    1856          11 1/2
Crimea, May 1856                   8
Line at home                      18.7
Guard at home                     20.4

(a) Ref. 3, p. 295.
Table 2
Florence Nightingale's Relative Mortality Statistics of the Army at Home and of the English Male Population at Corresponding Ages (a)

Ages      Deaths Annually to 1000 Living
----------------------------------------------
20-25     Englishmen           8.4
          English soldiers    17.0
25-30     Englishmen           9.2
          English soldiers    18.3
30-35     Englishmen          10.2
          English soldiers    18.4
35-40     Englishmen          11.6
          English soldiers    19.3

(a) From Encyl of Stats, vol 3, "Nightingale, Florence", p. 253.

Table I-1-1
Deaths of Mothers in Childbirth, Semmelweis Hospital Data

                First Clinic                     Second Clinic
        Births    Deaths    Rate         Births    Deaths    Rate
-------------------------------------------------------------------
1841     3,036       237     7.7          2,442        86     3.5
1842     3,287       518    15.8          2,659       202     7.5
1843     3,060       274     8.9          2,739       164     5.9
1844     3,157       260     8.2          2,956        68     2.3
1845     3,492       241     6.8          3,241        66     2.03
1846     4,010       459    11.4          3,754       105     2.7
Total   20,042     1,989                 17,791       691
Avg.                         9.92                             3.38

Source: Semmelweis, Ignaz, The Etiology, Concept, and Prophylaxis of Childbed Fever, translated and edited by K. Codell Carter (Madison, Wisconsin: Univ. of Wisconsin Press, 1983), p. 64.

Table I-3-1
John Snow's Data on Cholera Rates for Three Water Supplies

Southwark and Vauxhall supply     71 deaths per 10,000 houses
Lambeth supply                     5 deaths per 10,000 houses
Rest of London                     9 deaths per 10,000 houses

Source: Winslow, Charles-Edward Amory, The Conquest of Epidemic Disease (Madison, Wisconsin: Univ. of Wisconsin Press, 1980), p. 276.

Table I-4-1
Takaki's Japanese Naval Records of Deaths from Beriberi

                                  Total navy     Deaths from
Year     Diet                      personnel      beriberi
--------------------------------------------------------------
1880     Rice diet                    4,956         1,725
1881     Rice diet                    4,641         1,165
1882     Rice diet                    4,769         1,929
1883     Rice diet                    5,346         1,236
1884     Change to new diet           5,638           718
1885     New diet                     6,918            41
1886     New diet                     8,475             3
1887     New diet                     9,106             0
1888     New diet                     9,184             0

Source: K. Takaki, in Kornberg, 1989, p. 9.

**ENDNOTES**

<1>: Hence, the study of data-collection research methods deserves a place in curricula, in my view; as a text I happily recommend Simon and Burstein (1985), of course. Unfortunately, many social-science faculties displace such study from the curriculum in favor of the logical-mathematical charms of technical statistics.

<2>: In an article on Russell, Israel Shenker (1993) summarizes this preoccupation as follows:

    Philosophers traditionally wonder about the nature of reality. How do we know it? How can we prove that we know it? Does the forest exist if there's no one around to see it? It doesn't, say the skeptical idealists. It does, say the philosophical realists. Is the external world, as idealists insist, merely a collection of sensations in one's head? These hairsplitting issues are still in doubt, though the conviction that objects exist, with or without witnesses, is on the rise.

<3>: See Popper's concept of Worlds 1, 2, and 3.

<4>: The contrast here is with the Newtonian and 18th century view of the world - still held by most scientists - as a God-designed system whose principles and equations it is the job of the scientist to ascertain. I surmise that the Einstein-Bohr view is uncomfortable for many people because of the implied lack of surety - which means it lacks a feeling of psychological security. (Are the words "surety" and "security" related?) Without "first principles" one has nothing with which to justify the rest of one's thinking.
It is like playing a game without a written set of rules, or taking an examination for which there are no clear-cut answers. There is a related issue with values. Some people can find psychological comfort in believing that secular and religious laws were handed down by an overarching authority. (This applies better to statutes than to case law.) More about this in Chapter 00 about judgment.

<5>: Concerning "the scientific method": There is no such thing as the scientific method. Getting knowledge about the world is a many-faceted process in which we use myriad intellectual devices. We gather fresh information or we manipulate old existing information; we make comparisons; we test with controlled or uncontrolled experiments our preliminary hypotheses and new techniques. We do not limit ourselves to a small set of well-defined devices as John Stuart Mill suggested we ought to in his famous discussion of experimental designs. Instead, we work according to Percy Bridgman's definition of the scientific method as the use of one's mind "no holds barred". James Bryant Conant put it well: "[I]t is worse than nonsense to speak of the scientific method...one must speak of the ways, because there is no single way" to go about scientific work (1965, p. 18). F. S. C. Northrop wrote to the same effect: "There is no one scientific method" (1959, p. ix).

In the decades of the 1970s and 1980s, philosophers and statisticians took off the straitjacket of the "hypothetico-deductive method" that forbade the variety of informal devices for discovering new knowledge that working scientists inevitably use (e.g., the work of Edward Leamer). They also embraced a wide-ranging, unconstrained framework of grounds for judging the value of evidence (see, for example, the discussions in Shafer, 1988, and Diaconis, 1985). The new spirit is broad-ranging, multi-source, and multi-criterion (for example, taking into account such non-objective aspects of decision-making as regret). And there is much (perhaps too much) recognition that a scientific study must not only be accurate but also must be effective (even persuasive) in its rhetoric (e.g., the work of Donald McCloskey).

All this is consistent with the point of view expressed here: that getting knowledge is a process of successive improvements; that there is no single set of first principles by which one may "justify" one's assumptions, and hence such ideas as a "true" parameter are often a detriment; that the more methods that are used the better; and that it is a great advantage of resampling that it throws back the curtains of fancy manipulative techniques and illuminates the non-formulaic aspects of knowledge-getting.

<6>: This principle is related to David Hume's concept of constant conjunction. As I read it, Hume is referring to the association between two separate variables, but Russell reads Hume as referring to all sameness.

<7>: These are cases of Hume's "constant conjunction." See Chapter 00 on causality.

<8>: This was the problem I studied for my bachelor's thesis in 1953. I found then that there are two distinct ways - gradual unconscious accretion, and conscious systematic analysis - and that the method used depends on a variety of factors, including whether the person (or animal) is "set" to do the analysis. That finding seems as sound now as then, in light of subsequent research.

<9>: The need for "justification" is psychologically and logically related to what Morris Cohen called "the uneliminable religious and moral craving for absolute certainty" (1956, p. 113).
<10>: Note to computer scientists and information engineers: that equal-unequal is a fundamental concept in inference should seem more familiar when one remembers that on-off (yes-no, equal-unequal) is the fundamental unit of information in digital computing and other communications.

<11>: In answer to an imagined skeptical reader's query: No, I have not studied the mathematics of Gödel's proof. But his conclusion is not disputed by any mathematician, and it is only his conclusion that is relevant here. One can use a laser blackboard pointer or medical tool without any understanding of the physics of lasers.

<12>: This gaming aspect of human behavior can cause a difficulty in statistical inference that will be mostly ignored here.

<13>: Herbert Simon put it this way:

    The social sciences have simultaneously suffered and benefited from the fact that many of the phenomena of human behavior are open for all of us to see and hear as part of our daily experience. We do not need telescopes, microscopes, Geiger counters, or radio detectors to observe the overt aspects of human behavior... As a consequence, much knowledge about human society - even knowledge that might be termed "scientific" - has been derived from observation and experience.

(Herbert A. Simon, Models of My Life (New York: Basic Books, 1991), p. 58.)

<14>: Hume was right, of course; there had been a large increase, as we now can be sure. This also meant that in earlier times it was not possible to check on the sweeping observations by religious prophets and other social commentators about whether (say) the standard of living had gone down, or whether crime and other immorality had decreased - claims of the sort we find asserted as early as Assyrian times.

<15>: I benefited from the discussion of this matter by Hald, 1990, p. 93ff.

<16>: A peculiar perverseness associated with the new knowledge of statistical inference is that very strong findings which require little or no formal inference to demonstrate, and are so powerful that they can be shown with a simple graph or table, are very hard to publish in the social science literature because they do not meet the tests of "rigor" and "elegance." Editors view them as detracting from the "technical level" of their journals. A good many of the greatest discoveries of the past would nowadays fall into this category of being difficult or impossible to publish.