CHAPTER I-1

WHERE STATISTICAL INFERENCE FITS INTO THE GETTING OF KNOWLEDGE

Let's define statistical inference as: the process of selecting a probabilistic model to resemble the process you wish to investigate, investigating that model's behavior, and interpreting the results.

Probabilistic statistical inference is a crucial part of the process of informing ourselves about the world around us. It helps us understand our world, and make sound decisions about how to act in it.

Until the 18th century, humanity's extensive knowledge of nature and technology was not based on formal probabilistic statistical inference. And animals survive without statistics. But now that we humans have already dealt with many of the big questions that are easy to answer without probabilistic statistics, and now that we live in a more ramified world than in earlier centuries, the methods of inferential statistics become ever more important. Furthermore, statistical inference will surely become ever more important in the future as we voyage into realms that are increasingly difficult to comprehend. The development of an accurate chronometer to tell time on sea voyages became a crucial need when Europeans sought to travel to the New World. Similarly, probability and statistical inference become crucial as we voyage out into space and down into the depths of the ocean and the earth, as well as probing into the secrets of the microcosm and of the human mind and soul.

Where probabilistic statistical inference is employed, the inferential procedures may well not be the crucial element. For example, the wording of the questions asked in a public-opinion poll may be more critical than the inferential procedures used to discern the reliability of the poll results<1>. Yet we dare not disregard the role of the statistical procedures.

SOME CONCEPTS AND ASSUMPTIONS

Definition of "Knowledge"

As used here the term "knowledge" excludes mystic and religious knowledge - that which is apprehended by meditation or other non-everyday sorts of thinking. (Whether "knowledge" is the appropriate term for this and subsequent excluded categories need not be discussed here.) I also exclude emotional understanding and other purely personal knowledge - that which you know about your own feelings but which another person cannot know except by your telling her/him about it. The term "knowledge" as used here refers only to material content in the "objective" realm - that is, statements about the world that two or more persons can discuss, concerning topics about which they can compare observations and beliefs. This includes such statements as whether or not it is raining today, whether HIV causes AIDS, and whether Mike Mussina's curve ball is more effective than Ben McDonald's.

It is natural to consider defining knowledge for purposes here as that which pertains to decision-making. But such a definition has the defect that emotional reactions which are not objective knowledge - such as potential regrets at various outcomes of a decision - affect decisions, and properly so, and the definition needed here excludes those reactions.

Though some writers would confine the definition and subsequent discussion to (say) prediction or explanation, the concept of knowledge in this book includes prediction, control, understanding, causal explanation, scientific law, and perhaps other types of interpretations of associations (correlations), too. This diverseness is crucial for a sound concept of knowledge both inside and outside of science, in my view.
The concept of knowledge used here goes beyond scientific knowledge; it also includes knowledge used in business, government, family and personal life, and other human activities.

There will be no discussion of how a person receives individual bits of information. That is, we shall not be concerned with the issues of sensation and perception which have received much attention from philosophers such as David Hume (1738; 1758), Bertrand Russell (1948)<2>, and Karl Popper (1979)<3>. Rather, we begin discussion at the moment that the person already has in her/his possession information which is objective and public in the sense that it either is already known to others or could be communicated to others.

An Overall Intellectual System?

This discussion of inference is not founded on any set of philosophical "first principles" from which everything else purports to be deduced. But the discussion is not incompatible with most such world views. If your thinking rests on some particular structure of first principles, it is likely that most or all of what is said here will co-exist easily with your conceptual system. That is because in every practical craft one generally begins with the immediate facts at hand and proceeds with little recourse to more general principles. And this is good practice; one needs to make recourse to more general knowledge only for the more difficult and confusing problems that one faces only occasionally. (Inquiry into the subtleties of whether an observed relationship should be considered causal is a good example of such infrequent difficult cases.) And it is a happy circumstance that it is so, because it enables people with very different world views to cooperate successfully and to make use of the products of each other's work.

Of course one is best equipped for any occupation if one is acquainted with more general knowledge than most daily situations demand; you then are prepared to handle the atypical puzzling situations. But even the more general knowledge one may need usually is not a great many layers up the hierarchy of knowledge. And in my experience and reading, a decision about statistical practice almost never depends on anything that might be close to "first principles".

This viewpoint is connected in a fundamental way to the Kant-Einstein-Bohr view of theories as being a product of the human imagination rather than being a property of nature, and to the Bohr-Gödel-Heisenberg view of knowledge as being necessarily open and incomplete.<4>

Asserting that there are no "natural" first principles dispensed or dictated to us from on high, but rather that the assumptions for any given system must be chosen by us - as in this view - does not imply that the choice of basic principles for a given activity is a matter of indifference, or that it does not matter where within the structure of knowledge you begin. Just the opposite: a wise choice of assumptions for a given inquiry or activity is all-important. But the choice must be a matter of judgment, and it can be implicit. The judgment must depend on your aims, the data, the existing knowledge, and your own nature. Making wise judgments about assumed underlying principles and other working assumptions is one of the highest skills a researcher and a decision-maker can possess. (See Chapter 00 on the role of judgment.)
In my view the practical procedures of science, together with necessary general discussion of the hard cases, constitute a body of operational philosophy that is sufficient in itself, and may be preferable to any philosophical discussion of science that begins with a set of first principles.

To avoid confusion, the term "operational philosophy" as used in the above paragraph has no connection to operationalism as a way of thinking; it simply means that what I write is intended to help people get their work done, rather than be merely speculative. And my frequent use of the crucial concept of the operational definition, though it is a part of operationalism, does not at all mean that I subscribe to the more general prescriptions of operationalism<5>.

KNOWLEDGE WITHOUT PROBABILISTIC STATISTICAL INFERENCE

Let us distinguish two kinds of knowledge with which inference at large (that is, not just probabilistic statistical inference) is mainly concerned: a) one or more absolute measurements on one or more dimensions of a collection of one or more items - for example, your income, or the mean income of the people in your country; and b) comparative measurements and evaluations of two or more collections of items, and especially, deciding whether they are equal or unequal - for example, the mean income in Brazil compared to the mean income in Argentina. Types (a) and (b) both include asking whether there has been a change between one observation and another.

What is the conceptual basis of our getting these sorts of knowledge about the world? I believe that our rock-bottom conceptual tool (used without even our thinking about using it) is the assumption that we may call sameness, or continuity, or constancy, or repetition, or equality, or persistence; "constancy" and "continuity" will be the terms used most frequently here, and I shall use them interchangeably. In physics this concept is called structure (Wigner, 1979, p. 29ff), and it is the precondition for the existence of laws of nature. (Structure in the laws of nature themselves is called invariance.)<6>

Continuity is a non-statistical concept. It is a best guess about the next point beyond the known observations, without any idea of the accuracy of the estimate. It is like testing the ground ahead when walking in a marsh. It is local rather than global. We'll talk a bit later about why continuity seems to be present in much of the world that we encounter.

The other great concept in statistical inference, and perhaps in all inference taken together, is representative (usually random) sampling, to be discussed in Chapter II-OO. Representative sampling - which depends upon the assumption of sameness (homogeneity) throughout the universe to be investigated - is quite different from continuity; representative sampling assumes that there is no greater chance of a connection between any two elements that might be drawn into the sample than between any other two elements; the order of drawing is immaterial. In contrast, continuity assumes that there is a greater chance of connection between two contiguous elements than between either one of the elements and any of the many other elements that are not contiguous to either. Indeed, the process of randomizing is a device for doing away with continuity and autocorrelation within some bounded closed system - the sample "frame". It is an attempt to map (describe) the entire area ahead using the device of the systematic survey.
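The contrast between continuity and randomization can be made concrete with a small simulation. The sketch below is mine, not anything prescribed by the text, and it is written in Python purely for illustration; the lag-1 correlation used as a yardstick is my own choice. It builds a series in which each value is tied to its neighbor, then shuffles the order: the connection between contiguous elements - the connection that continuity relies upon - is large before the shuffle and near zero after it.

    import numpy as np

    rng = np.random.default_rng(0)

    # A "continuous" series: each value is the last value plus a small step,
    # so contiguous elements are strongly connected (autocorrelated).
    steps = rng.normal(loc=0.0, scale=1.0, size=1000)
    series = np.cumsum(steps)

    def lag1_autocorr(x):
        """Correlation between each element and its immediate neighbor."""
        return np.corrcoef(x[:-1], x[1:])[0, 1]

    print("Lag-1 autocorrelation, original series:", round(lag1_autocorr(series), 3))

    # Randomizing the order destroys the local connection: after shuffling,
    # contiguous elements are no more alike than any two elements chosen at random.
    shuffled = rng.permutation(series)
    print("Lag-1 autocorrelation, shuffled series:", round(lag1_autocorr(shuffled), 3))

In the shuffled - randomized - ordering, knowing one element tells you nothing special about its neighbor; that is precisely the property that random sampling relies upon and that continuity denies.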
Random representative sampling enables us to make probabilistic inferences about a population based on the evidence of a sample.

To return now to the concept of sameness: Examples of the principle are that we assume:

a) Our house will be in the same place tomorrow as today.

b) A hammer will break an egg every time you hit the latter with the former (or even the former with the latter).

c) If you observe that the first fifteen persons you see walking out of a door at the airport are male, the sixteenth probably will be male also.

d) Paths in the village stay much the same through a person's life.

e) Religious ritual changes little through the decades.

f) Your best guess about tomorrow's temperature or stock price is that it will be the same as today's.

This principle of constancy is related to David Hume's concept of constant conjunction.

The principle of sameness does not reject Heraclitus' observation (about 500 BC) that one never steps into the same river twice. True, the molecules flow by. But the river stays the same on the map, and in the minds of animals, and in contracts for water rights. And the homeowner on the river's banks had better treat the river's presence as continuing or court disaster. It is this practical sameness with respect to some decision or another that matters, Heraclitus notwithstanding.

When my children were young, I would point to a tree on our lawn and ask: "Do you think that tree will be there tomorrow?" And when they would answer "Yes", I'd ask, "Why doesn't the tree fall?" That's a tough question to answer.

There are two reasonable bases for predicting that the tree will be standing tomorrow. First and most compelling for most of us is that almost all trees continue standing from day to day, and this particular one has never fallen; hence, what has been in the past is likely to continue. This assessment requires no scientific knowledge of trees, yet it is a very functional way to approach most questions concerning the tree - such as whether to hang a clothesline from it, or whether to worry that it will fall on the house tonight. That is, we can predict the outcome in this case with very high likelihood of being correct even though we do not utilize anything that would be called either science or statistical inference. (But what do you reply when your child says: "Why should I wear a seat belt? I've never been in an accident"?)

A second possible basis for predicting that the tree will be standing is scientific analysis of the tree's roots - how the tree's weight is distributed, its sickness or health, and so on. Let's put aside this sort of scientific-engineering analysis for now.

The first basis for predicting that the tree will be standing tomorrow - sameness - is the most important heuristic device in all of knowledge-getting. It is often a weak heuristic; certainly the prediction about the tree would be better grounded (!) after a skilled forester examines the tree. But persistence alone might be a better heuristic in a particular case than an engineering-scientific analysis alone.

This heuristic appears more obvious if the child - or the adult - were to respond to the question about the tree with another question: Why should I expect it to fall? In the absence of some reason to expect change, it is quite reasonable to expect no change.
And the child's new question does not duck the central question we have asked about the tree, any more than one ducks a probability estimate by estimating the complementary probability (that is, unity minus the probability sought); indeed, this is a very sound strategy in many situations.

Constancy can refer to location, time, relationship to another variable, or yet another dimension. Constancy may also be cyclical. Some cyclical changes can be charted or mapped with relative certainty - for example the life-cycles of persons, plants, and animals; the diurnal cycle of dark and light; and the yearly cycle of seasons. The courses of some diseases can also be charted. Hence these kinds of knowledge have long been known well.

Consider driving along a road. One can predict that the price at the next gasoline station will be within a few cents of the price at the station that you just passed. But as you drive further and further, the dispersion increases as you cross state lines and taxes differ. This illustrates continuity.

Constancy can also be transformational. Some transformations have sufficiently little uncertainty that people have understood them for ages - for example, cooking, brewing, smelting, some medical practices such as setting bones and delivering babies, and various crafts.

The attention to constancy can focus on a single event, such as leaves of similar shape appearing on the same plant. Or attention can focus on single sequences of "production", as in the process by which a seed produces a tree. For example, let's say you see two puppies - one that looks like a low-slung dachshund, and the other a huge mastiff. You also see two grown male dogs, also apparently dachshund and mastiff. If asked about the parentage of the small ones, you are likely - using the principle of sameness - to point - quickly and with surety - to the big dogs of the same breed. (Here it is important to notice that this answer implicitly assumes that the fathers of the puppies are among these dogs. But the fathers might be somewhere else entirely; it is in these ways that the principle of sameness can lead you astray.)

When applying the concept of sameness, the object of interest may be collections of data, as in Semmelweis's data on the consistent differences in rates of maternal deaths from childbed fever in two clinics with different conditions (see Table 11-1), or the similarities in sex ratios from year to year in Graunt's data on London births (Table 11-2), or the stark effect in John Snow's data on the numbers of cholera cases associated with two London wells (Table 11-3), or the reduction in beriberi among Japanese sailors as a result of a change in diet (Table 11-4). These data seem so overwhelmingly clear cut that our naive statistical sense makes the relationships seem deterministic, and the conclusions seem straightforward.<7> (But the same statistical sense frequently misleads us when considering sports and stock market data.)

Table 1 [Semmelweis Table 1 p. 64]
Table 2 [see in Hald or Graunt]
Table 3 [Winslow p. 276]
Table 4 [Table 1-1 in Kornberg, 1989]

Constancy and sameness can be seen in macro structures; consider, for example, the constant location of your house. Constancy can also be seen in micro aggregations - for example, the raindrops and rain that account for the predictably fluctuating height of the Nile, or the ratio of boys to girls born in London, cases in which we can average to see the "statistical" sameness.
The total sum of the raindrops produces the level of a reservoir or a river from year to year, and the sum of the behaviors of collections of persons causes the birth rates in the various years. This micro view of constancy points toward the notion of the Central Limit Theorem that will be discussed later.

How does a person discover that there is a pattern of sameness? This is a psychological issue - the question of concept formation (which I take to be the functional equivalent of the philosopher's concept of induction).<8> Statistical inference is only needed when a person thinks that s/he might have found a pattern but the pattern is not completely obvious to all. Probabilistic inference works to test - either to confirm or disconfirm - the belief in the pattern's existence. We will see such cases in the following chapter.

Please notice that no justification has yet been given for predicting on the basis of observed constancy, or for acting in reliance on that prediction. Nor have I referred to this process with the label "induction", a label which connotes to most philosophers a more general process than I am describing here. (Finding a logical justification for induction has been one of the great searches in the history of philosophy. But it has been entirely unsuccessful. And as one of its greatest practitioners - Bertrand Russell - eventually concluded, the search must be unsuccessful<9>. Even trying to explain logically why I think that the search must fail is not worth the time, I think. And the search for justification usually refers to induction, and that in turn usually refers to producing theories, according to Popper (1979, Chapter 1). This is quite different from the assumption of sameness.)

Lack of logical justification does not imply that the process of predicting or other generalization on the basis of observed constancy is without foundation, or is somehow illegitimate, or should not be done. The most satisfactory argument for acting on sameness - if any such argument is needed, and I think that it is not - is the thorough-going success of the process. We may call this a pragmatic argument, in the tradition of Charles Peirce and William James. That is, anyone who decided not to proceed on the basis of past experience, and actually tried to so act, would find life either bewildering or impossible or both.<10>

One can wonder why proceeding on the assumption of continuity succeeds, of course. I suggest these two answers to the question:

1) If there were not considerable continuity, we would not be here day after day; we live in a world of continuity by the "anthropic principle" (Barrow and Tipler, 1988).

2) Here is a cosmological argument by analogy: Imagine a ball of homogeneous yet chemically and physically malleable material entering a big incinerator. Various things happen. Perhaps one side is flattened like a penny on a railroad track. Other materials are pressed into it. The ball is heated, and then heat escapes faster from the surface than from the inside. It is jostled by other objects of different kinds. And so forth. The result? Something like our earth, created not by design or equation but by "chance" and evolution. It starts homogeneous, and ends up non-homogeneous.
Just as it applies to cases of perfect historical constancy and certainty (for example, that the sun will come up tomorrow), this second argument also applies to predicting on the basis of slightly incomplete continuity and minor uncertainty (for example, that you will recover from a cold after a week or so). Actually, there are many cloudy days when you cannot verify that the sun has come up except indirectly, but let us ignore that distracting matter as we will ignore many matters that seem to be loose ends in the discussion. This is in keeping with the attitude that logical completeness is not our goal because it cannot be attained except in discussions of logic itself - and not always there, either, as Gödel taught us.<11>

People have always been forced to think about and act in situations that have not been constant - that is, situations where the amount of variability in the phenomenon makes it impossible to draw clear cut, sensible conclusions. For example, the appearance of game animals in given places and at given times has always been uncertain to hunters, and therefore it has always been difficult to know which target to hunt in which place at what time. And of course variability of the weather has always made it a very uncertain element. The behavior of one's enemies and friends has always been uncertain, too, though uncertain in a manner different from the behavior of wild animals; there often is a gaming element in interactions with other humans<12>. But in earlier times, data and techniques did not exist to enable us to bring statistical inference to bear.

It should be noted that though human behavior may be difficult to predict in some circumstances, much human behavior is extraordinarily predictable. Movies start exactly when the newspaper says they will. If the price of gasoline is $1.09 at one station, it will be within cents of that price at a station across the street. If you put a $5 bill on the busy sidewalk, it will be gone in minutes, as David Hume advised us: "A man who at noon leaves his purse full of gold on the pavement at Charing-Cross, may as well expect that it will fly away like a feather, as that he will find it untouched an hour after" (Essays, p. 99, Open Court ed.).

The hoary comment that social science is somehow "softer" than is "hard" physical and biological science because human behavior is less predictable is simply a prejudice, arising most probably from a) people not noticing the huge amount of highly predictable knowledge of human behavior that we all possess, and classifying it outside social science, and b) the particular difficulty of the challenging problems in social science arising from their variability rather than (as in the physical and biological sciences) from being unable to observe the phenomena that we wish to learn about.<13>

In medicine and healing (and in all the biological and physical sciences), the challenging research problems involve phenomena one cannot see with the naked eye - bacteria, viruses, how fleas behave and the killer germs they carry; this invisibility is the source of the uncertainty in these sciences. These hard problems are unlike the medical problems of broken bones, bleeding, and giving birth, about which healers have had extensive knowledge for millennia. In social science, by contrast, the difficulty of making major discoveries stems from our having long ago learned the easy-to-establish knowledge with our millennia of extensive day-to-day social experience.
Thus we have skimmed off the easy research problems in the social sciences.

THE TREATMENT OF UNCERTAINTY

The purpose of statistical inference is to help us peer through the veil of variability when it obscures the main thrust of the data, so as to improve the decisions we make. Statistical inference (or in most cases, simply probabilistic estimation) can help:

a) a gambler deciding on the appropriate odds in a betting game when there seems to be little or no difference between two or more outcomes;

b) an astronomer deciding upon one or another value as the central estimate for the location of a star when there is considerable variation in the observations s/he has made of the star;

c) a basketball coach pondering whether to take from the game her best shooter, who has heretofore done poorly tonight;

d) an oil-drilling firm debating whether to follow up a test-well drilling with a full-bore drilling when the probability of success is not overwhelming but the payoff from a gusher could be large.

Until the 18th or even 19th century the canon of science did not embody uncertainty, and it proceeded on the basis of constancy alone. For example, J. S. Mill's famous experimental procedures to establish causality do not introduce uncertainty. [cite] The key idea underlying his experimental canon was to hold everything else the same, with certainty.

Before we proceed to consider situations where we must grapple with uncertainty, let us repeat: In most walks of life one seldom needs statistical inference. Most matters, even in science, are well understood without using statistical inference to grapple with the uncertainty because (as noted above) most matters are highly stable and predictable - the positions of your desk, rug, and house; the arrival of your newspaper in the morning; the availability of your favorite foods in the store; the fate of a $5 bill left on the sidewalk.

Returning to the tree near the Simon house: Let's change the facts. Assume now that one major part of the tree is mostly dead, and we expect a big winter storm tonight. What is the danger that the tree will fall on the house? Should we spend $1500 to have the mostly-dead third of it cut down? We know that last year a good many trees fell on houses in the neighborhood during such a storm. We can gather some data on the proportion of old trees this size that fell on houses - about 5 in 100, so far as we can tell. Now it is no longer an open-and-shut case about whether the tree will be standing tomorrow, and we are using statistical inference to help us with our thinking.

We proceed to find a set of trees that we consider like this one, and study the variation in the outcomes of such trees. So far we have estimated that the average for this group of trees - the mean (proportion) that fell in the last big storm - is 5 percent. Averages are much more "stable" - that is, more similar to each other - than are individual cases. The tree populations illustrate this. Notice how we use the crucial concept of sameness: We assume that our tree is like the others we observed, or at least that it is not systematically different from most of them and it is more-or-less average.

How would our thinking be different if our data were that one tree in 10 had fallen instead of 5 in 100? This is a question in statistical inference. How about if we investigate further and find that 4 of 40 elms fell, but only one of 60 oaks - and ours is an oak tree? Should we consider that oaks and elms have different chances of falling?
Proceeding a bit further, we can think of the question as: Should we or should we not consider oaks and elms as different? This is the type of statistical inference called "hypothesis testing": we apply statistical procedures to help us decide whether to treat the two classes of trees as the same or different[1]. If we should consider them the same, our worries about the tree falling are greater than if we consider them different with respect to the chance of damage.

Notice that statistical inference was not necessary for accurate prediction when I asked the kids about the likelihood of a live tree falling on a day when there would be no storm. So it is with most situations we encounter. But when the assumption of constancy becomes shaky for one reason or another, as with the sick tree falling in a storm, we need a more refined form of thinking. We collect data on a large number of instances, inquire into whether the instances in which we are interested (our tree and the chance of it falling) are representative - that is, whether they resemble what we would get if we drew a sample randomly - and we then investigate the behavior of this large class of instances to see what light it throws on the instance(s) in which we are interested.

The procedure in this case - which we shall discuss in greater detail later on - is to ask: If oaks and elms are not different, how likely is it that only one of 60 oaks would fall whereas 4 of 40 elms would fall? Again, notice the assumption that our tree is "representative" of the other trees about which we have information - that it is not systematically different from most of them, but rather that it is more-or-less average. Our tree certainly was not chosen randomly from the set of trees we are considering. But for purposes of our analysis, we proceed as if it had been chosen randomly - because we deem it "representative". This is the first of two roles that the concept of randomness plays in statistical thinking.

Here is an example of the second use of the concept of randomness: We conduct an experiment - plant elm and oak trees at randomly-selected locations on a plot of land, and then try to blow them down with a wind-making machine. (The random selection of planting spots is important because some locations on a plot of ground have different growing characteristics than do others.) Some purists object that only this sort of experimental sampling is a valid subject of statistical inference; it can never be appropriate, they say, simply to assume on the basis of other knowledge that the tree is representative. I regard that purist view as a helpful discipline on our thinking. But accepting its conclusion - that one should not apply statistical inference except to randomly-drawn or randomly-constituted samples - would take from us a tool that has proven useful in a variety of activities.
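To make the oak-and-elm question concrete, here is a minimal sketch of how one might put it to a computer by resampling. The sketch is mine, written in Python for illustration only; the number of trials and the way the outcome is scored (4 or more of the 5 fallen trees landing in the elm group) are my own choices, not a prescribed procedure. The idea is simply to suppose that oaks and elms are the same, shuffle the 5 fallen and 95 standing trees into groups of 40 and 60 over and over, and see how often the split comes out as lopsided as the one we observed.

    import random

    random.seed(1)

    # Observed data: 4 of 40 elms fell, 1 of 60 oaks fell - 5 fallen trees among 100.
    trees = [1] * 5 + [0] * 95        # 1 = fell, 0 = still standing

    trials = 10_000
    lopsided = 0
    for _ in range(trials):
        random.shuffle(trees)
        elm_group = trees[:40]        # pretend the first 40 after shuffling are the "elms"
        if sum(elm_group) >= 4:       # 4 or all 5 of the fallen trees among the elms
            lopsided += 1

    print("Share of shuffles as lopsided as the observed 4-to-1 split:",
          lopsided / trials)

If splits as lopsided as the observed one turn up often in the shuffles, we have little warrant for treating oaks and elms as different; if they turn up only rarely, the difference begins to look like more than an accident of which trees happened to fall.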
As discussed earlier in this chapter, the data in some (probably most) scientific situations are so overwhelming that one can proceed without probabilistic inference. Historical examples include those shown above of Semmelweis and puerperal fever, and John Snow and cholera. But where there was a lack of overwhelming evidence, the causation of many diseases long remained unclear for lack of statistical procedures. This led to superstitious beliefs and to counter-productive behavior, as quarantines against plague often were. Some effective practices also arose despite the lack of sound theory, however - the waxed costumes of doctors, and the burning of mattresses, despite the wrong theory about the causation of plague (see Cipolla, 1981).

So far I have spoken only of predictability and not of other elements of statistical knowledge such as understanding and control. This is simply because statistical correlation is the bedrock of most scientific understanding, and of predictability. Later we will expand the discussion beyond predictability; it holds no sacred place here.

WHERE STATISTICAL INFERENCE BECOMES CRUCIAL

There was little role for statistical inference to play up until about three centuries ago because there existed very few scientific data. For example, as late as the 1700s there were so few data about population size that there was a major controversy between David Hume and Charles Montesquieu (Hume 1741-1777/1985, pp. 377-464) about whether the population of the world had increased or decreased since ancient times.<14> When scientific data began to appear, the need emerged for statistical inference to improve the interpretation of the data.

As we saw, statistical inference is not needed when the evidence is overwhelming. A thousand cholera cases at one well and zero at another obviously does not require a statistical test. Neither would 999 cases to one, or even 700 cases to 300, because our inbred and learned statistical senses can detect that the two situations are different. But probabilistic inference comes to be needed when the number of cases is relatively small or where for other reasons the data are somewhat ambiguous.

For example, when working with the 17th century data on births and deaths, John Graunt - great statistician though he was - drew wrong conclusions about some matters because he lacked modern knowledge of statistical inference. He found that in the rural parish of Romsey "there were born 15 Females for 16 Males, whereas in London there were 13 for 14, which shews, that London is somewhat more apt to produce Males, then the country" (p. 71). He suggests that the "curious" inquire into the causes of this phenomenon, apparently not recognizing - and at that time he had no way to test - that the difference might be due solely to chance. He also noticed (p. 94) that the variations in deaths among years in Romsey were greater than in London, and he attempted to explain this apparent fact (which is just a statistical artifact) rather than understanding that such variation is almost inevitable because Romsey is so much smaller than London. Because we have available to us the modern understanding of variability, we can now reach sound conclusions on these matters<15>.

Another example in Graunt's work: He hypothesized that fertility was lower than otherwise during the more "sickly" years in 17th century England. He tried to test this hypothesis on the London Bills of Mortality. (At this point the reader might examine whether the data in Table 11-5 permit a clear-cut judgment about this hypothesis.) Graunt arrived at a wrong conclusion - that the data supported his hypothesis - because he did not have available a general device for handling all the data at once, but instead had to resort to examining local maxima and minima (see Hald, 1990, pp. 93 ff.), and therefore he was unable to reach a sound general conclusion.
But with modern ideas of correlation and the test of a correlation's statistical significance - which enable one to treat a collection of data all at once with summarizing statistics - he could easily have arrived at a sound answer. More generally, summary statistics - such as the simple mean - are devices for reducing a large mass of data (inevitably confusing unless they are absolutely clear cut) to something one can manage to understand. And probabilistic inference is a device for determining whether patterns should be considered as facts or artifacts.<16>

Table 11-5

Here is another example that illustrates the state of early quantitative research in medicine: Exploring the effect of a common medicinal substance, Boecker examined the effect of sarsaparilla on the nitrogenous and other constituents of the urine. An individual receiving a controlled diet was given a decoction of sarsaparilla for a period of twelve days, and the volume of urine passed daily was carefully measured. For a further twelve days that same individual, on the same diet, was given only distilled water, and the daily quantity of urine was again determined. The first series of researches gave the following figures (in cubic centimeters): 1,467, 1,744, 1,665, 1,220, 1,161, 1,369, 1,675, 2,199, 887, 1,634, 943, and 2,093 (mean = 1,499); the second series: 1,263, 1,740, 1,538, 1,526, 1,387, 1,422, 1,754, 1,320, 1,809, 2,139, 1,574, and 1,114 (mean = 1,549). Much uncertainty surrounded the exactitude of these measurements, but this played little role in the ensuing discussion. The fundamental issue was not the quality of the experimental data but how inferences were drawn from those data (Coleman in Kruger, 1987, p. 207). The experimenter Boecker had no reliable way of judging whether the data for the two groups were or were not meaningfully different, and therefore he arrived at the unsound conclusion that there was indeed a difference. (Gustav Radicke used this example as the basis for early work on statistical significance.)

Another example: Joseph Lister convinced the scientific world of the germ theory of infection, and of the possibility of preventing death with a disinfectant, with these data: prior to the use of antiseptics - 16 post-operative deaths in 35 amputations; subsequent to the use of antiseptics - 6 deaths in 40 amputations (Winslow, 1943, p. 303). But how sure could one be that a difference of that size might not occur just by chance? No one then could say, nor did anyone inquire, apparently.

Here's another example of great scientists falling into error because of a too-primitive approach to data (Feller, 3rd ed., 1968, pp. 69-70): Charles Darwin wanted to compare two sets of measured data, each containing 13 observations. At Darwin's request, Francis Galton compared the two sets of data by ranking each, and then comparing them pairwise. "The a's were ahead 13 times. Without knowledge of the actual probabilities Galton concluded that the treatment was effective. But, assuming perfect randomness, the probability that the a's beat [the others] 13 times or more equals 3/16. This means that in three out of sixteen cases a perfectly ineffectual treatment would appear as good or better than the treatment classified as effective by Galton." That is, Galton and Darwin reached an unsound conclusion. As Feller says, "This shows that a quantitative analysis may be a valuable supplement to our rather shaky intuition" (p. 70).
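Today Boecker's question can be answered directly with a permutation (resampling) test: if the sarsaparilla made no difference, then the labels "sarsaparilla days" and "water days" are arbitrary, and shuffling the twenty-four measurements between the two labels should often produce differences between the group means as large as the observed one. The sketch below is mine - Python is used only for illustration, and the choice of the absolute difference between means as the yardstick is an assumption of this sketch, not anything Radicke or Coleman prescribes.

    import random

    random.seed(1)

    sarsaparilla_days = [1467, 1744, 1665, 1220, 1161, 1369,
                         1675, 2199, 887, 1634, 943, 2093]
    water_days        = [1263, 1740, 1538, 1526, 1387, 1422,
                         1754, 1320, 1809, 2139, 1574, 1114]

    def mean(values):
        return sum(values) / len(values)

    observed_gap = abs(mean(water_days) - mean(sarsaparilla_days))

    # If the treatment is irrelevant, any 12 of the 24 days could have been
    # the "sarsaparilla days"; shuffle the pooled measurements and re-split.
    pooled = sarsaparilla_days + water_days
    trials = 10_000
    gaps_at_least_as_large = 0
    for _ in range(trials):
        random.shuffle(pooled)
        shuffled_gap = abs(mean(pooled[:12]) - mean(pooled[12:]))
        if shuffled_gap >= observed_gap:
            gaps_at_least_as_large += 1

    print("Observed gap between the two means:", round(observed_gap, 1))
    print("Share of shuffles with a gap at least that large:",
          gaps_at_least_as_large / trials)

Gaps as large as the observed one turn up in a great many of the shuffles, which is exactly why Boecker's conclusion that there was indeed a difference was unsound. The same pooling-and-shuffling logic, applied to counts of deaths rather than to volumes of urine, would answer the question nobody then asked about Lister's amputation figures.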
Looking ahead, the key tool in situations like Graunt's and Boecker's and Lister's is creating ceteris paribus - making "everything else the same" - with random selection in experiments, or at least with statistical controls in non-experimental situations.

The insurance industry early on made large strides toward statistical inference. Insurance itself enables us to deal with the uncertainties of shipwreck, fire, life and death. But the central principle in insurance analysis - equality among given persons, voyages, and fire risks (an average length of life, for example) - could simply be assumed, without needing to test for equality. So insurance could do without modern probabilistic inference.

There is a long and colorful history of dealing with uncertainty in gambling; examples of dice-like devices date back to Egyptian times. But probabilistic knowledge about gambling odds was more lore than scientific understanding. And the problems were not mainly those of statistical inference.

In astronomy, over the millennia scientists were able to make great progress without statistical inference because the outcomes of most studies are well-determined; there was little doubt that Saturn was Saturn to all observers, and the sun appears every clear day in most latitudes. But when there are differences among astronomers in observations of the location of planets and stars, and outliers among the observations made by a single person, the treatment of the ensuing uncertainty requires modern statistical inference; this need was one of the major sources of the development of statistical inference. At first astronomers did not even use the concept of the mean (Hald, 1990). Then need led to their inventing that concept. Later a new question arose about whether variation is random. All this was major progress toward the statistical understanding we now possess.

In medical research, there was pressing need to know how to decide whether data collections should be considered the same or different when the results of comparisons are not as overwhelming as in Semmelweis's study of puerperal fever or Takaki's study of beriberi (Table 4). This can be considered a question about continuity - whether two collections of data on health outcomes of people who have had different environments or therapies should be considered as continuous with each other, either in the sense of being the same or in the sense of there not being a sharp break between them. When there is considerable overlap and non-dominance in results among groups, one needs statistical inference to reach valid conclusions. One compares samples with the help of the concept of random sampling. This concept enables us to compare collections that are not manifestly equal, as in the simple physical experiments illustrated by Boecker's sarsaparilla data and Lister's disinfectant data above.

The concept of comparison - that is, whether things should be considered as equal or not (even before quantitative comparison) - is fundamental in the above paragraph, as David Hume told us it is fundamental in all knowledge: "All kinds of reasoning consist in nothing but a comparison" (Treatise, p. 199, Open Court edition). This concept is at the base of correlation and regression as well as of hypothesis testing.
The concept of comparison is used in non-experimental non-random situations, too - for example, in the study of price in different state structures of liquor retailing (see Chapter 00) - with the device of likening the situation to random selection. The researcher attempts to make the situation resemble random selection by controlling extraneous variables, and by testing the data for being random-like. But some strict critics object to the use of statistical inference in any situations that are not actual experiments. Chapter 00 delves further into the crucial role of random sampling.

The statistical study of the process of estimation and its accuracy - including confidence intervals - is a sort of measurement of equality - in particular, the likely equality between sample and population.

Though the main (and early) use of statistical inference was with respect to supposedly-randomly-drawn samples, inference has come to be used for questions about continuity; this includes all time-series investigations, even when the samples clearly were not drawn randomly. Again, the tactic used is to liken the data to random selection on a crucial dimension, or at least to test for it. Again some strict critics object.

CONCLUSIONS

In all knowledge-seeking and decision-making, our aim is to peer into the unknown and reduce our uncertainty a bit. The two main concepts that we use - the two great concepts in all of scientific knowledge-seeking, and perhaps in all practical thinking and decision-making - are a) continuity (or non-randomness) and the extent to which it applies in a given situation, and b) random sampling, and the extent to which we can assume that our observations are indeed chosen by a random process.

The assumption of constancy-sameness-continuity-persistence-whatever-you-call-it is the most basic and most important heuristic in obtaining knowledge about the world. There is no logical justification for it. But no one argues against its use, because it is the foundation of our lives; it works, and that's more than enough reason to accept it without further discussion.

Statistical inference is not needed when the evidence is overwhelming. A thousand cholera cases at one well and zero at another obviously does not require a statistical test. Neither does 999 to one, or even 700 to 300, because our inbred and learned statistical senses can detect that the two situations are different. But probabilistic inference comes to be needed when the number of cases is relatively small or where for other reasons the data are somewhat ambiguous.

**FOOTNOTES**

[1]: It is because hypothesis testing focuses on this most basic of inferential processes - deciding "same" or "different" - that I believe it to be a more basic technique than estimating confidence intervals, which focus on the accuracy of estimates.

Table 1
Florence Nightingale's Statistics of Mortality at Different Periods during the Crimean War (a)

Month               Year    Deaths per 1000 (Living) per Annum
----------------------------------------------------------------
January             1855        1173 1/2
                    1856          21 1/2
May                 1855         203
                    1856           8
January-May         1855         628
                    1856          11 1/2
Crimea, May 1856                   8
Line at home                      18.7
Guard at home                     20.4

(a) Ref. 3, p. 295.
Table 2
Florence Nightingale's Relative Mortality Statistics of the Army at Home and of the English Male Population at Corresponding Ages (a)

Ages      Deaths Annually to 1000 Living
----------------------------------------------
20-25     Englishmen           8.4
          English soldiers    17.0
25-30     Englishmen           9.2
          English soldiers    18.3
30-35     Englishmen          10.2
          English soldiers    18.4
35-40     Englishmen          11.6
          English soldiers    19.3

(a) From Encyl of Stats, vol 3, "Nightingale, Florence", p. 253.

Table I-1-1
Deaths of Mothers in Childbirth, Semmelweis Hospital Data

                First Clinic                     Second Clinic
        Births    Deaths    Rate         Births    Deaths    Rate
-------------------------------------------------------------------
1841     3,036       237     7.7          2,442        86     3.5
1842     3,287       518    15.8          2,659       202     7.5
1843     3,060       274     8.9          2,739       164     5.9
1844     3,157       260     8.2          2,956        68     2.3
1845     3,492       241     6.8          3,241        66     2.03
1846     4,010       459    11.4          3,754       105     2.7
Total   20,042     1,989                 17,791       691
Avg.                         9.92                             3.38

Source: Semmelweis, Ignaz, The Etiology, Concept, and Prophylaxis of Childbed Fever, translated and edited by K. Codell Carter (Madison, Wisconsin: Univ. of Wisconsin Press, 1983), p. 64.

Table I-3-1
John Snow's Data on Cholera Rates for Three Water Supplies

Southwark and Vauxhall supply     71 deaths per 10,000 houses
Lambeth supply                     5 deaths per 10,000 houses
Rest of London                     9 deaths per 10,000 houses

Source: Winslow, Charles-Edward Amory, The Conquest of Epidemic Disease (Madison, Wisconsin: Univ. of Wisconsin Press, 1980), p. 276.

Table I-4-1
Takaki's Japanese Naval Records of Deaths from Beriberi

                                  Total navy     Deaths from
Year     Diet                      personnel      beriberi
--------------------------------------------------------------
1880     Rice diet                    4,956         1,725
1881     Rice diet                    4,641         1,165
1882     Rice diet                    4,769         1,929
1883     Rice diet                    5,346         1,236
1884     Change to new diet           5,638           718
1885     New diet                     6,918            41
1886     New diet                     8,475             3
1887     New diet                     9,106             0
1888     New diet                     9,184             0

Source: K. Takaki, in Kornberg, 1989, p. 9.

**ENDNOTES**

<1>: Hence, the study of data-collection research methods deserves a place in curricula, in my view; as a text I happily recommend Simon and Burstein (1985), of course. Unfortunately, many social-science faculties displace such study from the curriculum in favor of the logical-mathematical charms of technical statistics.

<2>: In an article on Russell, Israel Shenker (1993) summarizes this preoccupation as follows:

    Philosophers traditionally wonder about the nature of reality. How do we know it? How can we prove that we know it? Does the forest exist if there's no one around to see it? It doesn't, say the skeptical idealists. It does, say the philosophical realists. Is the external world, as idealists insist, merely a collection of sensations in one's head? These hairsplitting issues are still in doubt, though the conviction that objects exist, with or without witnesses, is on the rise.

<3>: See Popper's concept of Worlds 1, 2, and 3.

<4>: The contrast here is with the Newtonian and 18th century view of the world - still held by most scientists - as a God-designed system whose principles and equations it is the job of the scientist to ascertain. I surmise that the Einstein-Bohr view is uncomfortable for many people because of the implied lack of surety - which means it lacks a feeling of psychological security. (Are the words "surety" and "security" related?) Without "first principles" one has nothing with which to justify the rest of one's thinking.
It is like playing a game without a written set of rules, or taking an examination for which there are no clear-cut answers. There is a related issue with values. Some people can find psychological comfort in believing that secular and religious laws were handed down by an overarching authority. (This applies better to statutes than to case law.) More about this in Chapter 00 about judgment.

<5>: Concerning "the scientific method": There is no such thing as the scientific method. Getting knowledge about the world is a many-faceted process in which we use myriad intellectual devices. We gather fresh information or we manipulate old existing information; we make comparisons; we test with controlled or uncontrolled experiments our preliminary hypotheses and new techniques. We do not limit ourselves to a small set of well-defined devices as John Stuart Mill suggested we ought to in his famous discussion of experimental designs. Instead, we work according to Percy Bridgman's definition of the scientific method as the use of one's mind "no holds barred". James Bryant Conant put it well: "[I]t is worse than nonsense to speak of the scientific method...one must speak of the ways, because there is no single way" to go about scientific work (1965, p. 18). F. S. C. Northrop wrote to the same effect: "There is no one scientific method" (1959, p. ix).

In the decades of the 1970s and 1980s, philosophers and statisticians took off the straitjacket of the "hypothetico-deductive method" that forbade the variety of informal devices for discovering new knowledge that working scientists inevitably use (e.g., the work of Edward Leamer). They also embraced a wide-ranging, unconstrained framework of grounds for judging the value of evidence (see, for example, the discussions in Shafer, 1988, and Diaconis, 1985). The new spirit is broad-ranging, multi-source, and multi-criterion (for example, taking into account such non-objective aspects of decision-making as regret). And there is much (perhaps too much) recognition that a scientific study must not only be accurate but also must be effective (even persuasive) in its rhetoric (e.g., the work of Donald McCloskey).

All this is consistent with the point of view expressed here: that getting knowledge is a process of successive improvements; that there is no single set of first principles by which one may "justify" one's assumptions, and hence such ideas as a "true" parameter are often a detriment; that the more methods that are used the better; and that it is a great advantage of resampling that it throws back the curtains of fancy manipulative techniques and illuminates the non-formulaic aspects of knowledge-getting.

<6>: This principle is related to David Hume's concept of constant conjunction. As I read it, Hume is referring to the association between two separate variables, but Russell reads Hume as referring to all sameness.

<7>: These are cases of Hume's "constant conjunction." See Chapter 00 on causality.

<8>: This was the problem I studied for my bachelor's thesis in 1953. I found then that there are two distinct ways - gradual unconscious accretion, and conscious systematic analysis - and that the method used depends on a variety of factors, including whether the person (or animal) is "set" to do the analysis. That finding seems as sound now as then, in light of subsequent research.

<9>: The need for "justification" is psychologically and logically related to what Morris Cohen called "the uneliminable religious and moral craving for absolute certainty" (1956, p. 113).
<10>: Note to computer scientists and information engineers: that equal-unequal is a fundamental concept in inference should seem more familiar when one remembers that on-off (yes-no, equal-unequal) is the fundamental unit of information in digital computing and other communications.

<11>: In answer to an imagined skeptical reader's query: No, I have not studied the mathematics of Gödel's proof. But his conclusion is not disputed by any mathematician, and it is only his conclusion that is relevant here. One can use a laser blackboard pointer or medical tool without any understanding of the physics of lasers.

<12>: This gaming aspect of human behavior can cause a difficulty in statistical inference that will be mostly ignored here.

<13>: Herbert Simon put it this way:

    The social sciences have simultaneously suffered and benefited from the fact that many of the phenomena of human behavior are open for all of us to see and hear as part of our daily experience. We do not need telescopes, microscopes, Geiger counters, or radio detectors to observe the overt aspects of human behavior... As a consequence, much knowledge about human society - even knowledge that might be termed "scientific" - has been derived from observation and experience.

(Herbert A. Simon, Models of My Life (New York: Basic Books, 1991), p. 58.)

<14>: Hume was right, of course; there had been a large increase, as we now can be sure. This also meant that in earlier times it was not possible to check on the sweeping observations by religious prophets and other social commentators about whether (say) the standard of living had gone down, or whether crime and other immorality had decreased - claims of the sort we find asserted as early as Assyrian times.

<15>: I benefited from the discussion of this matter by Hald, 1990, p. 93ff.

<16>: A peculiar perverseness associated with the new knowledge of statistical inference is that very strong findings which require little or no formal inference to demonstrate, and are so powerful that they can be shown with a simple graph or table, are very hard to publish in the social science literature because they do not meet the tests of "rigor" and "elegance." Editors view them as detracting from the "technical level" of their journals. A good many of the greatest discoveries of the past would nowadays fall into this category of being difficult or impossible to publish.