CHAPTER II-2
TESTING HYPOTHESES
The concept of sameness was said in Chapter 00 to be the
basic idea in all of knowledge-getting, and the use of the
assumption of sameness is the most valuable heuristic we have for
the process of inference. Following on this, it obviously is of
great importance that one be able to say with some confidence
whether two collections of things are or are not the same for
one's current purposes.
In most situations in which we find ourselves, it is quite
clear without refined inquiry whether or not it is sensible to
treat the collections or items as the same. When you see bags of
potato chips on the shelf of a supermarket, you don't bother to
examine which bag has the more or better-shaped chips, though
some discriminating purchasers examine individual pieces of fresh
fruit at the greengrocer; any possible difference between the
potato-chip bags is too small for you to be concerned with, you
are sure.  But when you see a group of Japanese tourists born in
the 1930s or earlier, you can be sure that their average height
is far less than that of the next group of Japanese tourists who
were born in the 1960s.  And you do not need to collect data or
consult previous studies to know that ingested alcohol has a
powerful effect on the human constitution and behavior.
From time to time, however, situations arise wherein we are
in doubt about whether two collections or items should be
considered the same or different (whether the heights of two
groups of tourists, say) or whether one "outlier" should be
considered to be from the same collection as other observations
(of a planet, say, one of the earliest problems in statistical
inference). To help us make a determination, we call upon one of
the main procedures in statistical inference - the testing of
hypotheses.
The logic of hypothesis testing is the subject of this
chapter. This logic is relatively uncomplicated and
uncontroversial compared to the logic of confidence intervals,
discussed in Chapter 00.  Yet it, too, cannot be done as a
matter of routine, and requires judgment.
The first published formal test of a hypothesis was by John
Arbuthnot, doctor to Queen Anne of England, who thereby became
in 1710 the father of formal statistical inference.  He observed
that more boys than girls are born, which he assumed is necessary
for the survival of the species, and he wished to show that
birth sex is indeed not a 50-50 probability.  The records for
London showed that male births exceeded female births 82 years
in a row.  Arbuthnot therefore set forth to (in modern language)
test the hypothesis that a universe with a 50-50 probability of
producing males could result in 82 successive years with
preponderantly male births.
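In modern terms, and as a rough check: if each year's majority sex were a single fair coin toss (a simplification of Arbuthnot's actual reasoning), the chance of 82 male-majority years in a row is simple to compute. The following fragment is merely illustrative, not Arbuthnot's own calculation:

```python
# Chance that a 50-50 universe produces 82 successive
# male-preponderant years, treating each year as one fair toss.
prob = 0.5 ** 82
print(prob)   # about 2.1e-25
```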
This is a canonical problem.  You have some observed "sample"
data, and you want to connect them to some specified "population"
from which they may have come.  The previous sentence was
purposely worded vaguely because statistical questions can be
stated in many different ways. But in this case statisticians
agree on how to proceed: Specify the universe, and compare its
behavior against the observed sample. If it is unlikely that a
sample as surprising as the observed sample should come from the
specified universe, conclude that the sample did not come from
that universe. Chapter III-1 describes how Arbuthnot went about
it.
The practical business of carrying out a statistical
inference begins with the translation of a general question into
a scientific question, and thence into a question amenable to
statistical treatment; this process of translation, common to all
scientific inference, is discussed in Chapter 00. The general
procedure for the probabilistic manipulation carried out in the
context of a statistical inference, which pertains to all
statistical inference and not just to testing hypotheses, is set
forth below. (The subject was introduced in Chapter I-1.) The
overall procedure for a statistical inference, from the
translation of the question to the conclusion, can be framed in a
long series of questions and answers about the nature of the
universe(s) and sample(s), the probabilistic manipulation, and
then interpretation. The canonical series of these questions and
answers for testing hypotheses is presented in this chapter, and
the series of questions for finding confidence intervals is in
Chapter 00.
THE STEPS IN STATISTICAL INFERENCE
These are the steps in conducting statistical inference:
1. Frame a question in the form of: What is the chance of
getting the observed sample s from some specified population S?
The postulated universe S bears some likeness to the model
created by the researcher against which to test the observed
data. But instead of deriving from theory, insight, hunch,
whatever, in inference the model derives from the sample (plus
perhaps a Bayesian prior).
Another difference from the "scientific" model is that the
postulated universe S has no causal connection to sample s except
the process of (random?) sampling.
Universe S is like a scientific model in that it is assumed
not to be a perfect picture of nature. But unlike a scientific
model, in the case of a finite universe we assume that larger and
larger samples can approach the actual universe.
2. Reframe the question in the form of: What kinds of
samples does population S produce, with which probabilities?
That is, what is the probability of the observed sample s given
that a population is S?  Or, what is p(s|S)?
3. Actually investigate the behavior of S with respect to s
and other samples. This can be done in two ways:
a. Use the calculus of probability ("math"), perhaps
resorting to the Monte Carlo method if an appropriate formula
does not exist. Or
b. Resampling (in the larger sense), which equals the Monte
Carlo method minus its use for approximations, investigation of
complex functions in statistics and other theoretical
mathematics, and non-resampling uses elsewhere in science.
Resampling in the more restricted sense includes bootstrap,
permutation, and other non-parametric methods.  More about the
resampling procedure follows in the paragraphs to come, and then
in later chapters of the book.
4.  Interpret the probabilities that result from step 3 in
terms of acceptance or rejection of hypotheses, surety of
conclusions, and as inputs to decision theory.
The following short definition of statistical inference
summarizes the previous four steps: statistical inference equals
the selection of a probabilistic model to resemble the process
you wish to investigate, the investigation of that model's
behavior, and the interpretation of the results.
STEPS IN ESTIMATION OF STATISTICAL PROBABILITIES BY RESAMPLING
Stating the steps to be followed in a procedure is an
operational definition of the procedure. My belief in the
clarifying power of this device is embodied in the several sets
of steps given in this chapter for the various aspects of
statistical inference.  This section sets forth the steps for the
computation of the probabilities if the inference will be done
with resampling. More detail may be found in the rest of this
chapter, and in Chapter 00.
Let us define resampling in a fashion that will include not
only problems in inferential statistics but also problems in
probability, as follows: Using the entire set of data you have in
hand, or using the given data-generating mechanism (such as a
die) that is a model of the process you wish to understand,
produce new samples of simulated data, and examine the results of
those samples. In some cases, it may also be appropriate to
amplify this procedure with additional assumptions.
Problems in pure probability may at first seem different in
nature than problems in statistical inference. But the same
logic as stated in this definition applies to both varieties of
problems. The difference is that in probability problems the
"model" is known in advance -- say, the model implicit in a deck
of poker cards plus a game's rules for dealing and counting the
results -- rather than the model being assumed to be best
estimated by the observed data, as in resampling statistics.
The following general procedure describes what we are doing
when we estimate a probability using resampling operations:
Step A. Construct a simulated "universe" of cards or dice
or some other randomizing mechanism whose composition is similar
to the universe whose behavior we wish to describe and
investigate. The term "universe" refers to the system that is
relevant for a single simple event. For example:
A coin with two sides, or two sets of random numbers "1-105"
and "106-205", simulates the system that produces a single male
or female birth, when we are estimating the probability of three
girls in the first four children or nine female calves in ten
births (the problem to be treated below).  Notice that in this
universe the probability of a female remains the same from trial
event to trial event -- that is, the trials are independent --
demonstrating a universe from which we sample with replacement.
Step(s) B. Specify the procedure that produces a pseudo-
sample which simulates the real-life sample in which we are
interested. That is, specify the procedural rules by which the
sample is drawn from the simulated universe. These rules must
correspond to the behavior of the real universe in which you are
interested. To put it another way, the simulation procedure must
produce simple experimental events with the same probabilities as
the simple events have in the real world. For example:
In the case of three daughters in four children, or nine
female calves in ten births, you can draw a card and then replace
it if you are using a deck of red and black cards. Or if you are
using a random-numbers table, the random numbers automatically
simulate replacement. Just as the chances of having a female or
a male do not change depending on the sex of the preceding birth,
so we want to ensure through replacement that the chances do not
change each time we choose from the deck of cards.
Specifying how the outcome of the sampling is to be recorded
is also part of this step, e.g. "record `yes' if female, `no' if
male."
Step(s) C. If several simple events must be combined into a
composite event, and if the composite event was not described in
the procedure in step B, describe it now. For example:
For the number of females in a sample of births, the
procedure for each simple event of a single birth was described
in step B. Now we must specify repeating the simple event four
times, and counting whether the outcome is or is not three girls
in the four childbirths or nine females in ten calves.
Recording of "three or more girls" or "two or fewer girls",
and "9 or more females" or "8 or fewer", is part of this step.
This record indicates the results of all the trials and is the
basis for a tabulation of the final result.
Step(s) D.  Calculate the probability of interest from the
tabulation of outcomes of the resampling trials.  For example:
the proportions of a) "yes" and "no", and b) "9 or more" and "8
or fewer", estimate the likelihood specified in step C.
An Example [from Hodges and Lehmann, 1970]: Female calves
are more valuable than male calves. A bio-engineer claims to
have a method that can cause more females. He tests the
procedure on ten of your pregnant cows, and the result is nine
females. Should you believe that his method has some effect?
That is, what is the probability of a result this or more
surprising occurring by chance?
The actual computation of probability may be done with
several formulaic or sample-space methods, and in several
resampling methods. I will first show a resampling method and
then several conventional methods. The following material that
allows one to compare resampling and conventional methods is more
germane to the explication of resampling in Chapters 00 and 00
than it is to the theory of hypothesis test discussed in this
chapter, but it is more expedient to present it here.
Computation of Probabilities with Resampling
We can do the problem by hand as follows:
1.  Constitute an urn with either one blue and one pink
ball, or 106 blue and 100 pink balls.
2. Draw ten balls with replacement, count pinks, and
record.
3. Repeat step (2) say 400 times.
4. Calculate proportion of results with 9 or 10 pinks.
Or, we can take advantage of the speed and efficiency of the
computer as follows (also in ycha071):
REPEAT 15000
GENERATE 10 1,2 A
COUNT A =1 B
SCORE B Z
END
HISTOGRAM Z
COUNT Z >=9 K
DIVIDE K 15000 KK
PRINT KK
[Histogram of Z: the frequency of each count of females among
the 10 births in the 15,000 trials, rising from near zero at 0
females to a peak at 5 and falling to near zero at 10.]
Vector no. 1: Z
Bin Cum
Center Freq Pct Pct
--------------------------------------------
0 22 0.1 0.1
1 163 1.1 1.2
2 650 4.3 5.6
3 1801 12.0 17.6
4 3075 20.5 38.1
5 3717 24.8 62.9
6 3035 20.2 83.1
7 1739 11.6 94.7
8 636 4.2 98.9
9 145 1.0 99.9
10 17 0.1 100.0
Note: Each bin covers all values within 0.1 of its center.
KK = 0.0108
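For readers without the Resampling Stats language, the program above can be sketched line for line in Python (the names a, z, k, and kk mirror the program's vectors; the seed is an arbitrary choice for reproducibility):

```python
import random

random.seed(42)          # arbitrary seed so the run is reproducible
trials = 15000
z = []                   # count of "females" (1s) in each trial of 10 births
for _ in range(trials):
    a = [random.randint(1, 2) for _ in range(10)]   # GENERATE 10 1,2 A
    z.append(a.count(1))                            # COUNT A =1 B; SCORE B Z
k = sum(count >= 9 for count in z)                  # COUNT Z >=9 K
kk = k / trials                                     # DIVIDE K 15000 KK
print(kk)                                           # PRINT KK
```

The result should land near the 0.0108 shown above, varying somewhat from run to run.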
CONVENTIONAL METHODS
Sample Space and First Principles
Assume for a moment that our problem is a smaller one and
therefore much easier - the probability of getting two females in
two calves if the probability of a female is .5. One could then
map out the sample space, and find the proportion of points that
correspond to a "success". We list all four possible combinations
- FF, FM, MF, MM. Now we look at the ratio of the number of
combinations that have 2 females to total, which is 1/4. We may
then interpret this probability.
We might also use this method for (say) five female calves
in a row.  We can make a list such as FFFFF, MFFFF, MMFFF,
MMMFF...MFMFM...MMMMM.  There will be 2*2*2*2*2 = 32
possibilities, and 64 and 128 possibilities for six and seven
calves respectively.  But by the time we got as high as ten
calves, this method would become very troublesome.
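Troublesome by hand, that is; the enumeration is easy to sketch on a computer. The following fragment (in Python, purely for illustration) lists the full ten-birth sample space and counts the points with nine or more females:

```python
from itertools import product

# Enumerate the full sample space for ten births, F and M equally likely.
outcomes = list(product("FM", repeat=10))           # 2**10 = 1024 points
favorable = sum(seq.count("F") >= 9 for seq in outcomes)
proportion = favorable / len(outcomes)
print(favorable, len(outcomes), proportion)   # 11 1024 0.0107421875
```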
Sample Space Calculations
For two females in a row, we could use the well known, and
very simple, multiplication rule; we could do so even for ten
females in a row. But calculating the probability of nine
females in ten is a bit more complex.
Pascal's Triangle
One can use Pascal's Triangle to obtain binomial
coefficients for p = .5 and a sample size of 10, focusing on
those for 9 or 10 successes. Then calculate the proportion of
the total cases with 9 or 10 "successes" in one direction, to
find the proportion of cases that pass beyond the criterion of 9
females. The method of Pascal's Triangle requires more complete
understanding of the probabilistic system than does the
resampling simulation described above because Pascal's Triangle
requires that one understand the entire structure; simulation
requires only that you follow the rules of the model.
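One possible sketch of the Triangle procedure in code (the helper function is hypothetical, introduced only for illustration): build the row of the Triangle for a sample size of 10, then take the proportion in the 9-and-10 tail:

```python
def pascal_row(n):
    """Return row n of Pascal's Triangle (row 0 is [1])."""
    row = [1]
    for _ in range(n):
        # Each new row is the old row added to itself, shifted by one.
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

row = pascal_row(10)
print(row)          # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
tail = (row[9] + row[10]) / sum(row)
print(tail)         # (10 + 1) / 1024, about 0.0107
```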
The Quincunx
The quincunx is more a simulation method than a theoretical
one, but it may be considered "conventional".  Hence I include it
here for completeness.
Table of Binomial Coefficients
The Pascal Triangle becomes cumbersome or impractical with
large numbers - say, 17 females of 20 births - or with
probabilities other than .5. One might produce the binomial
coefficients by algebraic multiplication, but that, too, becomes
tedious even with small sample sizes.  One can also use the
pre-computed table of binomial coefficients found in any standard
text.  But the probabilities for n = 10 and 9 or 10 females are
too small to be shown.
Binomial Formula
For larger sample sizes, one can use the binomial formula.
The binomial formula gives no deeper understanding of the
statistical structure than does the Triangle, but it does yield a
deeper understanding of the pure mathematics.  With very large
numbers, even the binomial formula is cumbersome.
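A sketch of the formula in code, again merely illustrative: sum the binomial terms C(n,k) p^k (1-p)^(n-k) over the tail of interest. This also handles cases such as 17 females of 20 births, where the Triangle grows unwieldy:

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) when X follows a Binomial(n, p) distribution."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

print(binom_tail(10, 9, 0.5))    # the nine-calf case, about 0.0107
print(binom_tail(20, 17, 0.5))   # 17 females of 20 births, about 0.0013
```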
The Normal Approximation
When the sample size becomes too large for any of the above
methods, one can then use the Normal approximation, which yields
results close to the binomial (as seen very nicely in the output
of the quincunx). But to employ the Normal distribution one
requires an estimate of the standard deviation, which can be
derived either by formula or by resampling. (See a more extended
parallel discussion in Chapter 00 on confidence intervals for an
election poll.)
The desired probability can be obtained from the Z formula
and a standard table for the Normal distribution found in
every elementary text.
The Z table can be made less mysterious if one generates it
with simulation, or with graph paper or Archimedes' method, using
as raw material (say) five "continuous" (that is, non-binomial)
distributions, many of which are skewed: 1) Draw samples of
(say) 50 or 100. 2) Plot the means to see that the Normal shape
is the outcome. Then 3) standardize with the standard deviation
by marking the standard deviations onto the histograms.
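One hedged way to carry out that exercise by simulation rather than graph paper (the choice of an exponential distribution, and of the sample and trial sizes, is arbitrary): draw many samples from a skewed distribution, see that the means take the Normal shape, and standardize them.

```python
import random
import statistics

random.seed(1)
# Means of 2000 samples of 50 drawn from one skewed, continuous
# distribution (exponential, as an arbitrary choice).
means = [statistics.fmean(random.expovariate(1.0) for _ in range(50))
         for _ in range(2000)]

mu = statistics.mean(means)
sigma = statistics.stdev(means)
z_scores = [(m - mu) / sigma for m in means]   # standardize in SDs

within_2 = sum(abs(z) <= 2 for z in z_scores) / len(z_scores)
print(round(within_2, 2))   # close to the Normal table's 0.95
```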
The aim of the above exercise and the heart of the
conventional parametric method is to compare the sample result -
the mean - to a standardized plot of the means of samples drawn
from the universe of interest to see how likely it is that that
universe produces means deviating as much from the universe mean
as does our observed sample mean. The steps are:
1. Establish the Normal shape - from the exercise above, or
from the quincunx or Pascal's Triangle or the binomial formula or
the formula for the Normal approximation or some other device.
2. Standardize that shape in standard deviations.
3. Compute the Z score for the sample mean - that is, its
deviation from the universe mean in standard deviations.
4. Examine the Normal (or really, tables computed from
graph paper, etc.) to find the likelihood of a mean being that
far by chance.
This is the canon of the procedure for most parametric work
in statistics. For some small samples, accuracy is improved by
using the t distribution, a matter discussed in Chapter 00.
CHOICE OF THE BENCHMARK UNIVERSE<1>
In the example of the ten calves, the choice of a benchmark
universe - a universe that (on average) produces equal
proportions of males and females - seems rather straightforward
and even automatic, requiring no difficult judgments. But in
other cases the process requires more judgments to be made.
Let's consider another case where the choice of a benchmark
universe requires no difficult judgments. Assume the U.S.
Department of Labor's Bureau of Labor Statistics takes a very
large sample - say, 20,000 persons - and finds a 10 percent
unemployment rate. At some later time another but smaller sample
is drawn - 2,000 persons - showing an 11 percent unemployment
rate. Should BLS conclude that unemployment has risen, or is
there a large chance that the difference between 10 percent and
11 percent is due to sample variability?  In this case, it makes
rather obvious sense to ask how often a sample of 2,000 drawn
from a universe of 10 percent unemployment (ignoring the
variability in the larger sample) will be as different as 11
percent just due to sample variability.  This problem differs
from that of the calves only in the proportions and the sizes of
the samples.
Let's change the facts and assume that a very large sample
had not been drawn and only a sample of 2,000 had been taken,
indicating 11 percent unemployment. A policy-maker asks the
likelihood that unemployment is above ten percent. It would
still seem rather straightforward to ask how often a universe of
10 percent unemployment would produce a sample of 2000 with a
proportion of 11 percent unemployed.
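That question can be answered with the same resampling logic as the calves problem. A sketch, with an arbitrary seed and trial count:

```python
import random

random.seed(2)
trials = 10000
hits = 0
for _ in range(trials):
    # One simulated survey of 2,000 persons from a universe
    # with 10 percent unemployment.
    unemployed = sum(random.random() < 0.10 for _ in range(2000))
    if unemployed / 2000 >= 0.11:
        hits += 1
print(hits / trials)   # roughly 0.07
```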
Still another problem where the choice of benchmark
hypothesis is relatively straightforward: Say that BLS takes two
samples of 2000 persons a month apart, and asks whether there is
a difference in the results.  The obvious procedure is to pool
the two samples and examine how often two samples drawn from the
pooled universe are as different as those observed.
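A sketch of that pooling procedure (the two unemployment counts are hypothetical numbers chosen only for illustration):

```python
import random

random.seed(3)
n = 2000                     # size of each monthly sample
count1, count2 = 200, 230    # hypothetical unemployed counts: 10.0%, 11.5%
observed_diff = abs(count1 - count2)

# Pool both samples into one benchmark universe, then repeatedly
# redraw two samples of 2,000 and see how often they differ as much.
pool = [1] * (count1 + count2) + [0] * (2 * n - count1 - count2)
trials = 2000
as_extreme = 0
for _ in range(trials):
    random.shuffle(pool)
    if abs(sum(pool[:n]) - sum(pool[n:])) >= observed_diff:
        as_extreme += 1
print(as_extreme / trials)
```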
One of the reasons that the above cases - especially the
two-sample case - seems so clearcut is that the variance of the
benchmark hypothesis is not an issue, being implied by the fact
that the samples deal with proportions. If the data were
continuous, however, this issue would quickly arise. Consider,
for example, that the BLS might take the same sorts of samples
and ask unemployed persons the lengths of time they had been
employed. Comparing a small sample to a very large one would be
easy to decide about. And even comparing two small samples might
be straightforward - simply pooling them as is.
But what about if you have a sample of 2,000 with data on
lengths of unemployment spells with a mean of 30 days, and you
are asked the probability that it comes from a universe with a
mean of 25 days? Now there arises the question about the amount
of variability to assume for that benchmark universe. Should it
be the variability observed in the sample? That is probably an
overestimate, because a universe with a smaller mean would
probably have a smaller variance, too. So some judgment is
required; there cannot be an automatic "objective" process here,
whether one proceeds with the conventional or the resampling
method.
The example of the comparison of liquor retailing systems in
Chapter 00 provides more material on this subject.
THE CONCEPT OF STATISTICAL SIGNIFICANCE IN TESTING HYPOTHESES
Hypothesis tests using the concept of significance have been
misused almost since their origin; the flaws were pointed out
early on by my friend and editor, Hanan Selvin, and since then
have been discussed often and so well that no discussion is
needed here. This section offers only an interpretation of the
meaning of "significant" in connection with the logic of
significance tests.
1)  Consider the nine-year-old who tells the teacher
that the dog ate the homework. Why does the teacher not accept
the excuse? Clearly it is because the event would be too
"unusual". But why do we think that way?
Let's speculate that you survey a million adults, and only
three report that they have ever heard of a real case where a dog
ate somebody's homework. You are a teacher, and a student comes
in without homework and says that a dog ate the homework. It
could have happened -- your survey reported that it really has
happened in three lifetimes out of a million. But it does not
happen very often.
Therefore, you probably conclude that because the event is
so unlikely, something else must have happened -- for example,
that the student did not do the homework. The logic is that if
something seems very unlikely, it would therefore surprise us
greatly if it were to actually happen, and therefore we assume
that there must be a better explanation. This is why we look
askance at unlikely coincidences when they are to someone's
benefit.
This is the logic of John Arbuthnot's test of the ratio of
births by sex, the first published hypothesis test (see Chapter
00), though his extension of the logic to God's design goes
beyond the standard modern framework.  It is also the
implicit logic in the research on puerperal fever, cholera, and
beri-beri, the data for which were shown in Chapter 00, though no
explicit mention was made of probability in those cases.
2) Two students sit next to each other at an examination.
Out of a hundred questions each student gets 82 right, and each
of the mistakes that they make is on the same questions. Do you
believe that the students cheated?
You say to yourself: It would be most unlikely that they
would have made the same mistakes by chance -- and you can
compute how unlikely it would be -- and because it is so unlikely
you therefore are likely to believe that they cheated.
3) The court is hearing a murder case. There is no eye-
witness, and the evidence consists of such facts as the height
and weight and age of the person charged, and other
circumstantial evidence. Only one person in 50 million has such
characteristics, and you find such a person. Will you convict
the person, or will you assume that the evidence might have
occurred just by chance? Of course it might have occurred by bad
luck, but the probability is very very small. Will you therefore
conclude that because the chance is so small, it is reasonable to
assume that the person charged committed the crime?
Sometimes the unusual really happens - the court errs by
judging that the wrong person did it, and that person goes to
prison or even is executed. The best we can do is to make the
criterion strict: "Beyond a reasonable doubt". (People ask:
What probability does that criterion represent? But the court
will not provide a numerical answer.)
4) Somebody says to you: I am going to deal out five cards
and it will be a royal flush - ten, jack, queen, king, and ace of
a given suit. The person deals the cards and the royal flush
appears. Do you think the occurrence happens just by chance?
No, you are likely to be very dubious that it happened by chance.
Then you believe there must be some other explanation -- that the
person fixed the cards, for example.
Note: You don't attach the same meaning to any other
permutation, even though it is as rare - unless the person
announced it in advance.
Indeed, even if the person says nothing, you will be
surprised at a royal flush, because this hand has meaning,
whereas another given set of five cards does not.
Two important points complicate the concept of statistical
significance:
1. With a large enough sample, every treatment or variable
will seem different from every other. Two faces of even a good
die will produce different results in the very long run. Other
statistics help interpret these results - for example, the beta
coefficient or the partial regression coefficient (see Chapter
00).
2. Statistical significance does not imply economic or
social significance. Two faces of a die may be statistically
significant in a huge sample of throws, but a 1/10,000 difference
is too small to make an economic difference in betting.
Statistical significance is only a filter. If it appears, one
should then proceed to decide whether there is substantive
significance.
Interpreting significance is sometimes complex, especially
when the interpretation depends heavily upon your prior
expectations - as it often does. For example, how should a
basketball coach decide whether or not to bench a player for poor
performance after a series of missed shots at the basket?
Consider Coach John Thompson who, after Charles Smith missed
10 of 12 shots in the 1989 Georgetown-Notre Dame NCAA game, took
Smith out of the game for a time (The Washington Post, March 20,
1989, p. C1). The scientific or decision problem is: Should the
coach consider that Smith is not now a 47 percent shooter as he
normally is, and therefore bench him? The statistical question
is: How likely is a shooter with a 47 percent average to produce
10 of 12 misses?
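The statistical question can be answered directly from the binomial distribution. A sketch, purely illustrative:

```python
from math import comb

p_hit = 0.47
# Probability that a 47 percent shooter misses 10 or more of 12 shots.
prob = sum(comb(12, k) * (1 - p_hit)**k * p_hit**(12 - k)
           for k in range(10, 13))
print(round(prob, 3))   # about 0.03
```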
Would Coach Thompson take Smith out of the game after he
missed one shot? Clearly not. Why not? Because one "expects"
Smith to miss a shot half the time, and missing one shot
therefore does not seem unusual.
How about after Smith misses two shots in a row? For the
same reason the coach still would not bench him, because this
event happens "often" -- more specifically, about once in every
sequence of four shots.
How about after 9 misses out of ten shots? Notice the
difference between this case and 9 female calves of ten. In the
case of the calves we expected half females because the
experiment is a single isolated trial. The event considered by
itself has a small enough probability that it seems unexpected
rather than expected. "Unexpected" seems to be closely related
to "happens seldom" or "unusual" in our psychology. And an event
that happens seldom seems to call for explanation, and also seems
to promise that it will yield itself to explanation by some
unusual concatenation of forces. That is, unusual events lead us
to think that they have unusual causes; that is the nub of the
problem. (But on the other hand, one can sometimes benefit by
paying attention, as scientists know when they investigate
outliers.)
In basketball shooting, we expect 47 percent of Smith's
individual shots to be successful, and we also expect that
average for each set of shots. But we also expect some sets of
shots to be far from that average because we observe many sets;
such variation is inevitable. So when we see a single set of 9
misses in ten shots, we are not very surprised.
But how about 29 misses in 30 shots? At some point, one
must start to pay attention. (And of course we would pay more
attention if beforehand, and never at any other time, the player
said, "I can't see the basket today. My eyes are dim".)
So, how should one proceed? Perhaps the same way as with a
coin that keeps coming down heads a very large proportion of the
throws, over a long series of tosses: At some point you examine
it to see if it has two heads. But if your investigation is
negative, in the absence of an indication other than the behavior
in question, you continue to believe that there is no explanation
and you assume that the event is "chance" and should not be acted
upon. In the same way, a coach might ask a player if there is an
explanation for the many misses. But if the player answers "no",
the coach should not bench him. (There are difficulties here
with truth-telling, of course, but let that go for now.)
The key point for the basketball case and other repetitive
situations is not to judge that there is an unusual explanation
from the behavior of a single sample alone, just as with a short
sequence of stock-price changes.
We all need to learn that "irregular" (a good word here)
sequences are less unusual than they seem to the naked intuition.
A streak of 10 out of 12 misses for a 47 percent shooter occurs
about 3 percent of the time.  That is, about every 33 shots
Smith takes, he will begin a sequence of 12 shots that will end
with 2 or fewer baskets - perhaps once in every couple of games.
This does not seem "very" unusual, perhaps.  And if the coach
treats each such case as unusual, he will be losing some of the
services of a player who is better than his replacement.
In brief, how hard one should look for an explanation should
depend on the likelihood of the event.  But one should (almost)
assume the absence of an explanation unless one actually finds
it.
Bayesian analysis could be brought to bear upon the matter,
bringing in your prior probabilities based on knowledge that
research has shown that there is no such thing as a "hot hand" in
basketball, together with some sort of cost-benefit error-loss
calculation comparing Smith and the next best available player.
The "data-dredging" issue was discussed in the context of
the doctors' smoking by states in Chapter 00.
ENDNOTES
<1>: This is one of many issues that Peter Bruce first raised,
and whose treatment here reflects back-and-forth discussion
between us.