CHAPTER III-6

               EXPERIMENTATION, SAMPLE SPACE ANALYSIS, SIMULATION,

           AND FORMULAIC THEORY:  THEIR NATURES AND INTERRELATIONSHIPS


             About 1615, Italian gamblers brought to Galileo Galilei a

        problem in the game of three dice.  The theorists of the day had

        figured as equal the chances of getting totals of nine and ten

        points (also eleven and twelve), because there are the same

        number of ways (six) of making those points -- for example, nine

        can be "126," "135," "144," "234," "225," and "333."  But players

        had found that in practice ten is made more often than nine.

             Historians consider the chances of rolling nine and of

        rolling ten with three dice to be the first problem in

        probability analyzed in the "modern" era that began with Galileo.

        Let's use it to explore the relationships among the various

        methods of estimating those probabilities using a)

        experimentation, b) sample space analysis, c) simulation, and d)

        formulaic theory.
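
             As a minimal sketch of the sample space analysis (method b)
        for the three-dice problem, the following Python fragment lists
        all 6 x 6 x 6 = 216 ordered outcomes and counts those that total
        nine and those that total ten.  The function name ways_to_total
        is illustrative only and does not come from any source cited
        here.  The sketch assumes fair six-sided dice.

```python
from itertools import product

def ways_to_total(target, n_dice=3, sides=6):
    """Count the ordered outcomes of n_dice fair dice that sum to target."""
    faces = range(1, sides + 1)
    return sum(1 for roll in product(faces, repeat=n_dice)
               if sum(roll) == target)

total_outcomes = 6 ** 3  # every ordered triple of faces
for target in (9, 10):
    ways = ways_to_total(target)
    print(f"total {target}: {ways} of {total_outcomes} ordered outcomes "
          f"(probability {ways / total_outcomes:.4f})")
```

             The count comes to 25 ordered outcomes for nine against 27
        for ten.  The six combinations are not equally likely: "333" can
        occur in only one order, whereas "126" can occur in six.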

             You are standing in the warehouse of a playing-card factory

        that has been hit by a tornado.  Cards are scattered everywhere,

        some not yet wrapped and others ripped out of their packages.

        The factory makes a variety of decks - for poker without a joker,

        poker with a joker, and pinochle; magician's decks; decks made of

        paper and others of plastic; cards of various sizes; and so on.

             Two hours from now a friend will join you for a game of

        near-poker with these cards. Each hand will be chosen as randomly

        as possible from the huge heap of cards, and then burned. What

        odds should you attach to getting the combination two-of-a-kind -

        two cards of different or the same suit but of the same number or

        picture - in a five-card draw?

             Ask this question of a professional probabilist or

        statistician, and - based on the small sample I have taken - s/he

        is likely to say "I don't have enough information".  There is

        even a name for this sort of question: Problems lacking

        structure.

             Ask the same question of a class of high-school students or

        college freshmen and you will quickly get the suggestion, "Draw

        hands from the card pile the same way you will draw them when you

        play later, and see how often you get two-of-a-kind".

             Who produces the better (more useful) reply - the "naive"

        students, or the learned statistician/probabilist?

             (If the question had been framed as the probability of

        getting [say] the jack of spades in a poker hand drawn from the

        pile, the probabilist probably would think of suggesting a

        sample.  Apparently it is the combination of elements that leads

        the trained person to say that the job cannot be done.)

             This case reminds one of the three-door problem, in which

        resampling immediately produces the correct answer whereas

        trained intellects almost uniformly arrive at the wrong

        answer.

             The untutored person's try-it procedure is, in this case,

        not only as good as any procedure can be, but better than any

        formal procedure can be, even in principle.  One reason is that

        the probability of any given hand in the warehouse is affected by

        the physical properties of the cards - their sizes and materials.

        (The various cards are not perfectly alike, just as a die cannot

        be perfectly true; even a bit of purposeful shaving of a die's

        edge can affect the odds enough to enable a gambler to cheat

        successfully.)  But an empirical estimation with an actual

        sample-and-deal procedure includes the effect of these physical

        influences, whereas any more abstract approach has great

        difficulty doing so.

             Another issue: You might also want to estimate the chance of

        a three-of-a-kind hand. You quickly recognize that this event

        does not happen very often, and it will take many hands to

        estimate its probability.  So you consider this procedure: take a

        sample of (say) 1000 cards, record their values, transform those

        values to a form that a computer can read, then program the

        computer to choose (with replacement, now) five cards at random

        from the 1000, and examine many trial hands (say 10,000) to see

        whether there are three-of-a-kind.  The computer procedure should

        be as close an analog as possible to physically shuffling and

        dealing five-card hands from the 1000 sampled cards.  Please

        notice that one need never know how many of each type (that is,

        face value) of card the sample contains.  Rather, as each of the

        cards is examined, its value is transmitted to the computer.  It

        is unnecessary to calculate any sample space or any partition of

        it; one never needs to know that there are 2,598,960 or whatever

        number of possible poker hands. (Goldberg, 1960, p. 305)
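
             The computer procedure just described might be sketched in
        Python as follows.  Everything named here is an assumption made
        for illustration: the list sampled_values stands in for the 1000
        recorded cards and is filled with made-up placeholder values, not
        real data, and "three-of-a-kind" is taken to mean that one face
        value appears exactly three times while the other two cards
        differ.  A different definition would need a different test.

```python
import random
from collections import Counter

FACE_VALUES = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

def is_three_of_a_kind(hand):
    """True if one value appears exactly three times and the other two
    cards differ from it and from each other (assumed definition)."""
    counts = sorted(Counter(hand).values(), reverse=True)
    return counts == [3, 1, 1]

def estimate_three_of_a_kind(sampled_values, n_hands=10_000, seed=None):
    """Proportion of five-card hands, drawn with replacement from
    sampled_values, that are three-of-a-kind."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_hands):
        hand = rng.choices(sampled_values, k=5)  # sampling with replacement
        if is_three_of_a_kind(hand):
            hits += 1
    return hits / n_hands

# Hypothetical placeholder for the 1000 cards recorded from the pile.
placeholder_rng = random.Random(0)
sampled_values = [placeholder_rng.choice(FACE_VALUES) for _ in range(1000)]

print(estimate_three_of_a_kind(sampled_values, n_hands=10_000, seed=1))
```

             Note that the program never tallies how many cards of each
        value the sample holds, and never computes a sample space.  It
        only draws hands and inspects them.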

             A probabilist might suggest computing the chance of three-

        of-a-kind from the same 1000 pieces of information by using

        probability theory.  Both these procedures will arrive at much

        the same result.  Both fail to take account of physical factors -

        size, and type of material - that might affect physical trials

        with the 1000 sampled cards.  The simulation will be slightly

        less "exact" than the theoretical calculation, the lesser

        exactness being made as small as desired by increasing the number

        of computer trials; the loss of accuracy surely will be very

        small relative to the sampling error deriving from choosing the

        1000 cards from the huge pile - including both the random-

        sampling error and the bias due to not drawing the sample

        randomly.  And of course the formal calculation in this case will

        be quite tricky and prone to error.  It must assess the size of

        the sample space of three-of-a-kind hands when the numbers of

        cards of various values will differ, both because the factory

        makes different numbers (no jacks, queens, and kings in some

        decks, for example) and because of the inaccuracy due to

        the sample of only 1000 cards.  In contrast, the sample space

        need never be known for physical or computer resampling.


                    EXPLANATION OF THE ADVANTAGE OF RESAMPLING

        Lighter Conceptual Burden

             In general, the conceptual burden in resampling is much

        slighter than in probability theory; this is one of resampling's

        main advantages.  One does not need to be able to add or even to

        count in order to conduct individual experimental trials.  One

        only needs to know the concept of counting, and also the concept

        of a ratio, so as to (first) keep a record of the numbers of

        successful and unsuccessful trials, and (second) add to get the

        total trials and divide to get the ratio of successful to

        total. Certainly the discipline that applauds the likes of Peano,

        Russell, and Whitehead for boiling down mathematics to its most

        fundamental elements should have some appreciation for an

        intellectual method that gets along so successfully with so

        little recourse to higher abstractions.

             Consider, for example, the case of the probabilities of

        various numbers of points when throwing two dice (refer to

        Goldberg, 1960, p. 158ff).  When specifying the sample space,

        etc., one needs to add the two top faces of the dice to determine

        the range of the function.  With simulation it is not necessary

        to ever determine this range; one simply tosses the two dice and

        inspects the outcomes.  One can ask the probability of getting

        "13" (or any other number) and get an answer experimentally

        without knowing the range in advance.
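
             A minimal sketch of that two-dice experiment on a computer,
        assuming fair six-sided dice; the function name
        estimate_sum_probability is illustrative.  The program is never
        told which sums are possible.  It simply tosses and tallies.

```python
import random

def estimate_sum_probability(target, n_trials=10_000, seed=None):
    """Proportion of two-dice tosses whose top faces sum to target."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        toss = rng.randint(1, 6) + rng.randint(1, 6)
        if toss == target:
            hits += 1
    return hits / n_trials

for target in (7, 13):
    print(target, estimate_sum_probability(target, seed=0))
```

             For "13" the tally of successes stays at zero, so the
        estimate is 0.0, and the experimenter learns this without ever
        having worked out that the sums run from 2 to 12.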


        Reducing the Extent of Abstraction from Actual Experience

             Robert Shannon, in a book on Systems Simulation, constructs
        a continuum from "Physical models" to "Scaled models" to "Analogy
        models" to "Computer simulation" to "Mathematical models" (1975,
        p. 8).  (I would add experimentation with the actual material of
        interest as a stage even less abstract than Physical models.)  At
        each successive stage of translation to greater abstraction one
        runs the risk of losing some important aspect of experiential
        reality, and of introducing misleading assumptions and
        simplifications.  This argues for abstracting as little as
        possible, doing so only to the extent that it is necessary.

             As Shannon's continuum suggests, simulation methods in

        statistics (with or without a computer) are less abstract than

        are distributional and formulaic methods, and they should be less

        at risk of error.  This speculation jibes with the experimental

        evidence that people can attain more correct answers to numerical

        problems with resampling methods than with formulaic methods,

        when given equal amounts of instruction (Simon, Atkinson, and

        Shevokas, 1976).

             Of course the optimal level of abstraction depends upon the

        circumstances.  If one wants to estimate the probability of a

        given sum with four dice in order to maximize one's chance of

        winning with those particular dice, experimenting with those very

        dice is likely to be optimum, but if one wants to know the odds

        with four dice in other circumstances, a more abstract approach

        may be better.  However, there are very few circumstances in

        which the formulaic and distributional abstractions are likely to

        be better than Monte Carlo methods (lack of data being one such

        circumstance, and low probability being another).


        Operationalizing the Problem

             A third virtue of resampling may be stated as:  If you

        understand the posing of the problem operationally, you

        automatically will obtain the correct answer.  For example,

        consider this probability puzzle from Lewis Carroll's Pillow

        Problems (by way of Martin Gardner, correspondence, May,

        1993):

                  A bag contains one counter, known to be either
             white or black.  A white counter is put in, the bag
             shaken, and a counter drawn out, which proves to be
             white.  What is now the chance of drawing a white
             counter?

             The issue is, do I state the problem correctly in steps 1-4

        below?  If I do, that implies that the repetition of the process

        in those steps will lead to a correct answer to the problem.

             1.  Put a white counter (later have the computer call it "7"

        to avoid confusion) or a black counter (call it "8") in the urn

        with probability .5.

             2.  Put in a white and shuffle.

             3.  Take out a counter.  If black, stop.

             4.  (If result of (3) is white):  Take out the remaining

        counter, examine, and record its color.

             5.  Repeat steps 1-4 (say) 1000 times.

             6.  Compute how many trials yielded a white first.

             7.  Count the number and compute the proportion of whites

        ("7s") among the "white first" trials.
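
             Steps 1-7 can be sketched in Python as below.  The codes 7
        and 8 follow the steps above; the function name run_trials and
        the use of a list for the bag are assumptions made for
        illustration.

```python
import random

WHITE, BLACK = 7, 8  # the codes suggested in step 1

def run_trials(n_trials=1000, seed=None):
    """Repeat steps 1-4 n_trials times.  Return the number of trials in
    which white was drawn first, and the number of those trials in which
    the remaining counter was also white."""
    rng = random.Random(seed)
    white_first = 0
    white_remaining = 0
    for _ in range(n_trials):
        bag = [rng.choice([WHITE, BLACK])]  # step 1: unknown counter
        bag.append(WHITE)                   # step 2: add a white, shuffle
        rng.shuffle(bag)
        first = bag.pop()                   # step 3: take out a counter
        if first == BLACK:
            continue                        # black drawn: stop this trial
        white_first += 1                    # step 6: tally white-first trials
        if bag[0] == WHITE:                 # step 4: examine the other counter
            white_remaining += 1
    return white_first, white_remaining

white_first, white_remaining = run_trials(1000, seed=0)
print(white_first, white_remaining / white_first)  # step 7: the proportion
```

             With enough trials the proportion should settle near 2/3,
        which is the value that a conditional-probability analysis of
        the problem gives.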

             The benefits of the operationalization of problems that

        occurs with simulation can be seen in a different way in another

        problem of Lewis Carroll's:

             Given that there are 2 counters in a bag, as to which
             all that was originally known was that each was either
             white or black.  Also given that the experiment has
             been tried a certain number of times, of drawing a
             counter, looking at it, and replacing it; that it has
             been white every time...What would then be the chance
             of drawing white? (p. 15).

             This problem was an eye-opening experience for me.  First I

        wrote down a set of steps to handle the problem with white and

        black balls ("counters").  But I did not actually execute the

        procedure.  Instead, while I was waiting for an associate to

        write a computer program to solve the problem, following the

        steps I had outlined, I set out to explain the problem logically.

        I wrote five nice pages of what I thought to be clear

        explanation.

             A few days later I reread the steps I had written down.  But

        now I found that I could not understand the logic.  This

        experience shows how easy it is to get confused with Bayesian

        problems of this sort if one works analytically rather than with

        simulation.  So I tried harder to create a simulation - and

        harder - and harder.  And then I found that I simply could not

        create a simulation that would model the problem as Carroll wrote

        it (and as I understood it).  Apparently I was as confused as

        anyone could be.

             What to do?  I decided to go back to my very basic

        principle:  There must be a way to physically model every

        meaningful question in probability and statistics.  If one cannot

        find a way to model a simulation for the problem, maybe there is

        something wrong with the problem rather than with my modeling.

        And indeed, when we examine it closely, we may see that Carroll's

        problem is not operational and hence not meaningful.

             The difficulty turns out to lie in Carroll's phrase "given
        that the experiment has been tried a certain number of times, of
        drawing a counter, looking at it, and replacing it; that it has
        been white every time".  In Carroll's solution he indicates that
        he believes that it is possible to infer a probability for the
        next trial on the basis of a series of trials that are all
        successes.  This is a famous formula in probability theory,
        Laplace's rule of succession, which gives the probability as
        (n+1)/(n+2), where n is the number of observed
        successes.  But probability theorists such as Feller have argued
        (correctly, in my view) that this formula is not meaningful.  And
        the fact that it is not possible to model the formula
        meaningfully in this context confirms that theoretical analysis.

             So once again the act of attempting to create an operational

        simulation of a problem and then actually executing the procedure

        has kept our feet on solid ground and off the slippery slope into

        confusion or meaninglessness.


                         LIMITS OF THE RESAMPLING METHOD

        Low Probabilities

             Can the formal method be better in any respect?  Yes, it

        can.  If you want to estimate the chance of a royal flush in

        poker, which probably would happen only once in hundreds of

        thousands or millions of trial hands, taking samples by sitting

        on the floor of the warehouse for a few hours and dealing hands

        will not produce a sound estimate.  And even computer sampling

        might be much less accurate than analysis without an inordinate

        amount of computer time devoted to the problem.
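
             To give a rough sense of the computer time involved, here
        is a minimal sketch for the textbook case of one standard 52-card
        deck, which is an assumption and not the warehouse pile.  It
        computes the exact chance of a royal flush and then the relative
        standard error that a simulated proportion would carry for
        several numbers of trial hands.  The function name
        relative_standard_error is illustrative.

```python
from math import comb, sqrt

hands = comb(52, 5)       # number of five-card hands from one standard deck
royal_flushes = 4         # one royal flush per suit
p = royal_flushes / hands

def relative_standard_error(p, n_trials):
    """Standard error of a simulated proportion, as a fraction of p."""
    return sqrt(p * (1 - p) / n_trials) / p

print(f"{hands} hands; about 1 royal flush in {round(1 / p)}")
for n_trials in (10_000, 1_000_000, 100_000_000):
    expected = n_trials * p
    rse = relative_standard_error(p, n_trials)
    print(f"{n_trials} trials: {expected:.3f} expected royal flushes, "
          f"relative standard error {rse:.2f}")
```

             The relative error shrinks only with the square root of the
        number of trials, which is why a rare event demands so many more
        simulated hands than a common one.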

             But will the formal method surely be better for the royal

        flush?  No.  There is an excellent chance that anyone except a

        very skilled probabilist will use the wrong calculating formula,

        and the erroneous answer might well be worse than no answer at

        all, and worse than computer sampling or perhaps even sampling by

        hand.  This realistic possibility of conceptual analytic error

        cannot be ignored in any practical situation.  It is as much a

        source of possible error as the sampling procedure, physical

        characteristics of the cards, and unsound computer programming if

        a computer is used.  Just as with the calculation of the

        possibility of a disaster at a nuclear reactor, each possible

        source of trouble must be gauged and allowed for in proportion to

        its likely importance.  None can be dismissed as being avoidable

        "in principle" by proper handling.


        Small Samples

             Imagine a sample of the heights of four persons.  You wish

        to estimate a confidence interval for the population mean or

        median.  It is rather obvious that the interval should go beyond

        the range of the four observations, but a resampling procedure

        will never give that result.  Does this mean that resampling is

        inferior here to the conventional method using (say) the t test?

             Implicit in the conventional method is an assumption about

        the shape of the distribution.  Making this assumption is in no

        way different in principle from a Bayesian prior.  And the nature

        of the assumption is crucial.  An assumption that would be

        appropriate for heights would not be appropriate for incomes.

             Once we have established that it is necessary to bring

        outside information and judgment to bear, we can then consider

        doing so with the resampling method as well as the conventional

        method. We need not enter into technical details here, but there

        are many possible ways to coordinate the observations to any

        shape of distribution in such fashion as to estimate its

        dispersion, and then to draw samples from the distribution to

        estimate confidence limits.  This would not seem inferior to the

        conventional method.  And if one made the assumption of a

        peculiar type of distribution, the advantage would seem to be

        with the resampling method, though this subject needs more

        exploration.
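
             One of the many possible ways mentioned above can be
        sketched as follows, purely as an illustration.  The four heights
        are made-up numbers, the normal shape is an assumption chosen
        because the example concerns heights, and the function names are
        hypothetical.  The sketch compares plain resampling of the four
        observations with resampling from a normal distribution fitted to
        them.

```python
import random
import statistics

def percentile_interval(estimates, level=0.95):
    """Cut off equal tails of the sorted trial results."""
    ordered = sorted(estimates)
    tail = (1 - level) / 2
    low = ordered[int(tail * len(ordered))]
    high = ordered[int((1 - tail) * len(ordered)) - 1]
    return low, high

def plain_resample_means(data, n_trials, rng):
    """Means of samples drawn with replacement from the observations."""
    return [statistics.mean(rng.choices(data, k=len(data)))
            for _ in range(n_trials)]

def normal_resample_means(data, n_trials, rng):
    """Means of samples drawn from a normal distribution whose center and
    spread are estimated from the observations (an assumed shape)."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [statistics.mean([rng.gauss(mu, sigma) for _ in data])
            for _ in range(n_trials)]

heights_cm = [160.0, 168.0, 175.0, 183.0]  # hypothetical observations
rng = random.Random(0)
plain = plain_resample_means(heights_cm, 10_000, rng)
fitted = normal_resample_means(heights_cm, 10_000, rng)

print("plain resampling:  ", percentile_interval(plain))
print("assumed normal:    ", percentile_interval(fitted))
print("widest plain mean: ", min(plain), max(plain))
print("widest normal mean:", min(fitted), max(fitted))
```

             Every mean from plain resampling must lie between the
        smallest and largest observation, whereas draws from the assumed
        distribution are not so confined.  This simple version takes the
        estimated spread as if it were known exactly, which the t method
        does not do, so it is a starting point rather than a finished
        procedure.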


                              WHAT ABOUT "USUALLY"?

             The title of this article says that the formal method is

        "usually" inferior.  This assertion assumes that most

        applications of probability and statistics deal with situations

        and probabilities that lend themselves well to direct physical

        sampling and/or to the resampling procedure on the computer.

        This very general assertion, of course, might be refuted by

        systematically-gathered evidence.  What is most important,

        however, is not the general assertion but rather choosing the

        method that is right for each particular situation.

             The card-warehouse example lacks realism.  But estimating

        the probability that there will be two faults in a particular

        piece of machine output, where the faults seem to be

        independent of one another, is not very dissimilar,

        though the probability model is rather different.  And a quite

        analogous realistic set of problems was the basis for Galileo's,

        and then Pascal's and Fermat's, foundational work with dice games

        in formal probability theory that proceeded by assessing the

        sample space and partitions of it.  But experimentally estimating

        the odds as gamblers previously had done had led to sounder

        answers than even such great minds as Gottfried Leibniz had

        arrived at with deductive methods (cited by Hacking, 1975, p.

        52).

             Why argue that formal methods are often inferior in

        principle?  One of the objections to resampling in statistics is

        that it is "only" an imperfect substitute for formal methods, and

        that the passage to formal methods represents an advance over

        simulation methods.  For example, when William Kruskal compared

        the early statement of resampling methods in the stark terms of

        the necessary operational procedures, versus developments in the

        literature later on, he dismissed the importance and value of the

        former by saying that the latter embodies "real mathematics"

        (personal correspondence, 1984).

             There is an important analogy between the lack of exactness

        in resampling and the movement in modern physics and mathematics,

        since Poincare and Bohr, away from Newtonian deterministic

        analysis of closed systems and toward non-deterministic analysis

        of open systems.  (See Ekeland, 1988, for an illuminating

        discussion of this movement.)  Probability theory is a set of

        exact closed-form replicas of inexact open physical situations,

        of which the card warehouse is an example.  (Taking a sample of

        1000 cards from the warehouse and treating them as equally

        weighted entities converts the open system to a closed system.)

        That is, when calculating the probability of two-of-a-kind in a

        poker hand, the sample space and the partition containing that

        subset are exact numbers even though in any actual situation

        there are incalculable elements such as the different weights of

        the cards due to the different amounts of ink on them, their

        slightly different sizes, and so on.

             I am not criticizing the exact model for not being an

        inexact replica, any more than a photograph should be criticized

        for not being a perfect replica of the scene it portrays.  But to

        claim that the photograph is a truer form than is the scene

        itself, or to claim that probability theory is more exact than a

        physical manipulation which is the very subject of interest -

        that is, to claim that the calculation of getting a pair of "2s"

        with two given dice is more exact than a million throws of the

        same two dice - is hardly supportable.

             The probabilist will reply that the calculation does not

        refer to a particular pair of dice.  But the scientist and the

        decision maker are always interested in some particular physical

        reality - a given comet, or the price of corn tomorrow - and if

        probability theory is to be judged by other than an esthetic

        test, it must be judged on its helpfulness in these particular

        situations.

             In contrast, resampling - especially physical experiments

        with the elements that constitute the situation to be

        estimated - is inescapably inexact.  It is ironic that it is

        criticized for that mirroring of reality.




                                    REFERENCES


             Ekeland, Ivar, Mathematics and the Unexpected (Chicago:  U.

        of Chicago Press, 1988).

             Feller, William, An Introduction to Probability Theory and

        Its Applications (New York: Wiley, 1950).

             Goldberg, Samuel, Probability - An Introduction (New York:

        Dover Publications, Inc., 1960).

             Hacking, Ian, The Emergence of Probability (New York:

        Cambridge U. P., 1975), pp. 166-171.

             Shannon, Robert, Systems Simulation (Englewood Cliffs:

        Prentice-Hall, 1975).