CHAPTER III-5

UPDATING SUBJECTIVE PROBABILITIES WITH SIMULATION: FROM PEDAGOGY TO PRACTICE TO JEFFREY'S RULE TO PUZZLES

INTRODUCTION

The aim of this chapter is to show that simulation can be a helpful and illuminating way to approach problems in Bayesian analysis. Simulation has two valuable properties for Bayesian analysis: 1) It can provide an effective way to handle problems whose analytic solution may be difficult or impossible. 2) It can provide insight into problems that otherwise are difficult to understand fully, as is peculiarly the case with Bayesian analysis. The chapter therefore presents examples ranging from 1) the simplest pedagogy, to 2) the complexities of updating Bayesian probabilities in statistical practice, to 3) clarifying philosophical problems such as Jeffrey's Rule, and 4) the unmasking of a non-problem of Lewis Carroll.

Philosopher Charles Sanders Peirce is paraphrased as saying that "in no other branch of mathematics is it so easy for experts to blunder as in probability theory" (Gardner, 1961, p. 220).1 Even great mathematicians have blundered on simple problems, including D'Alembert and Leibniz. This observation is especially true of Bayesian problems, as much recent study in cognitive psychology has shown (for a summary, see Piattelli-Palmarini, 1994). When psychologists employ probability puzzles to show how people err, the puzzles almost invariably are problems in conditional probability and Bayesian analysis (and Feller [1968, p. 124] insists that Bayesian analysis be seen as an exercise in conditional probability). All but the simplest problems in conditional probability are confusing to the intuition even if not difficult mathematically. But when one tackles Bayesian and other problems in probability with experimental simulation methods rather than with logic alone, neither simple nor complex problems need be difficult for experts or beginners.

THE SIMPLEST PEDAGOGICAL PROBLEMS

To make clear the nature of Bayes' rule, let us start with the simplest sort of problem, and proceed gradually from there.

1. Assessing the Likelihood That a Used Car Will Be Sound.

Consider a problem in estimating the soundness of a used car one considers purchasing (after Wonnacott and Wonnacott, 1990, p. 93). Seventy percent of the cars are known to be OK on average, and 30 percent are faulty. Of the cars that are really OK, the mechanic identifies 80 percent as "OK" but says that 20 percent are "faulty"; of those that are really faulty, the mechanic correctly identifies 90 percent as "faulty" and says (incorrectly) that 10 percent are "OK". We wish to know the probability that if the mechanic says a car is "OK", it really is OK.

One can get the desired probabilities directly by simulation without knowing Bayes' Law, as we shall see. But one must be able to model the physical problem correctly in order to proceed with the simulation; this requirement of a clearly-visualized model is a strong point in favor of simulation.

The following steps determine the probability that a car said to be "OK" will turn out to be really faulty (the complement of the probability we seek):

1. Model in percentages the universe of all cars as an urn of 100 balls. Working from the data as given above, and referring to first principles, color (.9 * .3 =) 27 of the 100 balls (the faulty cars said to be "faulty") violet, (.1 * .3 =) 3 balls (faulty but said to be "OK") blue, (.2 * .7 =) 14 balls (OK cars said to be "faulty") orange, and (.8 * .7 =) 56 balls (said to be "OK" and really OK) maroon.
A Venn diagram may help with this step, but it is not necessary. An even better procedure would be to work directly from the original data. One would note, for example, that of 200 cars previously observed, 54 were faulty and were said to be "faulty", 6 were faulty but were said to be "OK", 28 were OK but were said to be "faulty", and 112 were OK and were said to be "OK". Then make an urn of 54 violet, 6 blue, 28 orange, and 112 maroon balls.1

2. Draw a ball. If it is one of those said to be "faulty" - that is, violet or orange - draw (with replacement) another ball. Continue until obtaining a ball said to be "OK" - that is, a blue or maroon ball. Then record its color.

3. Repeat step 2 perhaps 1000 times and compute the proportion of blues among the recorded results.

--------------------
1. The use of percentages rather than raw numbers in Bayesian problems is an unnecessary abstraction and is often misleading, in addition to being a hindrance in modeling for simulation. Indeed, thinking of the prior experience as a distribution of data rather than as a probability distribution is both closer to the facts and less confusing in complex situations, as will be seen in a later example.
--------------------

OR:

1. Choose a number randomly between "1" and "100".

2. If it is "28-86" (those said to be "OK"), record it; otherwise draw another number, and repeat until a number is recorded.

3. Repeat step 2 perhaps 1000 times and count the proportion "28-30" among the total recorded ("28-86").

The key modeling step is excluding a trial from consideration (without making any record of it) if it produces an irrelevant observation, and continuing to do so until the process produces an observation of the sort about which you are presently inquiring.

Using RESAMPLING STATS, an answer may be produced as follows:

"01 - 27"  = actually faulty, said to be "faulty"
"28 - 30"  = faulty, said to be "OK"
"31 - 86"  = OK, said to be "OK"
"87 - 100" = OK, said to be "faulty"

REPEAT 1000               do 1000 repetitions
GENERATE 1 1,100 a        generate a number between "1" and "100"
IF a between 28 86        if it's between "28" and "86" (those said to be "OK")
SCORE a z                 score this number
END                       end the IF condition
END                       end the REPEAT loop
COUNT z between 28 30 k   how many of the scored numbers were between "28" and "30" (faulty, "OK")
SIZE z s                  how many numbers were scored
DIVIDE k s kk             what proportion were faulty, "OK"
PRINT kk                  print the result

Result: kk = 0.039

2. Estimating Driving Risk for Insurance Purposes

Another sort of introductory problem, following Feller (1968, p. 22): A mutual insurance company charges its members according to the risk of having an auto accident. It is known that there are two classes of people - 80 percent of the population with good driving judgment and a probability of .06 of having an accident each year, and 20 percent with poor judgment and a probability of .6 of having an accident each year. The company's policy is to charge (in addition to a fee to cover overhead expenses) $100 for each percent of risk; i.e., a driver with a probability of .6 should pay 60*$100 = $6000. If nothing is known of a driver except that he had an accident last year, what fee should he pay?

This procedure will produce the answer:

1. Construct urn A with 6 red and 94 green balls, and urn B with 60 red and 40 green balls.

2. Randomly select an urn, with probability .8 for A and .2 for B, and record the urn chosen.

3. Select a ball at random from the chosen urn, and replace it. If the ball is green, return to step 2 and begin the trial again; if red, continue to step 4.

4. Select another ball from the urn chosen in step 2. If it is red, record "Y"; if green, record "N".

5. Repeat steps 2-4 perhaps 1000 times, and determine the proportion "Y". The proportion "Y" estimates the risk of an accident for a driver known to have had one last year; the exact conditional probability is (.8*.06*.06 + .2*.6*.6)/(.8*.06 + .2*.6) = .446, so the fee should be approximately 45*$100 = $4500.
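For readers who prefer a general-purpose language to RESAMPLING STATS, the same urn procedure can be sketched in a few lines of Python (a minimal illustration only; the urn counts are those of steps 1-4 above):

import random

URN_A = ["red"] * 6 + ["green"] * 94    # good drivers: accident probability .06
URN_B = ["red"] * 60 + ["green"] * 40   # poor drivers: accident probability .6

def one_trial():
    while True:
        # Step 2: choose an urn, probability .8 for A and .2 for B.
        urn = URN_A if random.random() < 0.8 else URN_B
        # Step 3: if the first ball is green, discard the trial and start over.
        if random.choice(urn) == "red":
            # Step 4: draw a second ball from the same urn; "red" is a "Y".
            return random.choice(urn) == "red"

trials = [one_trial() for _ in range(10000)]
print(sum(trials) / len(trials))        # proportion "Y": about .45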
3. Screening for Disease

This is a classic Bayesian problem, quoted by Tversky and Kahneman (1982, pp. 153-154) from Casscells, Schoenberger, and Grayboys (1978, p. 999):

If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?

Tversky and Kahneman note that among the respondents - students and staff at Harvard Medical School - "The most common response, given by almost half of the participants, was 95%" - very much the wrong answer.

To obtain an answer by simulation, we may rephrase the question above with (hypothetical) absolute numbers as follows:

If a test to detect a disease whose prevalence has been estimated to be about 100,000 in the population of 100 million persons over age 40 (that is, about 1 in a thousand) has been observed to have a false positive rate of 60 in 1200, and never gives a negative result if a person really has the disease, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?

(If the raw numbers are not available, the problem can be phrased in such terms as "about 1 case in 1000" and "about 5 false positives in 100 cases".)

One may obtain an answer as follows:

1. Construct urn A with 999 white beads and 1 black bead, and urn B with 95 green beads and 5 red beads. A more complete problem that also discusses false negatives would need a third urn.

2. Pick a bead from urn A. If black, record "T", replace the bead, and end the trial. If white, continue to step 3.

3. If a white bead is drawn from urn A, select a bead from urn B. If red, record "F" and replace the bead; if green, record "N" and replace the bead.

4. Repeat steps 2-3 perhaps 10,000 times, and in the results count the proportion of "T"s to ("T"s plus "F"s), ignoring the "N"s.

Of course 10,000 draws would be tedious, but even after a few hundred draws a person would be likely to reach the correct conclusion that the proportion of "T"s to ("T"s plus "F"s) would be small. And it is easy with a computer to do 10,000 trials very quickly.
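A minimal Python sketch of the same procedure (an illustration only) shows how quickly a computer disposes of the 10,000 trials:

import random

def one_trial():
    # Step 2: urn A - 1 black bead (diseased) among 1000.
    if random.randint(1, 1000) == 1:
        return "T"                       # the test never misses a true case
    # Step 3: urn B - a well person gives a false positive 5 times in 100.
    return "F" if random.randint(1, 100) <= 5 else "N"

results = [one_trial() for _ in range(10000)]
t, f = results.count("T"), results.count("F")
print(t / (t + f))                       # about .02 - nowhere near 95%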
Note that the respondents in the Casscells et al. study were not naive; the medical staff members were supposed to understand statistics. Yet most doctors and other personnel offered wrong answers. If simulation can do better than the standard deductive method, then simulation would seem to be the method of choice. And only one piece of training for simulation is required: Teach the habit of saying "I'll simulate it" and then actually doing so.

FUNDAMENTAL PROBLEMS IN STATISTICAL PRACTICE

Box and Tiao begin their classic exposition of Bayesian statistics with the analysis of a famous problem first published by Fisher (1959):

...there are mice of two colors, black and brown. The black mice are of two genetic kinds, homozygotes (BB) and heterozygotes (Bb), and the brown mice are of one kind (bb). It is known from established genetic theory that the probabilities associated with offspring from various matings are as [in Table 1].

Suppose we have a "test" mouse which is black and has been produced by a mating between two (Bb) mice. Using the information in the last line of the table, it is seen that, in this case, the prior probabilities of the test mouse being homozygous (BB) and heterozygous (Bb) are precisely known, and are 1/3 and 2/3 respectively. Given this prior information, Fisher supposed that the test mouse was now mated with a brown mouse and produced (by way of data) seven black offspring. One can then calculate, as Fisher (1959, p. 17) did, the probabilities, posterior to the data, of the test mouse being homozygous (BB) and heterozygous (Bb) using Bayes' theorem... We see that, given the genetic characteristics of the offspring, the mating results of 7 black offspring changes our knowledge considerably about the test mouse being (BB) or (Bb), from a prior probability ratio of 2:1 in favor of (Bb) to a posterior ratio of 64:1 against it (1973, pp. 12-14).

TABLE 1
PROBABILITIES FOR GENETIC CHARACTER OF MICE OFFSPRING
_______________________________________________________________________
Mating                     BB (black)     Bb (black)     bb (brown)
_______________________________________________________________________
BB mated with bb               0              1               0
Bb mated with bb               0             1/2             1/2
Bb mated with Bb              1/4            1/2             1/4
_______________________________________________________________________
Source: Box and Tiao, 1973, pp. 12-14

1. Let us begin, as do Box and Tiao, by restricting our attention to the third line in Table 1, and let us represent those results with 4 balls - 1 black ball with "BB" painted on it, 2 black balls with "Bb" painted on them, and 1 brown ball, which we immediately throw away because we are told that the "test mouse" is black. The remaining 3 (black) balls are put into an urn labeled "test".

2. From prior knowledge we know that a BB black mouse mated with a bb brown mouse will produce all black mice (line 1 in the table), and a Bb black mouse mated with a bb brown mouse will produce 50 percent black mice and 50 percent brown mice (line 2). We therefore construct two more urns, one with a single black ball (the urn labeled "BB") and the other with one black ball and one brown ball (the urn labeled "Bb"). We now have three urns.

3. Take a ball from urn "test". If its label is "BB", record that fact, take a ball (the only ball, which is black) from the BB urn, record its color (we knew this already), and replace the ball into the BB urn; the overall record of this trial is "BB-black". If the ball drawn from urn "test" says "Bb", draw a ball from the Bb urn, record, and replace; the record will be either "Bb-black" or "Bb-brown".

4. Repeat step 3 seven times.

5. Examine whether the records of the seven balls drawn from the BB and Bb urns are all black; if so, record "Y", otherwise "N".

6. Repeat steps 3-5 perhaps 1000 times.

7. Ignore all "N" trials. Among the trials in which the result of step 5 is "Y", count the number of cases which are BB and the number which are Bb. The proportions of BB/"Y" and Bb/"Y" trials are the probabilities that the test mouse is BB and Bb respectively.

Creating the correct simulation procedure is not easy, because Bayesian reasoning is very subtle - a reason it has been the cause of much controversy for more than two centuries. But it certainly is not easier to create a correct procedure using analytic tools (except in the cookbook sense of plug in and pray).
And the difficult mathematics that underlie the analytic method (see e.g. Box and Tiao, Appendix A1.1) make it almost impossible for the statistician to fully understand the procedure from beginning to end; if one is interested in insight, the simulation procedure might well be preferred.2

A computer program to speed the above steps appears in the Appendix. The result found with a set of 1000 repetitions is .987. (The exact posterior probability of BB is 64/65 = .985: the prior odds of 1:2 for BB against Bb, multiplied by the likelihood ratio of 128:1 for seven black offspring, give posterior odds of 64:1.)

PROBLEMS BASED ON NORMAL AND OTHER DISTRIBUTIONS1

Much of the work in Bayesian analysis for scientific purposes treats the combining of prior distributions having Normal and other standard shapes with sample evidence which may also be represented with such standard functions. The mathematics involved often is formidable, though some of the calculational formulae are fairly simple and even intuitive. These problems may be handled with simulation by replacing the Normal (or other) distribution with the original raw data when data are available, or by a set of discrete sub-universes when distributions are subjective.

--------------------
1. This section represents work done jointly with Ekaterina Kamushadze and Peter C. Bruce.
--------------------

Measured data from a continuous distribution present a special problem, because the probability of any one observed value is very low, often approaching zero, and hence the probability of a given set of observed values usually cannot be estimated sensibly; this is the reason for the conventional practice of working with the continuous distribution itself, of course. But a simulation necessarily works with discrete values. A feasible procedure must bridge this gulf.

The logic for a problem of Schlaifer's will only be sketched out here, to be described at length in another publication. The procedure is rather novel, but it has not heretofore been published and therefore must be considered tentative and requiring particular scrutiny.

An Intermediate Problem in Conditional Probability

Schlaifer employs a quality-control problem for his leading example of Bayesian estimation with Normal sampling. A chemical manufacturer wants to estimate the yield of a crucial ingredient X in a batch of raw material in order to decide whether it should receive special handling. The yield ranges between 2 and 3 pounds (per gallon), and the manufacturer has compiled the distribution of the last 100 batches. The manufacturer currently uses the decision rule that if the mean of nine samples from the batch (which vary only because of measurement error, which is the reason he takes nine samples rather than just one) indicates that the batch mean is greater than 2.5 pounds, the batch is accepted.

The first question Schlaifer asks, as a sampling-theory waystation to the more general question, is the likelihood that a given batch with any given yield - say 2.3 pounds - will produce a set of samples with a mean as great as or greater than 2.5 pounds.

We are told that the manufacturer has in hand nine samples from a given batch; they are 1.84, 1.75, 1.39, 1.65, 3.53, 1.03, 2.73, 2.86, and 1.96, with a mean of 2.08. Because we are also told that the manufacturer considers the extent of sample variation to be the same at all yield levels, we may - if we are again working with 2.3 as our example of a possible universe - add (2.3 - 2.08 =) .22 to each of these nine observations, so as to constitute a bootstrap-type universe; we do this on the grounds that this is our best guess about the constitution of a distribution with a mean at (say) 2.3. We then repeatedly draw samples of nine observations from this shifted distribution to see how frequently its sample mean is 2.5 or greater. The work is so straightforward that we need hardly state the steps in the procedure.
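A minimal Python sketch of this first sampling experiment, using the nine observations given above (illustrative only):

import random

samples = [1.84, 1.75, 1.39, 1.65, 3.53, 1.03, 2.73, 2.86, 1.96]
xbar = sum(samples) / len(samples)               # 2.08

# Bootstrap-type universe for a supposed batch yield of 2.3.
universe = [x + (2.3 - xbar) for x in samples]

N = 10000
hits = sum(1 for _ in range(N)
           if sum(random.choice(universe) for _ in range(9)) / 9 >= 2.5)
print(hits / N)   # how often a 2.3-yield batch shows a sample mean of 2.5 or more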
Estimating the Posterior Distribution

Next we estimate the posterior distribution. Figure 1 shows the prior distribution of batch yields, based on 100 previous batches.

Figure 1

Notation: Sm = the set of batches (out of the total of S = 100) with a particular mean m (e.g. m = 2.1). xi = a particular observation (e.g. 1.03). s = the set of the xi.

We now perform, for each of the Sm (categorized into the tenth-of-pound divisions between 2.1 and 3.0), the same sort of sampling operation performed for Sm = 2.3 in the previous problem. But now, instead of having regard to the manufacturer's decision criterion of 2.5, we construct an interval of arbitrary width around the sample mean of 2.08 - say .1 wide, from 2.03 to 2.13 - and then work with the weighted proportions of sample means that fall into this interval.

1. Using a bootstrap-like approach, we presume that the sub-universe of observations related to each Sm consists of the nine xi, each shifted by the difference between m and the mean of the xi (2.08), on the grounds that this is our best guess about the constitution of a distribution with mean m. For m = 2.1, each xi is increased by .02, so that, e.g., 1.03 becomes 1.05.

2. Working with the distribution centered at 2.3 as an example: constitute a universe of the values (1.84+.22=2.06, 1.75+.22=1.97, ...). Here we may notice that the variability in the sample enters into the analysis at this point, rather than when the sample evidence is combined with the prior distribution; this is in contrast to conventional Bayesian practice, where the posterior is the result of the prior and sample means weighted by the reciprocals of the variances (see e.g. Box and Tiao, 1973, p. 17 and Appendix A1.1).

3. Draw nine observations from this universe (with replacement, of course), compute the mean, and record it.

4. Repeat step 3 perhaps 1000 times and plot the distribution of outcomes.

5. Compute the percentage of the means within (say) .05 on each side of the sample mean, i.e. from 2.03 to 2.13. The resulting number - call it UPi - is the un-standardized (un-normalized) effect of this sub-distribution in the posterior distribution.

6. Repeat steps 1-5 to cover each other possible batch yield from 2.1 to 3.0 (2.3 was just done).

7. Weight each of these sub-distributions - actually, its UPi - by its prior probability, and call the result WPi.

8. Standardize the WPi's to a total probability of 1.0. The result is the posterior distribution.
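Here is a Python sketch of steps 1-8. Because Figure 1 is not reproduced numerically in this text, the prior counts in the program are hypothetical stand-ins (they sum to 100); with the manufacturer's actual 100-batch counts substituted, the program carries out the procedure as described:

import random

samples = [1.84, 1.75, 1.39, 1.65, 3.53, 1.03, 2.73, 2.86, 1.96]
xbar = sum(samples) / len(samples)                  # 2.08

# HYPOTHETICAL prior counts standing in for Figure 1 (they sum to 100 batches).
prior = {2.1: 5, 2.2: 10, 2.3: 15, 2.4: 20, 2.5: 20,
         2.6: 15, 2.7: 8, 2.8: 4, 2.9: 2, 3.0: 1}

weights = {}
for m, count in prior.items():
    universe = [x + (m - xbar) for x in samples]    # steps 1-2: shifted universe
    hits = 0
    for _ in range(1000):                           # steps 3-4
        mean = sum(random.choice(universe) for _ in range(9)) / 9
        if abs(mean - xbar) <= 0.05:                # step 5: interval 2.03-2.13
            hits += 1
    weights[m] = (hits / 1000) * (count / 100)      # steps 6-7: the WPi

total = sum(weights.values())                       # step 8: standardize
posterior = {m: w / total for m, w in weights.items()}
print(posterior)                                    # the posterior distribution
print(sum(m * p for m, p in posterior.items()))     # its mean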
The value found is 2.283, which the reader may wish to compare with a theoretically-obtained result (which Schlaifer does not give).

This procedure must be biased, because the numbers of "hits" will differ between the two sides of the mean for all sub-distributions except the one centered at the same point as the sample; but the extent and properties of this bias are as yet unknown. The bias would seem to be smaller as the interval is made smaller, but a small interval requires a large number of simulations; a satisfactorily narrow interval surely will contain relatively few trials, which is a practical problem of still-unknown dimensions.

Another procedure - less theoretically justified and probably more biased - intended to get around the problem of the narrowness of the interval, replaces step 5 as follows:

5a. Compute the percentages of the means on each side of the sample mean, and note the smaller of the two (or, in another possible process, the difference of the two). The resulting number - call it UPi - is the un-standardized (un-normalized) weight of this sub-distribution in the posterior distribution.

Another possible criterion - a variation on the procedure in 5a - is the difference between the two tails; for a universe with the same mean as the sample, this difference would be zero.

The subject of this section has only been touched on here for lack of space, but more such problems, along with facilitating computer programs, are available upon request.

SOLVING BAYESIAN PROBABILITY PUZZLES WITH SIMULATION

Several illustrative puzzles at whose heart is conditional probability - including the famous Monty Hall "Let's Make a Deal" three-door problem - appeared earlier in this journal (Simon, 1994). Now let us consider a problem that Piattelli-Palmarini (1994) considers a canonical "illusion" in probability; this time it will not only be dealt with by simulation, but the psychological difficulty of solving the problem analytically will also be set forth.

Here is Samuel Goldberg's version of the problem that Joseph Bertrand posed late in the 19th century:

Three identical boxes each contain two coins. In one box both are pennies, in the second both are nickels, and in the third there is one penny and one nickel. A man chooses a box at random and takes out a coin. If the coin is a penny, what is the probability that the other coin in the box is also a penny?

Another way to phrase the same problem - with more dramatic detail, which apparently makes such problems more difficult:

A Spanish treasure fleet of three ships was sunk at sea off Mexico in the 1500s. One ship had a trunk of gold forward and another aft, another ship had a trunk of gold forward and a trunk of silver aft, while a third ship had a trunk of silver forward and another trunk of silver aft. A scuba diver just found one of the ships and a trunk of gold in it, but she ran out of air before she could check the other trunk. On deck, they are now taking bets about whether the other trunk on the same ship will contain silver or gold. What are fair odds that the trunk will contain gold?

These are the steps in a simulation that would answer the question:

1. Create three urns, each containing two balls, labeled "0,0", "0,1", and "1,1" respectively (where "0" stands for gold and "1" for silver).

2. Choose an urn at random, and shuffle its contents.

3. Choose the first element in the chosen urn's vector. If it is "1", stop the trial and make no further record. If it is "0", continue.

4. Record the second element in the chosen urn's vector on the scoreboard.

5. Repeat steps 2-4 perhaps 1000 times, and calculate the proportion of "0"s on the scoreboard.

Though an analogous computer simulation is shown in the Appendix, what makes this problem interesting is not the comparison of computer simulation to the formulaic approach, but rather the comparison of any simulation to ratiocination without calculation. The reason pure thought alone so often leads to the wrong answer is that this deceptively-simple problem really is quite complex, requiring many twists and turns. These are the logical steps one may distinguish in arriving at a correct answer with deductive logic (portrayed in Figure 2):

Figure 2

1. Postulate three ships - ship I with two gold chests (G-G), ship II with one gold and one silver chest (G-S), and ship III with two silver chests (S-S). (Choosing notation might well be considered one or more additional steps.)
2. Assert equal probabilities of each ship being found.

3. Step 2 implies equal probabilities of being found for each of the six chests.

4. Fact: the diver finds a chest of gold.

5. Step 4 implies that ship III (S-S) was not found; hence remove it from subsequent analysis.

6. Three possibilities remain: 6a) the diver found chest I-Ga, 6b) the diver found chest I-Gb, 6c) the diver found chest II-Gc. From step 3, the cases 6a, 6b, and 6c have equal probabilities.

7. If possibility 6a is the case, then the other chest is I-Gb; the comparable statements for cases 6b and 6c are I-Ga and II-S.

8. From steps 6 and 7: given equal probabilities of the three cases, and no other possible outcome, p(6a) = 1/3, p(6b) = 1/3, p(6c) = 1/3.

9. So p(G) = p(6a) + p(6b) = 1/3 + 1/3 = 2/3.

A key implication of the deservedly-famous research on errors in probabilistic judgments by Daniel Kahneman and Amos Tversky (interchangeably, Tversky and Kahneman) is that human thinking is often unsound. And some writers in their school of thought assert that the unsoundness of thinking is hard-wired into our brains; this point of view is expressed vividly in the title of Massimo Piattelli-Palmarini's book Inevitable Illusions; he calls the unsoundness "bias", and says that "we are instinctively very poor evaluators of probability" (1994, p. 3, italics in original).

Another possibility - not necessarily inconsistent with the genetic explanation - is that the reason we arrive at unsound answers to certain types of problems is that the problems are inherently very difficult, especially when they are tackled without the assistance of tools, because the problems require many steps and because the steps often involve reversals in the path. Without the aid of memory aids such as paper and pencil, and the skill of using them well, the problems are just too difficult for most persons.

One piece of evidence against the genetic-bias explanation is that the wrong answers to problems are not all the same; they do not even concentrate at one end of the probability spectrum. As the work of Kahneman and Tversky amply shows, the errors often are widely distributed among most or all of the simple arithmetical combinations of the numbers involved in the problems. The outstanding characteristic of the answers is that they are wrong, not the nature of the errors. In following long chains of logic and assessing complex assortments of information, our brains may be weaker than we would like, but we need not think of our brains as twisted.

The two explanations have quite different implications for remediation, and two different remedies are offered; I suggest resorting to simulation, whereas others suggest additional training (especially in probability theory) to improve people's logic. The different remedies are not necessarily connected to the two explanations, however; I believe that the remedy I suggest is implied by the bias explanation as well as by the weakness explanation.

SIMULATING PHILOSOPHICALLY-DIFFICULT BAYESIAN PROBLEMS

Another role for simulation in a Bayesian context is penetrating problems that are difficult technically or philosophically. This section presents two such examples.

1. Is Jeffrey's Rule of Any Use?

Jeffrey's Rule is a system for updating subjective probabilities in light of additional information when the probabilities have not been previously quantified. Box and Tiao (1973, pp. 41-46) give a classic exposition, but perhaps the best way to understand the system is from examples - an implicit operational definition.
Diaconis and Zabell (in Bell et al., 1988, p. 271) give as an example this problem from Whitworth (1901, pp. 167-68, Question 138):

A, B, C were entered for a race, and their respective chances of winning were estimated at 2/11, 4/11, 5/11. But circumstances come to our knowledge in favour of A, which raise his chance to 1/2; what are now the chances in favour of B and C respectively?

Answer. A could lose in two ways, viz. either by B winning or by C winning, and the respective chances of his losing in these ways were a priori 4/11 and 5/11, and the chance of his losing at all was 9/11. But after our accession of knowledge the chance of his losing at all becomes 1/2, that is, it becomes diminished in the ratio of 18:11. Hence the chance of either way in which he might lose is diminished in the same ratio. Therefore the chance of B winning is now 4/11 x 11/18, or 4/18; and of C winning 5/11 x 11/18, or 5/18. These are therefore the required chances.

This problem persuades Diaconis and Zabell of the sometime incapacity of Bayes' rule. Yet a simulation solution to the problem at hand seems straightforward:

1. Put 2 Amber, 4 Black, and 5 Claret balls in an urn, for the original probabilities of A, B, and C respectively.

2. We wish to raise A's probability from 2/11 to 1/2, and then to find the new probabilities of B and C without changing the relative probabilities of B and C. The necessity of making this latter assumption (or some other; I simply follow Whitworth in the restriction he places, rather than choosing one of my own) is forced upon our attention when we consider changing the composition of the balls in the urn, and this forcing of attention to a key issue is one of the greatest benefits of a simulation approach. We therefore add 7 Amber balls to the urn to make A's probability 9/18 = 1/2. If this is not crystal-clear intuitively, we can write formally (though the formalism is unnecessary to the main line of the discussion): A/T = 2/11, with T = 11; then A'/(T + x) = (2 + x)/(11 + x) = 1/2, where a prime on a variable means its value ex post the change; solving gives x = 7.

3. Estimate the new probabilities of B and C by repeated trials [though of course one could also calculate B/(T + x) = 4/18 and C/(T + x) = 5/18 to get p(B') and p(C')].
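Carried out in Python, the whole procedure - including the check in step 3 - is only a few lines (a sketch, not part of the chapter's RESAMPLING STATS programs):

import random

# Step 1's urn of 2 Amber, 4 Black, and 5 Claret balls,
# plus the 7 Amber balls added in step 2.
urn = ["A"] * (2 + 7) + ["B"] * 4 + ["C"] * 5

N = 10000
draws = [random.choice(urn) for _ in range(N)]
for entrant in "ABC":
    print(entrant, draws.count(entrant) / N)   # about 9/18, 4/18, 5/18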
The direct solution with simulation suggests that there is no need for Jeffrey's or any other subtle analysis in this case. One might reply that the simulation illustrates Jeffrey's approach. But if the simulation method brings one directly to the solution without need for Jeffrey's analysis, what is the benefit of Jeffrey's analysis in this case?

The point here is not to deny that the discussion by Diaconis and Zabell of the difficulties they see in the problem throws light on interesting philosophical and theoretical issues. And Jeffrey's Rule may be valuable in other contexts, though apparently it is not necessary here; identifying those contexts would be of interest. The point, rather, is to give one more instance of how simulation can often (even if not always) give simple and understandable solutions to apparently-difficult problems.

Now consider the other numerical example presented in the same article by Diaconis and Zabell (p. 272):

Suppose that in a criminal case we are trying to decide which of four defendants, called a, b, c, d, is a thief. We initially think P(a)=P(b)=P(c)=P(d)=1/4. Evidence is then introduced to show that the thief was probably left-handed. The evidence does not demonstrate that the thief was definitely left-handed, but it leads us to conclude that P(thief left-handed) = .8. If a and b are the defendants who are left-handed, then E1={a,b}, E2={c,d} and PH(E1)=.8, PH(E2)=.2 [where H stands for the probability in light of handedness]. If the only effect of the evidence was to alter the probability of left-handedness - in the sense that P(A|Ei) = PH(A|Ei) - then PH is obtained from Jeffrey's rule as PH(a)=.4, PH(b)=.4, PH(c)=.1, PH(d)=.1. Evidence is next presented that it is somewhat likely that the thief was a woman. If the female defendants are a and c, then F1={a,c}, F2={b,d}. If PHS(F1)=.7 [where S stands for the probability in light of sex] and Jeffrey-updating is again judged acceptable, then PHS(a)=.56, PHS(b)=.24, PHS(c)=.14, PHS(d)=.06. If instead the evidence (F1,.7), (F2,.3) is presented first and (E1,.8), (E2,.2) is presented second, is PSH equal to PHS? [This example] shows that in general the order matters, since the currently held opinion governs; in this example the reader may check that the order does not matter.

Now a simulation solution:

1. Put a total of 4 balls marked a, b, c, and d into an urn, for the original state of belief.

2. On the assumption - forced by our decision about which balls to add to the urn (unless we explicitly choose to make some other assumption) - that the relative probabilities a:b and c:d remain the same, add 3 a's and 3 b's to the urn, to make p(a + b) = .8.

3. Find the new probabilities of a, b, c, and d by experiment (or by examination of the proportions of balls in the urn).

4. Continuing with the evidence relevant to the sex of the thief: on the assumption of constant relative probabilities a:c and b:d, and by the same logic, add balls to make 56 a's, 24 b's, 14 c's, and 6 d's, which immediately produces the probabilities for each suspect.
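The two updates can be checked with a short Python sketch, using the urn counts of steps 2 and 4:

import random

def estimate(urn, n=10000):
    # Estimate each suspect's probability by repeated draws from the urn.
    draws = [random.choice(urn) for _ in range(n)]
    return {s: round(draws.count(s) / n, 3) for s in "abcd"}

# Step 2: one ball per suspect plus 3 a's and 3 b's, so p(a or b) = 8/10 = .8.
after_handedness = ["a"] * 4 + ["b"] * 4 + ["c"] + ["d"]
print(estimate(after_handedness))    # about .4, .4, .1, .1

# Step 4: rebuild the urn so p(a or c) = .7, keeping a:c and b:d unchanged.
after_sex = ["a"] * 56 + ["b"] * 24 + ["c"] * 14 + ["d"] * 6
print(estimate(after_sex))           # about .56, .24, .14, .06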
Again the simulation procedure suffices quite well without auxiliary logical rules. I have not yet found any reason to think that a similar procedure would not operate successfully in other numerical (i.e. realistic) cases.

These two examples suggest that simulation can provide both an easy solution and considerable insight into the nature of at least some problems hitherto addressed with Jeffrey's Rule. Whether this is true of all such problems, or whether Jeffrey's Rule handles problems that simulation cannot, or whether Jeffrey's Rule provides insight over and beyond what simulation provides in some or most cases, is at present unknown. An answer would require canvassing and analyzing many such problems, in a variety of contexts.

2. A Non-Problem of Lewis Carroll

This is Lewis Carroll's Pillow Problem 41 (1895/1958, pp. 9, 62, 63):

My friend brings me a bag containing four counters, each of which is either black or white. He bids me draw two, both of which prove to be white. He then says "I meant to tell you, before you began, that there was at least one white counter in the bag. However, you know it now, without my telling you. Draw again." (1) What is now my chance of drawing white? (2) What would it have been if he had not spoken?

Carroll gives the following solution:

(1) As there was certainly at least one W in the bag at first, the 'a priori' chances for the various states of the bag, 'WWWW, WWWB, WWBB, WBBB,' were '1/8, 3/8, 3/8, 1/8'. These would have given, to the observed event, the chances '1, 1/2, 1/6, 0'. Hence the chances, after the event, for the various states, are proportional to '1/8.1, 3/8.1/2, 3/8.1/6'; i.e. to '1/8, 3/16, 1/16'; i.e. to '2, 3, 1'. Hence their actual values are '1/3, 1/2, 1/6'. Hence the chance, of now drawing W, is '1/3.1 + 1/2.1/2'; i.e. it is 7/12. Q.E.F.

(2) If he had not spoken, the 'a priori' chances for the states 'WWWW, WWWB, WWBB, WBBB, BBBB' would have been '1/16, 4/16, 6/16, 4/16, 1/16'. These would have given, to the observed event, the chances '1, 1/2, 1/6, 0, 0'. Hence the chances, after the event, for the various states, are proportional to '1/16.1, 4/16.1/2, 6/16.1/6'; i.e. to '1, 2, 1'. Hence their actual values are '1/4, 1/2, 1/4'. Hence the chance, of now drawing W, is '1/4.1 + 1/2.1/2'; i.e. it is 1/2. Q.E.F.

Let us consider how one would physically simulate this problem. You begin with two white balls in your hand - the ones you have already selected. Then you assume that each of the other two balls is either white (W) or black (B). To correspond with these facts and this assumption, one could make up one bag with WW, another with BB, a third with WB, and a fourth with BW. On any given trial one would a) select one of those bags at random, b) combine the two white balls in hand with the balls in the bag, and c) draw a ball. The process is so simple that we can confidently forego actual simulation and deduce that the probability of a white is .25 * 1 (from the WWWW bag), plus .25 * .5 (from WWBB), plus .5 * 3/4 (from WWWB or WWBW), or 6/8.
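Though the deduction makes actual simulation unnecessary here, the physical procedure is easily rendered in Python (a sketch):

import random

# The four equally likely bags for the two unseen counters.
bags = [["W", "W"], ["B", "B"], ["W", "B"], ["B", "W"]]

N = 10000
whites = 0
for _ in range(N):
    counters = ["W", "W"] + random.choice(bags)   # two whites in hand plus the bag
    if random.choice(counters) == "W":
        whites += 1
print(whites / N)                                 # about 6/8 = .75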
This is a different answer than Carroll obtained. But it seems to be the answer that fits any concrete realization of the facts of the situation. If one now considers the second part of Carroll's question, the answer is quite the same as for the first part, because the actual facts - including one's state of knowledge - are the same in both cases.

In his study of the probabilistic Pillow Problems, Seneta (1984) concurs that Carroll may not have arrived at the correct answer, saying that "Dodgson [Carroll] may have had some difficulty in handling conditional probabilities" (1984, p. 88).

The important question here is: Why did Carroll arrive at a different answer than the one arrived at above? I suggest that the answer is that his purely-deductive calculations allowed him to depart from the physical facts. Over-abstraction often has this pernicious property. Simulation can often save one from falling into such error.

CONCLUSION

Bayesian problems of updating estimates can be handled easily and straightforwardly with simulation, whether the data are discrete or continuous. The process and the results tend to be intuitive and transparent. Simulation works best with the original raw data rather than with abstractions from them via percentages and distributions. This can aid the understanding as well as facilitate computation.

REFERENCES

Box, George E. P., and George C. Tiao, Bayesian Inference in Statistical Analysis (Reading, Mass.: Addison-Wesley, 1973).

Carroll, Lewis, Pillow Problems (New York: Dover, 1895/1958).

Diaconis, Persi, and Sandy L. Zabell, "Updating Subjective Probability," Journal of the American Statistical Association, vol. 77 (1982), pp. 822-830; reprinted in Decision Making, edited by David E. Bell, Howard Raiffa, and Amos Tversky (Cambridge: Cambridge University Press, 1988), pp. 266-283.

Feller, William, An Introduction to Probability Theory and Its Applications, 3rd edition (New York: Wiley, 1968).

Fisher, R. A., Statistical Methods and Scientific Inference, 2nd edition (London: Oliver and Boyd, 1959).

Gardner, Martin, The Second Scientific American Book of Mathematical Puzzles & Diversions (New York: Simon and Schuster, 1961).

Huff, Darrell, How to Take a Chance (New York: W. W. Norton, 1959).

Jeffrey, Richard, The Logic of Decision (New York: McGraw-Hill, 1965), referred to by Diaconis and Zabell.

Jeffrey, Richard, "Probable Knowledge," in The Problem of Inductive Logic, edited by I. Lakatos (Amsterdam: North-Holland, 1968), pp. 166-180, referred to by Diaconis and Zabell.

Kahneman, Daniel, and Amos Tversky, "Subjective Probability: A Judgment of Representativeness," abbreviated version of a paper originally appearing in Cognitive Psychology, vol. 3 (1972), pp. 430-454; reprinted in Judgment under Uncertainty: Heuristics and Biases, edited by Daniel Kahneman, Paul Slovic, and Amos Tversky (Cambridge: Cambridge University Press, 1982), pp. 32-47.

Nisbett, Richard E., David H. Krantz, Christopher Jepson, and Geoffrey T. Fong, "Improving Inductive Inference," in Judgment under Uncertainty: Heuristics and Biases, edited by Daniel Kahneman, Paul Slovic, and Amos Tversky (Cambridge: Cambridge University Press, 1982), pp. 445-459.

Piattelli-Palmarini, Massimo, Inevitable Illusions (New York: Wiley, 1994).

Schlaifer, Robert, Introduction to Statistics for Business Decisions (New York: McGraw-Hill, 1961).

Seneta, Eugene, "Lewis Carroll as a Probabilist and Mathematician," Mathematical Scientist, vol. 9 (1984), pp. 79-94.

Simon, Julian L., "What Does the Normal Curve 'Mean'?," Journal of Educational Research, vol. 61 (July-August 1968), pp. 435-438.

Simon, Julian L., "What Some Puzzling Problems Teach About the Theory of Simulation and the Use of Resampling," The American Statistician, vol. 48, no. 4 (November 1994), pp. 1-4.

Tversky, Amos, and Daniel Kahneman, "Evidential Impact of Base Rates," in Judgment under Uncertainty: Heuristics and Biases, edited by Daniel Kahneman, Paul Slovic, and Amos Tversky (Cambridge: Cambridge University Press, 1982), pp. 153-162.

Whitworth, W. A., Choice and Chance, 5th edition (Cambridge: Deighton Bell, 1901).

ENDNOTES

1. Darrell Huff provides the quote but without reference: "This branch of mathematics [probability] is the only one, I believe, in which good writers frequently get results entirely erroneous" (Huff, 1959, frontispiece).

2. We can use a similar procedure to illustrate an aspect of the Bayesian procedure that Box and Tiao emphasize, its sequentially-consistent character. First let us carry out the above procedure, but with only three black offspring in a row being observed; the program to be used is the same except for the insertion of "3" where "7" appears. We then estimate the probability of the test mouse being Bb, which turns out to be about 1/5 instead of about 1/65. We then substitute for the "test" urn a new urn with appropriate numbers of black Bb's and black BB's, to represent the "updated" prior probability. We may then continue by substituting "4" for "3" above (to attain a total of seven observed black offspring), and find that the probability is about what it was when we observed 7 black offspring in a single sample (1/65). This shows that the Bayesian procedure accumulates information without "leakage" and with consistency.

APPENDIX

Program for Fisher's mice problem [by Peter Bruce]:

NUMBERS (1 2 2) test      the urn with test mice, 1=BB and 2=Bb
NUMBERS (3) bb            urn with "bb" mice

At this point Peter Bruce - who wrote the program - resorts to a trick of adding the results of the two different sampling operations to identify particular types. This enables him to avoid some further programming with IF loops.
I worry about confusing the reader with this trick, but I can afford to be pure about it because he is doing the work and not I.

COPY 0 n                  "n" will be the number of simulations
WHILE n < 1000            repeat the following as long as n has not reached 1000
SAMPLE 1 test test*       sample a test mouse
SAMPLE 1 bb bb*           sample a brown mouse
REPEAT 7                  simulate 7 "matings"
ADD test* bb* c           "mate" the mice; if "c" is a "4", it's a BB-bb mating,
                          which always yields black offspring; if "c" is a "5",
                          it's a Bb-bb mating, which produces black offspring
                          half the time and brown the other half. In the latter
                          case we need a further simulation to determine the
                          color; we let 111 represent a black outcome, 222 a
                          brown outcome
IF c = 5                  if we have a Bb-bb mating
URN 1#111 1#222 d         offspring is 50/50 black/brown
SAMPLE 1 d e              "e" will be either 111 or 222
END
IF c = 4                  if we have a BB-bb mating
COPY 111 e                "e" must be 111
END
SCORE e y                 keep track of each of the 7 births
END                       end the "mating" loop
SUM y yy
IF yy = 777               if all seven births were black
SCORE test* z             score the genetic character of the test mouse
END
CLEAR y                   wipe out the birth scoreboard in preparation for a new trial
SIZE z n                  determine how many simulations have been run, so we
                          can stop at 1000 (see top)
END                       we proceed past here once the WHILE condition is no
                          longer satisfied
COUNT z =1 k              how often was the test mouse a BB?
DIVIDE k n kk
PRINT kk

kk = .987

Program for Bertrand's problem (Spanish treasure fleet), using the language RESAMPLING STATS [by Peter Bruce]:

NUMBERS (7 7) gg          the 3 boxes, where "7"=gold and "8"=silver
NUMBERS (7 8) gs
NUMBERS (8 8) ss
REPEAT 1000
GENERATE 1 1,3 a          select a box, where gg=1, gs=2, ss=3
IF a = 1                  if a=1, we're in the "gold-gold" box: we've picked a
                          gold, and we're guaranteed another gold (7) on our
                          second pick, so we score a "1" for success
SCORE 1 z
END
IF a = 2                  if a=2, we're in the gold-silver box
SAMPLE 1 gs b             select a coin
IF b = 7                  if b=7, we got the gold, so score 0 (no success)
                          because we can't get a 7 again; if b=8, we got the
                          silver on our first draw, and we're not interested
                          in the trial unless we get a gold first
SCORE 0 z
END
END
END                       note: if a=3, we're not interested either - we can't
                          draw a gold on our first draw
SIZE z k1                 how many times did we get an initial gold?
COUNT z =1 k2             of those times, how often was our second draw a gold?
DIVIDE k2 k1 result       calculate the latter as a proportion of the former

result = 0.64797