CHAPTER III-5
UPDATING SUBJECTIVE PROBABILITIES WITH SIMULATION:
FROM PEDAGOGY TO PRACTICE TO JEFFREY'S RULE TO PUZZLES
INTRODUCTION
The aim of this chapter is to show that simulation can be a
helpful and illuminating way to approach problems in Bayesian
analysis.
Simulation has two valuable properties for Bayesian
analysis: 1) It can provide an effective way to handle problems
whose analytic solution may be difficult or impossible. 2)
Simulation can provide insight into problems that otherwise are
difficult to understand fully, as is peculiarly the case with
Bayesian analysis. The chapter therefore presents examples
ranging from 1) the simplest pedagogy to 2) the complexities of
updating Bayesian probabilities in statistical practice to 3)
clarifying philosophical problems such as Jeffrey's Rule, and 4)
the unmasking of a non-problem by Lewis Carroll.
Philosopher Charles Sanders Peirce is paraphrased as saying
that "in no other branch of mathematics is it so easy for experts
to blunder as in probability theory" (Gardner, 1961, p. 220)1.
Even great mathematicians have blundered on simple problems,
including D'Alembert and Leibniz. This observation is especially
true of Bayesian problems, as much recent study in cognitive
psychology has shown (for a summary, see Piattelli-Palmarini,
1994). When psychologists employ probability puzzles showing how
people err, these puzzles almost invariably are problems in
conditional probability and Bayesian analysis (and Feller [1968,
p. 124] insists that Bayesian analysis be seen as an exercise in
conditional probability).
All but the simplest problems in conditional probability are
confusing to the intuition even if not difficult mathematically.
But when one tackles Bayesian and other problems in probability
with experimental simulation methods rather than with logic,
neither simple nor complex problems need be difficult for experts
or beginners.
THE SIMPLEST PEDAGOGICAL PROBLEMS
To make clear the nature of Bayes' rule, let us start with
the simplest sort of problem, and proceed gradually from there.
1. Assessing the Likelihood That a Used Car Will Be Sound.
Consider a problem in estimating the soundness of a used car
one considers purchasing (after Wonnacott and Wonnacott, 1990,
p. 93). Seventy percent of the cars are known to be OK on
average, and 30 percent are faulty. Of the cars that are really
OK, the mechanic identifies 80 percent as "OK" but says that 20
percent are "faulty"; of those that are faulty, the mechanic
correctly identifies 90 percent as faulty and says (incorrectly)
that 10 percent are OK.
We wish to know the probability that if the mechanic says a
car is "OK", it really is OK.
One can get the desired probabilities directly by simulation
without knowing Bayes' Law, as we shall see. But one must be
able to model the physical problem correctly in order to proceed
with the simulation; this requirement of a clearly-visualized
model is a strong point in favor of simulation.
The following steps determine the probability that a car
said to be "OK" will turn out to be really faulty:
1. Model the universe of all cars in percentages as an urn
of 100 balls. Working from the data as given above, and
referring to first principles, color (.9 * .3 =) 27 of the 100
balls violet (the 27 faulty cars said to be "faulty"), (.1 * .3
=) 3 balls blue (faulty cars said to be "OK"), (.2 * .7 =) 14
balls orange (OK cars said to be "faulty"), and (.8 * .7 =) 56
balls maroon (OK cars said to be "OK"). A Venn diagram may help
with this step, but it is not necessary.
An even better procedure is to work directly from the
original data. One would note, for example, that of 200 cars
previously observed, 54 were faulty and were said to be "faulty",
6 were faulty and were said to be "OK", 28 were OK but were said
to be "faulty", and 112 were OK and were said to be "OK". Then
make an urn of 54 violets, 6 blue, 28 orange, and 112 maroon.1
2. Draw a ball. If it is one of those said to be "faulty"
- that is, violet or orange - draw (with replacement) another
ball. Continue until obtaining a ball said to be "OK" - that is,
a blue or maroon ball. Then record its color.
3. Repeat step 2 perhaps 1000 times and compute the
proportion of blues among the recorded results.
--------------------
1. The use of percentages rather than raw numbers in
Bayesian problems is an unnecessary abstraction and is often
misleading, in addition to being a hindrance in modeling for
simulation. Indeed, thinking of the prior experience as a
distribution of data rather than as a probability distribution
is both closer to the facts and less confusing in complex
situations, as will be seen in a later example.
OR
1. Choose a number randomly between "1" and "100".
2. If it is "28-30" or "31-86", record it; otherwise draw
another number, and repeat until a number is recorded.
3. Repeat steps 1-2 perhaps 1000 times and count the
proportion "28-30" among the total recorded ("28-86").
The key modeling step is excluding a trial from
consideration (without making any record of it) if it produces an
irrelevant observation, and continuing to do so until the process
produces an observation of the sort about which you are presently
inquiring.
Using RESAMPLING STATS, an answer may be produced as
follows:
"01 - 27" = actually faulty, said to be "faulty"
"28 - 30 = faulty, "OK"
"31 - 86" = OK, "OK"
"87 - 100" = OK, "faulty"
REPEAT 1000 do 1000 repetitions
GENERATE 1 1,100 a generate a number between "1" and "100"
IF a between 28 86 if it's between "28" and "86" (those that
say "good")
SCORE a z score this number
END end the IF condition
END end REPEAT loop
COUNT z between 28 30 k how many of the SCORED numbers were
between "28 - 30" (faulty, "OK")
SIZE z s how many numbers were scored
DIVIDE k s kk what proportion were faulty, "OK"
PRINT kk print result
Result kk = 0.039
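For readers who prefer a general-purpose language, the same urn
procedure can be sketched in Python (an illustrative translation,
not the RESAMPLING STATS program above; the 28-30 and 31-86 ranges
are taken from the key):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

said_ok = 0    # draws in which the mechanic says "OK"
faulty_ok = 0  # of those, cars that are really faulty

for _ in range(10000):
    n = random.randint(1, 100)  # one ball from the 100-ball urn
    if 28 <= n <= 86:           # mechanic says "OK"
        said_ok += 1
        if n <= 30:             # "28 - 30": really faulty
            faulty_ok += 1

kk = faulty_ok / said_ok
print(round(kk, 3))  # near the exact value 3/59, about 0.051
```

Note that trials outside 28-86 are simply never counted, which is
the programmatic equivalent of drawing again until a relevant ball
appears.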
2. Estimating Driving Risk for Insurance Purposes
Another sort of introductory problem, after Feller (1968,
p. 22):
A mutual insurance company charges its members
according to the risk of having an auto accident. It
is known that there are two classes of people - 80 percent
of the population with good driving judgment and with a
probability of .06 of having an accident each year, and
20 percent with poor judgment and a probability of .6
of having an accident each year. The company's policy
is to charge (in addition to a fee to cover overhead
expenses) $100 for each percent of risk, i. e. a driver
with a probability of .6 should pay 60*$100 = $6000.
If nothing is known of a driver except that he had
an accident last year, what fee should he pay?
This procedure will produce the answer:
1. Construct urn A with 6 red and 94 green balls, and urn B
with 60 red and 40 green balls.
2. Randomly select an urn with probabilities for A = .8 and
B = .2, and record the urn chosen.
3. Select a ball at random from the chosen urn, and then
replace it. If the ball is green (no accident last year), the
trial is irrelevant; end it without a record and return to step
2. If the ball is red, continue to step 4.
4. Select another ball from the urn chosen in step 2. If
it is red, record "Y", if green, record "N".
5. Repeat steps 2 - 4 perhaps 1000 times, and determine the
proportion "Y". The final answer should be approximately
44.6*$100 = $4460.
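The steps above can be sketched in Python (an illustrative sketch;
the discard-and-restart in step 3 is the key conditioning move):

```python
import random

random.seed(2)  # fixed seed for reproducibility

accidents_again = 0
trials = 0
while trials < 10000:
    # Step 2: choose a driver class - urn A (good) with p = .8
    p = 0.06 if random.random() < 0.8 else 0.6
    # Step 3: did this driver have an accident last year?
    if random.random() >= p:
        continue  # no accident: irrelevant driver, start a new trial
    # Step 4: an accident in the following year ("Y" or "N")?
    trials += 1
    if random.random() < p:
        accidents_again += 1

risk = accidents_again / trials
print(round(risk, 3))  # near the exact value .446, a fee of about $4,460
```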
3. Screening for Disease
This is a classic Bayesian problem quoted by Tversky and
Kahneman (1982, pp. 153-154), from Casscells, Schoenberger, and
Graboys (1978, p. 999):
If a test to detect a disease whose prevalence is
1/1000 has a false positive rate of 5%, what is the
chance that a person found to have a positive result
actually has the disease, assuming you know nothing
about the person's symptoms or signs?
Tversky and Kahneman note that among the respondents -
students and staff at Harvard Medical School - "The most common
response, given by almost half of the participants, was 95%" -
very much the wrong answer.
To obtain an answer by simulation, we may rephrase the
question above with (hypothetical) absolute numbers as follows:
If a test to detect a disease whose prevalence has been
estimated to be about 100,000 in the population of 100
million persons over age 40 (that is, about 1 in a
thousand) has been observed to have a false positive
rate of 60 in 1200, and never gives a negative result
if a person really has the disease, what is the chance
that a person found to have a positive result actually
has the disease, assuming you know nothing about the
person's symptoms or signs?
(If the raw numbers are not available, the problem can be
phrased in such terms as "about 1 case in 1000" and "about 5
false positives in 100 cases".)
One may obtain an answer as follows:
1. Construct urn A with 999 white beads and 1 black bead,
and urn B with 95 green beads and 5 red beads. A more complete
problem that also discusses false negatives would need a third
urn.
2. Pick a bead from urn A. If black, record "T", replace
the bead, and end the trial. If white, continue to step 3.
3. If a white bead is drawn from urn A, select a bead from
urn B. If red, record "F" and replace the bead, and if green
record "N" and replace the bead.
4. Repeat steps 2-3 perhaps 10,000 times, and in the
results count the proportion of "T"s to ("T"s plus "F"s),
ignoring the "N"s.
Of course 10,000 draws would be tedious, but even after a
few hundred draws a person would be likely to draw the correct
conclusion that the proportion of "T"s to ("T"s plus "F"s) would
be small. And it is easy with a computer to do 10,000 trials
very quickly.
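A Python sketch of the two-urn procedure (illustrative; the urn
contents follow the restated problem):

```python
import random

random.seed(3)

true_pos = 0   # "T": has the disease, positive test
false_pos = 0  # "F": healthy, positive test
for _ in range(100000):
    # Urn A: 1 black bead (disease) among 1000
    if random.random() < 1 / 1000:
        true_pos += 1            # the test never misses a real case
    # Urn B: 5 red beads (false positive) among 100
    elif random.random() < 5 / 100:
        false_pos += 1
    # all other draws are "N" and are ignored

share = true_pos / (true_pos + false_pos)
print(round(share, 3))  # exact answer: .001/(.001 + .999*.05), about 0.02
```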
Note that the respondents in the Casscells et al. study were
not naive; the medical staff members were supposed to understand
statistics. Yet most doctors and other personnel offered wrong
answers. If simulation can do better than the standard deductive
method, then simulation would seem to be the method of choice.
And only one piece of training for simulation is required: Teach
the habit of saying "I'll simulate it" and then actually doing
so.
FUNDAMENTAL PROBLEMS IN STATISTICAL PRACTICE
Box and Tiao begin their classic exposition of Bayesian
statistics with the analysis of a famous problem first published
by Fisher (1959).
...there are mice of two colors, black and brown. The
black mice are of two genetic kinds, homozygotes (BB)
and heterozygotes (Bb), and the brown mice are of one
kind (bb). It is known from established genetic theory
that the probabilities associated with offspring from
various matings are as [in Table 1]:
Suppose we have a "test" mouse which is black and has
been produced by a mating between two (Bb) mice. Using the
information in the last line of the table, it is seen that,
in this case, the prior probabilities of the test mouse
being homozygous (BB) and heterozygous (Bb) are precisely
known, and are 1/3 and 2/3 respectively. Given this prior
information, Fisher supposed that the test mouse was now
mated with a brown mouse and produced (by way of data) seven
black offspring. One can then calculate, as Fisher (1959,
p.17) did, the probabilities, posterior to the data, of the
test mouse being homozygous (BB) and heterozygous (Bb) using
Bayes' theorem...
We see that, given the genetic characteristics of the
offspring, the mating results of 7 black offspring changes
our knowledge considerably about the test mouse being (BB)
or (Bb), from a prior probability ratio of 2:1 in favor of
(Bb) to a posterior ratio of 64:1 against it (1973, pp. 12-14).
TABLE 1
PROBABILITIES FOR GENETIC CHARACTER OF MICE OFFSPRING
_______________________________________________________________________
Mice BB (black) Bb (black) bb (brown)
_______________________________________________________________________
BB mated with bb 0 1 0
Bb mated with bb 0 1/2 1/2
Bb mated with Bb 1/4 1/2 1/4
_______________________________________________________________________
Source: Box and Tiao, 1973, pp. 12-14
1. Let us begin, as do Box and Tiao, by restricting our
attention to the third line in Table 1, and let us represent
those results with 4 balls - 1 black with "BB" painted on it, 2
black with "Bb" painted on them, and 1 brown which we immediately
throw away because we are told that the "test mouse" is black.
The remaining 3 (black) balls are put into an urn labeled "test".
2. From prior knowledge we know that a BB black mouse mated
with a bb brown mouse will produce all black mice (line 1 in the
table), and a Bb black mouse mated with a bb brown mouse will
produce 50 percent black mice and 50 percent brown mice. We
therefore construct two more urns, one with a single black ball
(the urn labeled "BB") and the other with one black ball and one
brown ball (the urn labeled "Bb"). We now have three urns.
3. Take a ball from urn "test". If its label is "BB",
record that fact, take a ball (the only ball, which is black)
from the BB urn, record its color (we knew this already), and
replace the ball into the BB urn; the overall record of this
trial is "BB-black". If the ball drawn from urn "test" says
"Bb", draw a ball from the Bb urn, record, and replace; the
record will either be "Bb-black" or "Bb-brown".
4. Repeat step 3 seven times.
5. Examine whether the record of the seven balls drawn from
the BB and Bb urns are all black; if so, record "Y", otherwise
"N".
6. Repeat steps 3-5 perhaps 1000 times.
7. Ignore all "N" records. Proceeding now if the result of
step 5 is "Y": Count the number of cases which are BB and the
number which are Bb. The proportions of BB/"Y" and Bb/"Y" trials
are the probabilities that the test mouse is BB and Bb
respectively.
Creating the correct simulation procedure is not easy,
because Bayesian reasoning is very subtle - a reason it has been
the cause of much controversy for more than two centuries. But
it certainly is not easier to create a correct procedure using
analytic tools (except in the cookbook sense of plug in and
pray). And the difficult mathematics that underlie the analytic
method (see e. g., Box and Tiao, Appendix A1.1) make it almost
impossible for the statistician to fully understand the procedure
from beginning to end; if one is interested in insight, the
simulation procedure might well be preferred.2
A computer program to speed the above steps appears in the
Appendix. The result found with a set of 1000 repetitions
is .987.
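The seven steps can also be sketched in Python (an illustrative
sketch, apart from the Appendix program):

```python
import random

random.seed(4)

bb_count = 0  # "Y" trials in which the test mouse was BB
kept = 0      # all "Y" trials (seven black offspring)
for _ in range(100000):
    # Step 3: draw from the "test" urn - 1 BB ball, 2 Bb balls
    genotype = random.choice(["BB", "Bb", "Bb"])
    # Steps 3-4: seven offspring from a mating with a brown (bb) mouse
    if genotype == "BB":
        all_black = True   # BB x bb yields only black offspring
    else:                  # Bb x bb: each offspring black with p = 1/2
        all_black = all(random.random() < 0.5 for _ in range(7))
    # Steps 5-7: keep only the "Y" trials
    if all_black:
        kept += 1
        if genotype == "BB":
            bb_count += 1

print(round(bb_count / kept, 3))  # posterior P(BB); exactly 64/65
```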
PROBLEMS BASED ON NORMAL AND OTHER DISTRIBUTIONS1
Much of the work in Bayesian analysis for scientific
purposes treats the combining of prior distributions having
Normal and other standard shapes with sample evidence which may
also be represented with such standard functions. The
mathematics involved often is formidable, though some of the
calculational formulae are fairly simple and even intuitive.
These problems may be handled with simulation by replacing
the Normal (or other) distribution with the original raw data
when data are available, or by a set of discrete sub-universes
when distributions are subjective.
--------------------
1. This section represents work done jointly with Ekaterina
Kamushadze and Peter C. Bruce.
Measured data from a continuous distribution present a
special problem because the probability of any one observed value
is very low, often approaching zero, and hence the probability of
a given set of observed values usually cannot be estimated
sensibly; this is the reason for the conventional practice of
working with a continuous distribution itself, of course. But a
simulation necessarily works with discrete values. A feasible
procedure must bridge this gulf.
The logic for a problem of Schlaifer's will only be sketched
out here, to be described at length in another publication.
The procedure is rather novel, but it has not heretofore been
published and therefore must be considered tentative and
requiring particular scrutiny.
An Intermediate Problem in Conditional Probability
Schlaifer employs a quality-control problem for his leading
example of Bayesian estimation with Normal sampling. A chemical
manufacturer wants to estimate the amount of yield of a crucial
ingredient X in a batch of raw material in order to decide
whether it should receive special handling. The yield ranges
between 2 and 3 pounds (per gallon), and the manufacturer has
compiled the distribution of the last 100 batches.
The manufacturer currently uses the decision rule that if
the mean of nine samples from the batch (which vary only because
of measurement error, which is the reason that he takes nine
samples rather than just one) indicates that the batch mean is
greater than 2.5 pounds, the batch is accepted. The first
question Schlaifer asks, as a sampling-theory waystation to the
more general question, is the likelihood that a given batch with
any given yield - say 2.3 pounds - will produce a set of samples
with a mean as great or greater than 2.5 pounds.
We are told that the manufacturer has in hand nine samples
from a given batch; they are 1.84, 1.75, 1.39, 1.65, 3.53, 1.03,
2.73, 2.86, and 1.96, with a mean of 2.08. Because we are also
told that the manufacturer considers the extent of sample
variation to be the same at all yield levels, we may - if we are
again working with 2.3 as our example of a possible universe -
therefore add (2.3 - 2.08 =) 0.22 to each of these nine observa-
tions, so as to constitute a bootstrap-type universe; we do this
on the grounds that this is our best guess about the constitution
of that distribution with a mean at (say) 2.3.
We then repeatedly draw samples of nine observations from
this distribution (centered at 2.3) to see how frequently its
mean exceeds 2.5. This work is so straightforward that we need
not even state the steps in the procedure.
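Straightforward as it is, a short Python sketch may still be
useful (illustrative; the nine observations and the 0.22 shift
are from the text):

```python
import random

random.seed(5)

observations = [1.84, 1.75, 1.39, 1.65, 3.53, 1.03, 2.73, 2.86, 1.96]
shift = 2.3 - sum(observations) / 9           # about 0.22
universe = [x + shift for x in observations]  # bootstrap universe at 2.3

trials = 10000
hits = sum(
    1 for _ in range(trials)
    if sum(random.choice(universe) for _ in range(9)) / 9 >= 2.5
)
print(hits / trials)  # chance a batch yielding 2.3 passes the 2.5 rule
```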
Estimating the Posterior Distribution
Next we estimate the posterior distribution. Figure 1 shows
the prior distribution of batch yields, based on 100 previous
batches.
Figure 1
Notation: Sm = set of batches (where total S = 100) with a
particular mean m (e. g. m = 2.1). xi = a particular observation
(e. g. x6 = 1.03). s = the set of xi.
We now perform, for each of the Sm (categorized into
tenth-of-a-pound divisions between 2.1 and 3.0 pounds), the same
sort of sampling operation performed for Sm=2.3 in the previous
problem. But now, instead of having regard to the
manufacturer's decision criterion of 2.5, we construct an
interval of arbitrary width around the sample mean of 2.08 - say
.05 on each side, from 2.03 to 2.13 - and then work with the
weighted proportions of sample means that fall into this
interval.
1. Using a bootstrap-like approach, we presume that the
sub-universe of observations related to each Sm is formed by
adding the difference between the mean of that Sm (e.g. 2.1) and
the mean of the xi (2.08) to each of the nine xi, e. g. 1.03 +
.02 = 1.05. For a distribution centered at 2.3, the values
would be (1.84+.22=2.06, 1.75+.22=1.97...).
2. Working with the distribution centered at 2.3 as an
example: Constitute a universe of the values (1.84+.22=2.06,
1.75+.22=1.97...). Here we may notice that the variability in
the sample enters into the analysis at this point, rather than
when the sample evidence is combined with the prior distribution;
this is in contrast to conventional Bayesian practice where the
posterior is the result of the prior and sample means weighted by
the reciprocals of the variances (see e.g. Box-Tiao, 1973, p. 17
and Appendix A1.1).
3. Draw nine observations from this universe (with
replacement, of course), compute the mean, and record.
4. Repeat step 3 perhaps 1000 times and plot the
distribution of outcomes.
5. Compute the percentage of the means within (say) .05 on
each side of the sample mean, i. e. from 2.03 - 2.13. The
resulting number - call it UPi - is the un-standardized (un-
normalized) effect of this sub-distribution in the posterior
distribution.
6. Repeat steps 1-5 to cover each other possible batch
yield from 2.0 to 3.0 (2.3 was just done).
7. Weight each of these sub-distributions - actually, its
UPi - by its prior probability, and call the result WPi.
8. Standardize the WPis to a total probability of 1.0. The
result is the posterior distribution. The value found is 2.283,
which the reader may wish to compare with a theoretically-
obtained result (which Schlaifer does not give).
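The eight steps may be sketched as follows in Python. Because
Figure 1 is not reproduced here, the prior counts of the 100
previous batches are a placeholder (equal counts at each
tenth-of-a-pound); with the actual Figure 1 counts substituted,
the program carries out steps 1-8 as described:

```python
import random

random.seed(6)

xs = [1.84, 1.75, 1.39, 1.65, 3.53, 1.03, 2.73, 2.86, 1.96]
xbar = sum(xs) / 9  # about 2.08

# HYPOTHETICAL prior: equal counts of previous batches at each
# yield; substitute the actual Figure 1 counts here.
prior = {round(2.0 + k / 10, 1): 10 for k in range(11)}

weights = {}
for m, count in prior.items():
    universe = [x + (m - xbar) for x in xs]  # step 1: shift the sample
    hits = 0
    for _ in range(2000):                    # steps 3-4: resample means
        mean = sum(random.choice(universe) for _ in range(9)) / 9
        if abs(mean - xbar) <= 0.05:         # step 5: the 2.03-2.13 interval
            hits += 1
    weights[m] = hits * count                # step 7: weight UPi by the prior

total = sum(weights.values())                # step 8: standardize
posterior = {m: w / total for m, w in weights.items()}
post_mean = sum(m * p for m, p in posterior.items())
print(round(post_mean, 3))
```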
This procedure must be biased because the numbers of "hits"
will differ between the two sides of the mean for all sub-
distributions except that one centered at the same point as the
sample, but it is as-yet unknown what are the extent and
properties of this bias. The bias would seem to be smaller as
the interval is smaller, but a small interval requires a large
number of simulations; a satisfactorily narrow interval surely
will contain relatively few trials, which is a practical problem
of still-unknown dimensions.
Another procedure - less theoretically justified and
probably more biased - intended to get around the problem of the
narrowness of the interval, is as follows:
5a. Compute the percentages of the means on each side of
the sample mean, and note the smaller of the two (or in another
possible process, the difference of the two). The resulting
number - call it UPi - is the un-standardized (un-normalized)
weight of this sub-distribution in the posterior distribution.
Another possible criterion - a variation on the procedure in
5a - is the difference between the two tails; for a universe with
the same mean as the sample, this difference would be zero.
The subject of this section has only been touched on for
lack of space, but more such problems, along with facilitating
computer programs, are available upon request.
SOLVING BAYESIAN PROBABILITY PUZZLES WITH SIMULATION
Several illustrative puzzles at whose heart is conditional
probability - including the famous Monty Hall "Let's Make a
Deal" three-door problem - appeared earlier in this journal
(Simon, 1994). Now let us consider a problem that Piattelli-
Palmarini (1994) considers a canonical "illusion" in probability,
and this time it will not only be dealt with by simulation, but
the psychological difficulty of solving the problem analytically
will be set forth. Here is Samuel Goldberg's version of the
problem that Joseph Bertrand posed late in the 19th century.
"Three identical boxes each contain two coins. In one
box both are pennies, in [the second both are nickels,
and in the third there is one penny and one nickel.
A man chooses a box at random and takes out a coin. If
the coin is a penny, what is the probability that the
other coin in the box is also a penny?"
Another way to phrase the same problem - with more dramatic
detail, which apparently makes the problem more difficult:
A Spanish treasure fleet of three ships was sunk at
sea off Mexico in the 1500s. One ship had a trunk of
gold forward and another aft, another ship had a trunk
of gold forward and a trunk of silver aft, while a
third ship had a trunk of silver forward and another
trunk of silver aft. A scuba diver just found one of
the ships and a trunk of gold in it, but she ran out of
air before she could check the other trunk. On deck,
they are now taking bets about whether the other trunk
found on the same ship will contain silver or gold.
What are fair odds that the trunk will contain gold?
These are the steps in a simulation that would answer the
question:
1. Create three urns each containing two balls labeled
"0,0", "0,1", and "1,1" respectively.
2. Choose an urn at random, and shuffle its contents.
3. Choose the first element in the chosen urn's vector. If
"1", stop trial and make no further record. If "0", continue.
4. Record the second element in the chosen urn's vector on
the scoreboard.
5. Repeat steps 2 - 4 perhaps 1000 times, and calculate the
proportion of "0"s on the scoreboard.
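These steps translate directly into a short Python sketch
(illustrative, apart from the Appendix program; "0" stands for
gold and "1" for silver, as in the urn labels):

```python
import random

random.seed(7)

ships = [[0, 0], [0, 1], [1, 1]]  # step 1: 0 = gold trunk, 1 = silver

gold_other = 0
recorded = 0
while recorded < 10000:
    ship = random.choice(ships)[:]  # step 2: choose a ship and shuffle
    random.shuffle(ship)
    if ship[0] == 1:                # step 3: first trunk silver - no record
        continue
    recorded += 1                   # step 4: record the other trunk
    if ship[1] == 0:
        gold_other += 1

print(round(gold_other / recorded, 3))  # the exact answer is 2/3
```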
Though an analogous computer simulation is shown in the
Appendix, what makes this problem interesting is not the
comparison of computer simulation to the formulaic approach, but
rather the comparison of any simulation to ratiocination without
calculation. The reason why pure thought alone so often leads to
the wrong answer is that this deceptively-simple problem really
is quite complex, requiring many twists and turns.
These are the logical steps one may distinguish in arriving
at a correct answer with deductive logic (portrayed in Figure 2):
Figure 2
1. Postulate three ships - Ship I with two gold chests (G-
G), ship II with one gold and one silver chest (G-S), and ship
III with S-S. (Choosing notation might well be considered one or
more additional steps.)
2. Assert equal probabilities of each ship being found.
3. Step 2 implies equal probabilities of being found for
each of the six chests.
4. Fact: Diver finds a chest of gold.
5. Step 4 implies that S-S ship III was not found; hence
remove it from subsequent analysis.
6. Three possibilities: 6a) Diver found chest I-Ga, 6b)
diver found I-Gb, 6c) diver found II-Gc.
From step 3, the cases a, b, and c in step 6 have equal
probabilities.
7. If possibility 6a is the case, then the other trunk is
I-Gb; the comparable statements for cases 6b and 6c are I-Ga and
II-S.
8. From steps 6 and 7: From equal probabilities of the
three cases, and no other possible outcome, p(6a) = 1/3, p(6b)
= 1/3, p(6c) = 1/3.
9. So p(G) = p(6a) + p(6b) = 1/3 + 1/3 = 2/3.
A key implication of the deservedly-famous research on
errors in probabilistic judgments by Daniel Kahneman and Amos
Tversky (interchangeably, Tversky and Kahneman) is that human
thinking is often unsound. And some writers in their school of
thought assert that the unsoundness of thinking is hard-wired
into our brains; this point of view is expressed vividly in the
title of Massimo Piattelli-Palmarini's book Inevitable Illusions;
he calls the unsoundness "bias", and says that "we are
instinctively very poor evaluators of probability" (1994, p. 3,
italics in original).
Another possibility - not necessarily inconsistent with
genetic explanation - is that the reason we arrive at unsound
answers to certain types of problems is that the problems are
inherently very difficult, especially when they are tackled
without the assistance of tools, because the problems require
many steps and also because the steps often involve reversals in
the path. Without the aid of memory aids such as paper and
pencil, and the skill of using them well, the problems are just
too difficult for most persons.
One piece of evidence against the genetic-bias explanation
is that the wrong answers to problems are not all the same; they
do not even concentrate at one end of the probability spectrum.
As the work of Kahneman and Tversky amply shows, the errors
often are widely distributed among most or all of the simple
arithmetical combinations of the numbers involved in the
problems. The outstanding characteristic of the answers is that
they are wrong, and not the nature of the errors. In following
long chains of logic and assessing complex assortments of
information, our brains may be weaker than we would like, but we
need not think of our brains as twisted.
The two explanations have quite different implications for
remediation, and two different remedies are offered; I suggest
resorting to simulation whereas others suggest additional
training (especially in probability theory) to improve people's
logic. The different remedies are not necessarily connected to
the two explanations, however; I believe that the remedy I
suggest is implied by the bias explanation as well as by the
weakness explanation.
SIMULATING PHILOSOPHICALLY-DIFFICULT BAYESIAN PROBLEMS
Another role for simulation in a Bayesian context is
penetrating problems that are difficult technically or
philosophically. This section presents two such examples.
1. Is Jeffrey's Rule of Any Use?
Jeffrey's Rule is a system for updating subjective
probabilities in light of additional information when the
probabilities have not been previously quantified. Box and Tiao
(1973, pp. 41-46) give a classic exposition, but perhaps the best
way to understand the system is from examples - an implicit
operational definition.
Diaconis and Zabell (in Bell et al., p. 271) give as an
example this problem from Whitworth (1901, pp. 167-68, Question
138):
A, B, C were entered for a race, and their
respective chances of winning were estimated at 2/11,
4/11, 5/11. But circumstances come to our knowledge in
favour of A, which raise his chance to 1/2; what are
now the chances in favour of B and C respectively?
Answer. A could lose in two ways, viz. either by
B winning or by C winning, and the respective chances
of his losing in these ways were a priori 4/11 and
5/11, and the chance of his losing at all was 9/11.
But after our accession of knowledge the chance of his
losing at all becomes 1/2, that is, it becomes dimin-
ished in the ratio of 18:11. Hence the chance of
either way in which he might lose is diminished in the
same ratio. Therefore the chance of B winning is now
4/11 x 11/18, or 4/18;
and of C winning
5/11 x 11/18, or 5/18.
These are therefore the required chances.
This problem persuades Diaconis and Zabell of the occasional
incapacity of Bayes' rule. Yet a simulation solution to the
problem at hand seems straightforward:
1. Put 2 Amber, 4 Black, and 5 Claret balls in an urn, for
the original probabilities of A, B, and C respectively.
2. We wish to raise A's probability from 2/11 to 1/2 and
then find the new probabilities of B and C without changing the
relative probabilities of B and C; the necessity of making this
latter assumption (or some other; I simply follow Whitworth in
the restriction he places, rather than choosing one of my own) is
forced upon our attention when we consider changing the
composition of the marbles in the urn, and this forcing of
attention to a key issue is one of the greatest benefits of a
simulation approach. We therefore add 7 A's to the urn to make
A's probability 9/18. If this is not crystal-clear intuitively,
we can write formally (though understanding that the formalism is
unnecessary to the main line of the discussion):
A/T = 2/11
T = 11
A'/(T + x) = (2 + x)/(11 + x) = 1/2, where a prime on a
variable denotes its value after the change.
x = 7.
3. Estimate the new probabilities of B and C by repeated
trials [though of course one could also calculate B/(T + x) =
4/18 and C/(T + x) = 5/18 to get p(B') and p(C')].
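A Python rendering of the three steps (illustrative; the 2-4-5
urn and the 7 added amber balls are from the text):

```python
import random
from collections import Counter

random.seed(8)

# Step 1: original chances 2/11, 4/11, 5/11
urn = ["A"] * 2 + ["B"] * 4 + ["C"] * 5
# Step 2: add 7 A's so that A's chance becomes 9/18 = 1/2
urn += ["A"] * 7

# Step 3: estimate the new chances by repeated trials
counts = Counter(random.choice(urn) for _ in range(10000))
for runner in "ABC":
    print(runner, round(counts[runner] / 10000, 3))
# exact new chances: A = 1/2, B = 4/18, C = 5/18
```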
The direct solution with simulation suggests that there is
no need for Jeffrey's or any other subtle analysis in this case.
One might reply that the simulation illustrates Jeffrey's
approach. But if the simulation method brings one directly to
the solution without need for Jeffrey's analysis, what is the
benefit of Jeffrey's analysis in this case?
The point here is not to deny that the discussion by
Diaconis and Zabell of the difficulties they see in the problem
throws light on interesting philosophical and theoretical issues.
And Jeffrey's Rule may be valuable in other contexts, though
apparently it is not necessary here; identifying those contexts
would be of interest. The point, rather, is to give one more
instance of how simulation can often (even if not always) give
simple and understandable solutions to apparently-difficult
problems.
Now consider the other numerical example presented in the
same article by Diaconis and Zabell (p. 272):
Suppose that in a criminal case we are trying to
decide which of four defendants, called a, b, c, d, is
a thief. We initially think P(a)=P(b)=P(c)=P(d)=1/4.
Evidence is then introduced to show that the thief was
probably left-handed. The evidence does not
demonstrate that the thief was definitely left-handed,
but it leads us to conclude the P(thief left-handed)
=.8. If a and b are the defendants who are left-
handed, then E1={a,b}, E2={c,d} and PH(E1)=.8,
PH(E2)=.2 [where H stands for the probability in light
of handedness]. If the only effect of the evidence was
to alter the probability of left-handedness -- in the
sense that P(A|Ei) = PH(A|Ei) -- then PH is obtained
from Jeffrey's rule as PH(a)=.4, PH(b)=.4, PH(c)=.1,
PH(d)=.1. Evidence is next presented that it is
somewhat likely that the thief was a woman. If the
female defendants are a and c, then F1={a,c}, F2={b,d}.
If PHS(F1)=.7 [where S stands for the probability in
light of sex] and Jeffrey-updating is again judged
acceptable, then
PHS(a)=.56, PHS(b)=.24,
PHS(c)=.14, PHS(d)=.06.
If instead the evidence (F1,.7), (F2,.3) is presented
first and (E1,.8), (E2,.2) is presented second, is PSH
equal to PHS? [This example] shows that in general the
order matters since the currently held opinion governs;
in this example the reader may check that the order
does not matter.
Now a simulation solution:
1. Put a total of 4 balls marked a, b, c, and d into an urn
for the original state of belief.
2. On the assumption - forced by our decision about which
balls to add to the urn (unless we explicitly choose to make some
other assumption) - that the relative probabilities of a:b and
c:d remain the same, add 3 a's and 3 b's to the urn, to make p(a
+ b) = .8.
3. Find the new probabilities of a, b, c and d by
experiment (or by examination of the proportions of balls in the
urn).
4. Continuing with the evidence relevant to the sex of the
thief: On the assumption of constant relative probabilities a:c
and b:d, and the same logic, add balls to make 56 a's, 24 b's, 14
c's, and 6 d's, which immediately produces the probabilities for
each suspect.
Again the simulation procedure suffices quite well without
auxiliary logical rules.
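The four steps above can be sketched in Python as a minimal illustration (my own rendering, not part of the original programs; the urn counts are chosen to match the quoted probabilities .56, .24, .14, .06):

```python
import random

# Step 1: one ball per suspect, so p = 1/4 apiece.
urn = ["a", "b", "c", "d"]

# Step 2: add 3 a's and 3 b's, so p(a or b) = 8/10 = .8
# while the ratios a:b and c:d are unchanged.
urn += ["a"] * 3 + ["b"] * 3

# Step 4: rebuild the urn to reflect the sex evidence as well;
# 56 a's, 24 b's, 14 c's, 6 d's keeps the ratios a:c and b:d
# constant and makes p(a or c) = .7.
urn = ["a"] * 56 + ["b"] * 24 + ["c"] * 14 + ["d"] * 6

# Step 3 (and again after step 4): estimate the probabilities
# by repeated draws with replacement.
random.seed(1)
trials = 100_000
draws = [random.choice(urn) for _ in range(trials)]
for suspect in "abcd":
    print(suspect, draws.count(suspect) / trials)
```

The estimated proportions settle near .56, .24, .14, and .06, as in the Diaconis-Zabell example.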
I have not yet found any reason to think that a similar
procedure would not operate successfully in other numerical
(i.e., realistic) cases.
These two examples suggest that simulation can provide both
an easy solution and considerable insight into the nature of at
least some problems hitherto addressed with Jeffrey's Rule.
Whether this is true of all such problems, or whether Jeffrey's
Rule handles problems that simulation cannot, or whether
Jeffrey's Rule provides insight over and beyond what simulation
provides in some or most cases, is at present unknown. An answer
would require canvassing and analyzing many such problems, in a
variety of contexts.
2. A Non-Problem of Lewis Carroll
This is Lewis Carroll's Pillow Problem 41 (1895/1958, pp. 9,
62, 63):
My friend brings me a bag containing four counters,
each of which is either black or white. He bids
me draw two, both of which prove to be white. He then
says "I meant to tell you, before you began, that there
was at least one white counter in the bag. However,
you know it now, without my telling you. Draw again."
(1) What is now my chance of drawing white?
(2) What would it have been if he had not spoken?
Carroll gives the following solution:
(1) As there was certainly at least one W in the
bag at first, the 'a priori' chances for the various
states of the bag, 'WWWW, WWWB, WWBB, WBBB,' were '1/8,
3/8, 3/8, 1/8'.
These would have given, to the observed event, the
chances '1, 1/2, 1/6, 0'.
Hence the chances, after the event, for the var-
ious states, are proportional to '1/8.1, 3/8.1/2,
3/8.1/6'; i.e. to '1/8, 3/16, 1/16'; i.e. to '2, 3,
1'. Hence their actual values are '1/3, 1/2, 1/6'.
Hence the chance, of now drawing W, is
'1/3.1+1/2.1/2'; i.e. it is 7/12.
Q.E.F.
(2) If he had not spoken, the 'a priori' chances
for the states 'WWWW, WWWB, WWBB, WBBB, BBBB', would
have been '1/16, 4/16, 6/16, 4/16, 1/16'.
These would have given, to the observed event, the
chances '1, 1/2, 1/6, 0, 0'.
Hence the chances, after the event, for the var-
ious states, are proportional to '1/16.1, 1/4.1/2,
3/8.1/6'; i.e. to '1, 2, 1'. Hence their actual values
are '1/4, 1/2, 1/4'.
Hence the chance, of now drawing W, is
'1/4.1+1/2.1/2'; i.e. it is 1/2.
Q.E.F.
Let us consider how one would physically simulate this
problem. You begin with two white balls in your hand - the ones
you have already selected. Then you assume that each of the
other two balls is either white (W) or black (B). To correspond
with these facts and assumption, one could then make up one bag
with WW, another with BB, a third with WB, and the fourth with
BW. On any given trial one would a) select one of those bags at
random, b) combine the two white balls in hand with the balls in
the bag, and c) draw a ball.
The process is so simple that we can confidently forgo
actual simulation and deduce that the probability of a white
would be .25 * 1 (from the WWWW bag), plus .25 * .5 (WWBB), plus
.5 * 3/4 (from WWWB or WWBW), or 6/8. This is a different answer
from the one Carroll obtained. But it seems to be the answer that
fits any concrete realization of the facts of the situation.
If one now considers the second part of Carroll's question,
the answer is quite the same as for the first part, because the
actual facts - including one's state of knowledge - are the same
in both cases.
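A machine version of the physical procedure just described can be sketched in Python (a hypothetical rendering of my own, under the stated assumption that the four possible bags are equally likely):

```python
import random

random.seed(2)
trials = 100_000
white = 0
for _ in range(trials):
    # The two balls already in hand are white; the two still in
    # the bag are each assumed white or black, giving four
    # equally likely bags: WW, WB, BW, BB.
    bag = ["W", "W"] + random.choice([["W", "W"], ["W", "B"],
                                      ["B", "W"], ["B", "B"]])
    # Combine and draw one ball.
    if random.choice(bag) == "W":
        white += 1
print(white / trials)
```

The proportion of white draws settles near 6/8 = .75, the answer deduced above, rather than Carroll's 7/12.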
In his study of the probabilistic Pillow Problems, Seneta
(1984) concurs that Carroll may not have arrived at the correct
answer, saying that "Dodgson [Carroll] may have had some
difficulty in handling conditional probabilities" (1984, p. 88).
The important question here is: Why did Carroll arrive at an
answer different from the one arrived at above? I suggest that the
answer is that his purely-deductive calculations allowed him to
depart from the physical facts. Over-abstraction often has this
pernicious property. Simulation can often save one from falling
into such error.
CONCLUSION
Bayesian problems of updating estimates can be handled
easily and straightforwardly with simulation, whether the data
are discrete or continuous. The process and the results tend to
be intuitive and transparent. Simulation works best with the
original raw data rather than with abstractions from them via
percentages and distributions. This can aid the understanding as
well as facilitate computation.
REFERENCES
Box, George E. P., and George C. Tiao, Bayesian Inference in
Statistical Analysis (Reading, Mass: Addison-Wesley, 1973)
Carroll, Lewis, Pillow Problems (New York: Dover,
1895/1958).
Diaconis, Persi, and Sandy L. Zabell, "Updating Subjective
Probability," Journal of the American Statistical Association,
Vol. 77 (1982), pp. 822-830, reprinted in Decision Making, edited
by David E. Bell, Howard Raiffa, and Amos Tversky (Cambridge:
Cambridge University Press, 1988), pp. 266-283.
Feller, William, An Introduction to Probability Theory and
Its Applications (New York: Wiley, 3rd edition, 1968)
Fisher, R. A., Statistical Methods and Scientific Inference,
second edition (London: Oliver and Boyd, 1959).
Gardner, Martin, The Second Scientific American Book of
Mathematical Puzzles & Diversions (New York: Simon and Schuster,
1961).
Huff, Darrell, How to Take a Chance (New York: W. W.
Norton, 1959).
Jeffrey, Richard, The Logic of Decision (New York: McGraw-
Hill, 1965), referred to by Diaconis and Zabell.
Jeffrey, Richard, "Probable Knowledge," in The Problem of
Inductive Logic, ed. I. Lakatos (Amsterdam: North-Holland, 1968),
pp. 166-180, referred to by Diaconis and Zabell.
Kahneman, Daniel, and Amos Tversky, "Subjective
probability: A judgment of representativeness," abbreviated
version of a paper originally appearing in Cognitive Psychology,
1972, 3, 430-454, reprinted in Judgment under uncertainty:
Heuristics and biases, edited by Kahneman, Daniel, Paul Slovic,
and Amos Tversky (Cambridge: Cambridge University Press, 1982),
pp. 32-47.
Nisbett, Richard E., David H. Krantz, Christopher Jepson,
and Geoffrey T. Fong, "Improving inductive inference," Judgment
under uncertainty: Heuristics and biases, edited by Kahneman,
Daniel, Paul Slovic, and Amos Tversky (Cambridge: Cambridge
University Press, 1982), pp. 445-459.
Piattelli-Palmarini, Massimo, Inevitable Illusions (New
York: Wiley, 1994).
Schlaifer, Robert, Introduction to Statistics for Business
Decisions (New York: McGraw-Hill, 1961).
Seneta, Eugene, "Lewis Carroll as a Probabilist and
Mathematician," Mathematical Scientist, Vol. 9, 1984, pp. 79-94.
Simon, Julian L., "What Does the Normal Curve `Mean'?"
Journal of Educational Research, Vol. 61, July-August, 1968, pp.
435-438.
Simon, Julian L., "What Some Puzzling Problems Teach About
the Theory of Simulation and the Use of Resampling," The American
Statistician, November 1994, Vol. 48, No. 4, pp. 1-4.
Tversky, Amos, and Daniel Kahneman, "Evidential impact of
base rates," in Judgment under uncertainty: Heuristics and
biases, edited by Kahneman, Daniel, Paul Slovic, and Amos Tversky
(Cambridge: Cambridge University Press, 1982), pp. 153-162
Whitworth, W. A., Choice and Chance, 5th edition (Cambridge:
Deighton Bell, 1901).
ENDNOTES
1 Darrell Huff provides the quote but without reference:
"This branch of mathematics [probability] is the only one, I
believe, in which good writers frequently get results entirely
erroneous" (Huff, 1959, frontispiece).
2 We can use a similar procedure to illustrate an aspect of
the Bayesian procedure that Box and Tiao emphasize, its
sequentially-consistent character. First let us carry out the
above procedure, but with only three black balls in a row being
observed. The program to be used is the same except that "3" is
substituted for "7" wherever "7" appears. We then estimate the
probability for Bb, which turns out to be about 1/5 instead of
about 1/65. We then substitute for urn A an urn A' with
appropriate numbers of black Bb's and black BB's, to represent
the "updated" prior probability. We may then continue by
substituting "4" for "3" above (to attain a total of seven
observed black balls), and find that the probability is about
what it was when we observed 7 black balls in a single sample
(1/65). This shows that the Bayesian procedure accumulates
information without "leakage" and with consistency.
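The sequential check described in this note can be sketched in Python (a hypothetical rendering of the logic, not Bruce's program; the function name and trial counts are my own, and the simulation estimates P(BB), whose complement is the P(Bb) of about 1/65 discussed above):

```python
import random

random.seed(3)

def p_bb(prior_bb, n_births, trials=20_000):
    """Estimate P(test mouse is BB | n_births black offspring in a row),
    starting from a given prior probability that it is BB."""
    bb = accepted = 0
    while accepted < trials:
        is_bb = random.random() < prior_bb
        # A BB x bb mating always gives black offspring;
        # a Bb x bb mating gives black with probability 1/2.
        all_black = is_bb or all(random.random() < 0.5
                                 for _ in range(n_births))
        if all_black:                 # keep only trials matching the data
            accepted += 1
            bb += is_bb
    return bb / accepted

one_step = p_bb(1/3, 7)     # all seven black births in one sample
stage1 = p_bb(1/3, 3)       # three black births first...
two_step = p_bb(stage1, 4)  # ...then the posterior serves as the new prior
print(one_step, two_step)   # both settle near 64/65, about .985
```

Both estimates agree, illustrating the sequential consistency that Box and Tiao emphasize.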
APPENDIX
Program for Fisher's mice problem:
[by Peter Bruce]
NUMBERS (1 2 2) test the urn with test mice, 1=BB and 2=Bb
NUMBERS (3) bb urn with "bb" mice
At this point Peter Bruce - who wrote the program - resorts to a
trick of adding the results of the two different sampling
operations to identify particular types. This enables him to
avoid some further programming with IF loops. I worry about
confusing the reader with this trick, but I can afford to be pure
about it because he is doing the work and not I.
COPY 0 n "n" will be the number of simulations
WHILE n < 1000 repeat the following as long as n
has not reached 1000
SAMPLE 1 test test* sample a test mouse
SAMPLE 1 bb bb* sample a brown mouse
REPEAT 7 simulate 7 "matings"
ADD test* bb* c "mate" the mice; if "c" is a "4", it's a
BB-bb mating, which always yields black
offspring. If "c" is a "5", it's a Bb-bb
mating which produces black half the time
and brown the other half. If the
latter is the case, we need a further
simulation to determine the color.
We let 111 represent a black
outcome, 222 a brown outcome.
IF c = 5 if we have a Bb-bb mating
URN 1#111 1#222 d offspring is 50/50 black/brown
SAMPLE 1 d e "e" will be either 111 or 222
END
IF c = 4 if we have a BB-bb mating
COPY 111 e "e" must be 111
END
SCORE e y Keep track of each of the 7 births
END End the "mating loop"
SUM y yy
IF yy = 777 If all seven births were black
SCORE test* z Score the genetic character of test mouse
END
CLEAR y Wipe out the birth scoreboard in preparation
for a new trial
SIZE z n Determine how many simulations have been run
so we can stop at 1000 (see top)
END We can proceed past here once the WHILE
condition is satisfied
COUNT z =1 k How often was test mouse a BB?
DIVIDE k n kk
PRINT kk
kk = .987.
Program for Bertrand's problem (Spanish treasure fleet)
using the language RESAMPLING STATS:
[by Peter Bruce]
NUMBERS (7 7) gg The 3 boxes, where "7"=gold, "8"=silver
NUMBERS (7 8) gs
NUMBERS (8 8) ss
REPEAT 1000
GENERATE 1 1,3 a Select a box where gg=1, gs=2, ss=3
IF a = 1
SCORE 1 z If a=1, we're in the "gold-gold"
box. That means we've picked a gold, and
we're guaranteed of getting
another gold (7) on our second pick.
So we score a "1" for success.
END
IF a = 2 If a=2, we're in the gold-silver urn
SAMPLE 1 gs b Select a coin
IF b = 7 If b=7, we got a gold, so
score 0, (for no success) because
we can't get a 7 again.
SCORE 0 z
END Note: if b=8, we got a silver on our
first draw and we're not interested in
the second draw unless we get a gold first.
END
Note: if a=3, we're not interested either.
We can't draw a gold on our first draw.
END
SIZE z k1 How many times did we get an initial gold?
COUNT z =1 k2 Of those times, how often was our second
draw a gold?
DIVIDE k2 k1 result Calculate the latter as a proportion of
the former
result = 0.64797
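For readers who prefer a general-purpose language, the same logic can be sketched in Python (a hypothetical counterpart of Bruce's program; variable names are mine):

```python
import random

random.seed(4)
# The three boxes: gold-gold, gold-silver, silver-silver.
boxes = [["G", "G"], ["G", "S"], ["S", "S"]]
first_gold = second_gold = 0
for _ in range(100_000):
    box = random.choice(boxes)   # select a box at random
    i = random.randrange(2)      # draw one of its two coins
    if box[i] == "G":            # first draw was gold...
        first_gold += 1
        if box[1 - i] == "G":    # ...is the other coin in the box gold too?
            second_gold += 1
# Proportion of first-gold trials whose second coin is also gold.
print(second_gold / first_gold)
```

The result settles near 2/3, in agreement with the 0.64797 obtained above.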