Approach 3: A Bayesian Approach
Approach 2 is in the Bayesian spirit in that it asks about
probabilities of the observed data conditional upon one or more
particular universes. This has the virtue of being quite
unambiguous in interpretation. But one can go even further in
this direction, as follows:
Mark off a set of universes on each side of the sample mean,
with centroids at equal distances from each other. Then perform
the same operations for each that are specified for the universes
mentioned in Approaches 1 and 2, and normalize the results. If
the prior distribution is assumed to be uniform, the results will
be the same as the standard confidence interval. But the
interpretation will be different. There will be no attempt to
make any statement about the unconditional probability of the
mean of the universe. Rather, the result will be a statement
about the mean of the universe conditional upon a uniform prior
distribution and the sample evidence, which is unchallengeable
logically.
If the prior distribution is not uniform, appropriate
adjustment can be made when normalizing, and again, the
interpretation is not subject to question, though the assumption
about the prior may be questioned.
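The procedure just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not
the author's own program: the universes are taken to be Normal with
a known standard error of the mean, and the likelihood of the
observed sample mean under each universe is computed from the
Normal density rather than by simulation. The names
grid_posterior and credible_interval are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density, used here in place of a simulated likelihood."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def grid_posterior(sample_mean, se, centers, prior=None):
    """Normalized weights for a set of equally spaced universe means.

    `centers` are the candidate universe means (the "centroids");
    `se` is the assumed standard error of the sample mean.
    """
    if prior is None:
        prior = [1.0] * len(centers)  # uniform prior over the universes
    weights = [p * normal_pdf(sample_mean, c, se)
               for c, p in zip(centers, prior)]
    total = sum(weights)
    return [w / total for w in weights]  # the normalizing step


def credible_interval(centers, posterior, mass=0.95):
    """Read a central interval off the cumulative posterior weights."""
    tail = (1 - mass) / 2
    cum = 0.0
    lower = upper = None
    for c, p in zip(centers, posterior):
        cum += p
        if lower is None and cum >= tail:
            lower = c
        if cum >= 1 - tail:
            upper = c
            break
    return lower, upper


if __name__ == "__main__":
    # Hypothetical numbers: sample mean 50, standard error 2.
    centers = [40 + 0.25 * i for i in range(81)]
    post = grid_posterior(50.0, 2.0, centers)
    print(credible_interval(centers, post))
```

With a uniform prior the interval should lie close to the
conventional one (the sample mean plus or minus 1.96 standard
errors), up to the coarseness of the grid. Passing a different list
as `prior` shows how a non-uniform prior shifts the result.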
The problem of computation has always been a barrier to this
sort of Bayesian interpretation. But if one simulates Bayesian
probabilities, the difficulty disappears. Here follows an example
of such simulation. A confidence interval may immediately be
read from the posterior distribution in standard Bayesian
fashion.
[INSERT FROM STATSWRK3 BAYESNRM]
Figure II-3--1 W and W fig 8-4 (from Clopper and Pearson)
From Encyclopedia of Statistics, "Confidence Intervals and
Regions", pp. 120-121.
...we can make probability statements about X; e.g.,
    Pr[mu - 1.96 sigma <= X <= mu + 1.96 sigma] = 0.95.    (1)
We could rewrite this as
    Pr[X - 1.96 sigma <= mu <= X + 1.96 sigma] = 0.95    (2)
or
    Pr[mu in (X - 1.96 sigma, X + 1.96 sigma)] = 0.95.    (3)
Although mu may appear to be the subject of statements (2)
and (3), the probability distribution referred to is that of
X, as was more obvious in statement (1).
If X is observed to be x, we say that we have 95% confidence
that x - 1.96 sigma <= mu <= x + 1.96 sigma or say that
(x - 1.96 sigma, x + 1.96 sigma) is a 95% confidence interval
for mu. No probability statement is made about the
proposition
    x - 1.96 sigma <= mu <= x + 1.96 sigma    (4)
involving the observed value, x, since neither x nor mu has
a probability distribution. The proposition (4) will be
either true or false, but we do not know which. If
confidence intervals with confidence coefficient p were
computed on a large number of occasions, then, in the long
run, the fraction p of these confidence intervals would
contain the true parameter value. (This is provided that
the occasions are independent and that there is no selection
of cases.)
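The long-run claim at the end of the passage quoted above can be
checked by simulation. The following is a minimal sketch, assuming
a Normal X with known sigma; the function name coverage and the
numbers used are hypothetical.

```python
import random


def coverage(mu, sigma, trials=20000, z=1.96, seed=1):
    """Fraction of intervals (x - z*sigma, x + z*sigma) that contain mu.

    Assumes each occasion yields one independent Normal observation x
    with known sigma, as in the quoted passage.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.gauss(mu, sigma)  # the observed value on this occasion
        if x - z * sigma <= mu <= x + z * sigma:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical universe: mu = 10, sigma = 2.
    print(coverage(10.0, 2.0))
```

The fraction printed should be near 0.95. Note that it describes
the procedure over many occasions, not any one interval.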
"Confidence Intervals and Regions," ------, pp. 120, 121.
page 1 \statphil Chapter II-3 statconf 4-23-9623
CHAPTER II-3
POINT ESTIMATION AND CONFIDENCE INTERVALS I: THE LOGIC<1>
This chapter discusses how to assess the accuracy of a point
estimate of the mean, median, or other statistic of a sample. We
want to know: How close is our estimate (say, the sample mean)
likely to be to the population mean? It is all very well to say
that on average the sample mean (or other point estimator) equals
a population parameter. But what about the result of any
particular sample? How accurate or inaccurate an estimate is it
likely to produce? Early in the history of statistical inference
this question arose in the practice of astronomy (see Stigler,
1986; Hald, 1990).
The accuracy of an estimate is a hard intellectual nut to
crack, so hard that for hundreds of years statisticians and
scientists wrestled with the problem with little success; it was
not until the last century or two that much progress was made.
The kernel of the problem is learning the extent of the variation
in the population. But whereas the sample mean can be used
straightforwardly to estimate the population mean, the extent of
variation in the sample does not directly estimate the extent of
the variation in the population, because the variation differs at
different places in the distribution, and there is no reason to
expect it to be symmetrical around the estimate or the mean.
The intellectual difficulty of confidence intervals may be
one reason why they are less prominent in statistics literature
and practice than are tests of hypotheses (though statisticians
often favor confidence intervals). Another reason is that tests
of hypotheses are more fundamental for pure science because they
address the question that is at the heart of all knowledge-get-
ting: "Should these groups be considered different or the same?"
The statistical inference represented by confidence limits ad-
dresses what seems to be a secondary question in most sciences
(though not in astronomy or perhaps physics): "How reliable is
the estimate?" Still, confidence intervals are very important in
some applied sciences such as geology - estimating the variation
in grades of ores, for example - and in some parts of business
and industry.
Confidence intervals and hypothesis tests are not disjoint
ideas. Indeed, hypothesis testing of a single sample against a
benchmark value is (in all schools of thought, I believe) opera-
tionally identical with the most common way (Approach 1 below) of
constructing a confidence interval and checking whether it in-
cludes that benchmark value. But the underlying reasoning is
different for confidence limits and hypothesis tests.
The logic of confidence intervals is on shakier ground, in
my judgment, than that of hypothesis testing, though there are
many thoughtful and respected statisticians who argue that the
logic of confidence intervals is better grounded and leads less
often to error.
Confidence intervals are considered by many to be part of
the same topic as estimation - being an estimation of accuracy,
in their view. And confidence intervals and hypothesis testing
are seen as sub-cases of each other by some people. Whatever the
importance of these distinctions among these intellectual tasks
in other contexts, they need not concern us here.
Confidence intervals - even if they are meaningful -
certainly are controversial. The Encyclopedia of Statistics
says: "Confidence intervals are widely used in practice,
although not as widely supported by people interested in the
foundations of statistics" (Vol 2, p. 126). Some statisticians
will not even discuss the topic. For example, the index to the
well-respected book on Basic Concepts of Probability and
Statistics by Hodges and Lehmann (1970) does not even have a
listing for confidence intervals. And Savage in his The
Foundations of Statistics (1954/1972) first says that "The
doctrine of accuracy estimation is vague" (p. 257) and then later
writes the same phrase except with "erroneous" instead of "vague"
(p. 260). He goes on to say that "not being convinced myself, I
am in no position to present convincing evidence for the
usefulness of interval estimation" (p. 261). He describes
Fisher's approach to the matter, fiducial probability, as the
"most disputed technical concept of modern statistics" (p. 262),
and says no more. He also refers to the supposedly-related idea
of tolerance intervals as "slippery", so it is not surprising if
the layperson finds the entire matter slippery. [1]
One thing is undeniable, however: Despite the difficulty and
subtlety of the topic, the accuracy of estimates must be dealt
with, one way or another.
Philosophers seldom write about the subject, with the
notable exception of Braithwaite (1953), who bases his treatment
on Neyman and Pearson; his treatment is adventurous, yet
sufficiently obscure to tax anyone's understanding.
Because the logic of confidence intervals is subtle, most
statistics texts skim right past the conceptual difficulties, and
go directly to computation. And when the concept is combined
with the conventional algebraic treatment, the composite is truly
baffling; the formal mathematics makes impossible any intuitive
understanding. For students, "pluginski" is the only viable
option.
With the resampling method, however, the mathematics of
confidence intervals is easy. The statistical interpretation of
the calculations then becomes a challenging and even pleasurable
subject; even beginning undergraduates can enjoy the subtlety and
find that it feels good to stretch the brain and get down to
fundamentals, once the calculations become transparent.
To preview the treatment of confidence intervals presented
below, which I hope dissolves the confusion of the topic: We do
not learn about the reliability of sample estimates of the mean
(and other parameters) by logical inference from any one
particular sample to any one particular universe, because this
cannot be done in principle. Instead, we investigate the
behavior of various universes in the neighborhood of the sample,
universes whose characteristics are chosen on the basis of their
resemblances to the sample. In this way the estimation of
confidence intervals is like all other statistical inference:
One investigates the probabilistic behavior of one or more
hypothesized universes, the hypotheses being implicitly suggested
by the sample evidence but not logically implied by that
evidence.
The examples worked through below show why statistics is as
difficult a subject as it is. The procedure required to transit
successfully from the original question to the statistical
probability and then interpretation of the probability involves a
great many choices about the appropriate model based on analysis
of the problem at hand; a wrong choice at any point dooms the
procedure. The actual computation of the probability - whether
done with formulaic probability theory or with resampling - is
only a very small part of the procedure, and it is the least
difficult part if one proceeds with resampling. The difficulties
in the statistical process are not mathematical but rather stem
from the hard clear thinking needed to understand the nature of
the situation and to ascertain the appropriate way to model it.
In comparison with the logic of hypothesis testing, the
logic of confidence limits is more subtle (though it need not be
as opaque as one would think it is from reading the philosophic
and statistical literature; e.g., Braithwaite, 1953; Gigerenzer
et al., 1989). The difference, I think, is that in hypothesis-
testing situations we find it relatively easy to decide which
universes we wish to analyse, and therefore the deductive chain
from the statistical question to the probabilistic study is
short, clear, and strong. But when inquiring into the accuracy
of estimations we find it much harder to decide which universes
we wish to analyse. This is the core of the problem, and climbing
up the deductive chain is therefore fraught with difficulty.[1]
THE LOGIC OF CONFIDENCE INTERVALS
The purpose of a confidence interval is to help us assess
the reliability of one or more parameters of the sample - most
often its mean or median - as an estimator of the parameter of
the universe.
If one draws a sample that is very, very large - large enough
so that one need not worry about sample size and dispersion in
the case at hand - from a universe whose characteristics one
knows, one then can deduce the probability that the sample mean
will fall within a given distance of the population mean. Intui-
tively, it seems as if one should also be able to reverse the
process - to infer something about the location of the population
mean from the sample mean. But this inverse inference turns out
to be a slippery business indeed.
Let's put it differently: It is all very well to say - as
one logically may - that on average the sample mean (or other
point estimator) equals a population parameter in most situa-
tions. But what about the result of any particular sample? How
accurate or inaccurate an estimate of the population mean is the
sample likely to produce?
The line of thought runs as follows: It is possible to map
the distribution of the means (or other such parameter) of
samples of any given size (the sample size of interest in any
investigation usually being the size of the observed sample) and
of any given pattern of dispersion (which we will assume for now
can be estimated from the sample) that a universe in the
neighborhood of the sample will produce. For example, we can
compute how big an interval around a postulated universe's mean
will include 45 percent of the sample means on one side of that
mean and 45 percent on the other side.
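The mapping described in this paragraph can be sketched by
simulation. What follows is a minimal illustration, assuming a
Normal universe with a postulated mean and dispersion; the function
name central_interval_of_means and the example numbers are
hypothetical.

```python
import random
import statistics


def central_interval_of_means(universe_mean, universe_sd, n,
                              trials=5000, seed=2):
    """Bounds enclosing the middle 90 percent of simulated sample means.

    Assumes a Normal universe; each trial draws a sample of size n
    and records its mean.
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean([rng.gauss(universe_mean, universe_sd)
                          for _ in range(n)])
        for _ in range(trials)
    )
    lower = means[int(0.05 * trials)]       # 5 percent of means lie below
    upper = means[int(0.95 * trials) - 1]   # 5 percent of means lie above
    return lower, upper


if __name__ == "__main__":
    # Hypothetical universe: mean 100, standard deviation 15, samples of 25.
    print(central_interval_of_means(100.0, 15.0, 25))
```

The two bounds mark off roughly 45 percent of the sample means on
each side of the postulated mean. Nothing in the computation says
anything yet about which universe produced an observed sample.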
What cannot be done is to draw conclusions from sample
evidence about the nature of the universe from which it was
drawn, in the absence of some information about the set of uni-
verses from which it might have been drawn. That is, one can
investigate the behavior of one or more specified universes, and
discover the absolute and relative likelihoods that the given
specified universe(s) might produce such a sample. But the
universe(s) to be so investigated must be specified in advance
(which is consistent with the Bayesian view of statistics). To
put it differently, we can employ probability theory to learn the
pattern(s) of results produced by samples drawn from a particular
specified universe, and then compare that pattern to the observed
sample. But we cannot infer the probability that that sample was
drawn from any given universe in the absence of knowledge of the
other possible sources of the sample. That is a subtle differ-
ence, but hopefully the following discussion makes it
understandable.
COMPUTING CONFIDENCE INTERVALS
In the first part of the discussion we shall leave aside the
issue of estimating the extent of the dispersion - a troublesome
matter, but one which seldom will result in unsound conclusions
even if handled crudely.
To start from scratch again: The first - and seemingly
straightforward - step is to estimate the mean of the population
based on the sample data. The next and more complex step is to
ask about the range of values (and their probabilities) that the
estimate of the mean might take - that is, the construction of
confidence intervals. It seems natural to assume that if our
best guess about the population mean is the value of the sample
mean, our best guesses about the various values that the
population mean might take if unbiased sampling error causes
discrepancies between population parameters and sample
statistics, should be values clustering around the sample mean in
a symmetrical fashion (assuming that asymmetry is not forced by
the distribution - as for example, the binomial is close to
symmetric near its middle values). But how far away from the
sample mean might the population mean be?
Let's walk slowly through the logic, going back to basics to
enhance intuition. Let's start with the familiar saying, "The
apple doesn't fall far from the tree." Imagine that you are in a
very hypothetical place where an apple tree is above you, and you
are not allowed to look up at the tree, whose trunk has an
infinitely thin diameter. You see an apple on the ground. You
must now guess where the trunk (center) of the tree is. The
obvious guess for the location of the trunk is right above the
apple. But the trunk is not likely to be exactly above the apple
because of the small probability of the trunk being at any
particular location, due to sampling dispersion.
Though you find it easy to make a best guess about where the
mean is (the true trunk), with the given information alone you
have no way of making an estimate of the probability that the
mean is one place or another, other than that the probability is
the same that the tree is to the north or south, east or west, of
you. You have no idea about how far the center of the tree is
from you. You cannot even put a maximum on the distance it is
from you, and without a maximum you could not even reasonably
assume a rectangular distribution, or a Normal distribution, or
any other.
Next you see two apples. What guesses do you make now? The
midpoint between the two obviously is your best guess about the
location of the center of the tree. But still there is no way to
estimate the probability distribution of the location of the
center of the tree.
Now assume you are given still another piece of
information: The outermost spread of the tree's branches (the
range) equals the distance between the two apples you see. With
this information, you could immediately locate the boundaries of
the location of the center of the tree. But this is only because
the answer you sought was given to you in disguised form.
You could, however, come up with some statements of relative
probabilities. In the absence of prior information on where the
tree might be, you would offer higher odds that the center (the
trunk) is in any unit of area close to the center of your two
apples than in a unit of area far from the center. That is, if
you are told that either one apple, or two apples, came from one
of two specified trees whose locations are given, with no reason
to believe it is one tree or the other (later, we can put other
prior probabilities on the two trees), and you are also told the
dispersions, you now can put relative probabilities on one tree
or the other being the source. (This is like the Neyman-Pearson
procedure, and it is easily reconciled with the Bayesian point of
view to be explored later. One can also connect this concept of
relative probability to the Fisherian concept of maximum likeli-
hood - which is a likelihood relative to all others). And you
could list from high to low the probabilities for each unit of
area in the neighborhood of your apple sample. But this proce-
dure is quite different from making any single absolute numerical
probability estimate of the location of the mean.
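The two-tree comparison can be made concrete with a small
computation. This is a sketch only, assuming positions along a
single line, Normal scatter of apples around each trunk with a
known dispersion, and a stated prior on the two trees; the name
relative_probabilities and all numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density of an apple landing at x from a trunk at `mean`."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def relative_probabilities(apples, tree_a, tree_b, sd, prior_a=0.5):
    """Relative probabilities that tree A or tree B produced the apples.

    Only these two specified trees are considered; nothing is said
    about any other possible source.
    """
    like_a = math.prod(normal_pdf(x, tree_a, sd) for x in apples)
    like_b = math.prod(normal_pdf(x, tree_b, sd) for x in apples)
    weight_a = prior_a * like_a
    weight_b = (1 - prior_a) * like_b
    total = weight_a + weight_b
    return weight_a / total, weight_b / total


if __name__ == "__main__":
    # Hypothetical case: apples at 4 and 6, trunks at 5 and 9, spread 2.
    print(relative_probabilities([4.0, 6.0], 5.0, 9.0, 2.0))
```

The output is a pair of relative probabilities for the two named
trees. It is not an absolute probability about the location of the
trunk, which is the distinction drawn in the paragraph above.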
Now let's say you see 10 apples on the ground. Of course
your best estimate is that the trunk of the tree is at their
arithmetic center. But how close to the actual tree trunk (the
population mean) is your estimate likely to be? This is the
question involved in confidence intervals. We want to estimate a
range (around the center, which, as we said, we estimate with the
mean of the sample) within which we are pretty sure that the
trunk lies.
To simplify, we consider variation along only one dimension
- that is, on (say) a north-south line rather than on two (the
entire surface).
We first note that you have no reason to estimate the
trunk's location to be outside the sample pattern, or at its
edge, though it could be so in principle.
If the pattern of the 10 apples is tight, you imagine the
pattern of the likely locations of the population mean to be
tight; if not, not. That is, it is intuitively clear that there
is some connection between how spread out are the sample
observations and your confidence about the location of the
population mean. For example, consider two patterns of a
thousand apples, one with twice the spread of another, where we
measure spread by (say) the diameter of the circle that holds the
inner half of the apples for each tree, or by the standard
deviation. It makes sense that if the two patterns have the same
center point (mean), you would put higher odds on the tree with
the smaller spread being within some given distance - say, a foot
- of the estimated mean. But what odds would you give on that
bet?
THE TWO APPROACHES TO ESTIMATING CONFIDENCE INTERVALS
There are two broad conceptual approaches to the question at
hand: 1) Study the probability of various distances between the
sample mean and the likeliest population mean; and 2) study the
behavior of particular border universes. Computationally, both
approaches often yield the same result, but their interpretations
differ. Approach 1 follows the conventional logic, although it
carries out the calculations with resampling simulation.
Approach 1: The Conventional Logic for a Confidence Interval:
The Distance Between Sample and Population Mean
If the study of probability can tell us the likelihood that
a given population will produce a sample with a mean at a given
distance x from the population mean, and if a sample is an
unbiased estimator of the population, then it seems natural to
turn the matter around and interpret the same sort of data as
telling us the probability that the estimate of the population
mean is that far from the "actual" population mean. A fly in the
ointment is our lack of knowledge of the dispersion, but we can
safely put that aside for now. (See below, however).
This first approach begins by assuming that the universe
that actually produced the sample has the same amount of
dispersion (but not necessarily the same mean) that one would
estimate from the sample. One then produces (either with
resampling or with Normal distribution theory) the distribution
of sample means that would occur with repeated sampling from that
designated universe with samples the size of the observed sample.
One can then compute the distance between the (assumed)
population mean and (say) the inner 45 percent of sample means on
each side of the actually-observed sample mean.
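One way to carry out this step with resampling is sketched below.
It assumes that the observed sample itself stands in for the
designated universe (resampling with replacement), which is one
common reading of the procedure rather than the only one. The data
and the names resampled_means and percentile_interval are
hypothetical.

```python
import random
import statistics


def resampled_means(sample, trials=5000, seed=3):
    """Sorted means of resamples drawn with replacement from the sample."""
    rng = random.Random(seed)
    n = len(sample)  # resamples are the size of the observed sample
    return sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(trials)
    )


def percentile_interval(sorted_means, mass=0.90):
    """Bounds enclosing the inner `mass` share of the resampled means."""
    tail = (1 - mass) / 2
    k = len(sorted_means)
    return sorted_means[int(tail * k)], sorted_means[int((1 - tail) * k) - 1]


if __name__ == "__main__":
    # Hypothetical positions of 10 apples along a line.
    apples = [3.1, 4.7, 5.0, 5.4, 5.9, 6.2, 6.8, 7.3, 7.9, 9.2]
    print(statistics.fmean(apples), percentile_interval(resampled_means(apples)))
```

The two bounds enclose about 45 percent of the resampled means on
each side. How that interval should be interpreted is the subject
of the discussion that follows.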
The crucial step is to shift vantage points. We look from
the sample to the universe, instead of from a hypothesized
universe to simulated samples (as we have done so far). The same
interval computed above must also be the relevant distance when
one looks from the sample to the universe. Putting this
algebraically, we can state (on the basis of either simulation or
formal calculation) that for any given population S, and for any
given distance d from its mean mu, that
p[|mu - xbar| <= d] = alpha,
where xbar is a randomly-generated sample mean and alpha is the
probability resulting from the simulation or calculation.
The above equation focuses on the deviation of various
sample means (xbar) from a stated population mean (mu). But we
are logically entitled to read the algebra in another fashion,
focusing on the deviation of mu from a randomly-generated sample
mean. This implies that for any given randomly-generated sample
mean we observe, the same probability (alpha) describes the
probability that mu will be at a distance d or less from the
observed xbar. (I believe that this is the logic underlying the
conventional view of confidence intervals, but I have yet to find
a clear-cut statement of it; in any case, it appears to be
logically correct.)
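The re-reading of the algebra can be written out step by step. The
lines below are a sketch in LaTeX notation (the \text command
assumes the amsmath package), taking the distance as an absolute
deviation, with xbar the randomly-generated sample mean and mu the
fixed population mean:

```latex
% Assumption: \bar{x} is random, \mu is fixed, d > 0.
\[
  |\mu - \bar{x}| \le d
  \iff -d \le \mu - \bar{x} \le d
  \iff \bar{x} - d \le \mu \le \bar{x} + d ,
\]
\[
  \text{so}\quad
  p\bigl[\,|\mu - \bar{x}| \le d\,\bigr]
  = p\bigl[\,\bar{x} - d \le \mu \le \bar{x} + d\,\bigr]
  = \alpha .
\]
```

Both sides describe the same event, so they carry the same
probability. In both, the randomness belongs to xbar; the
derivation itself gives mu no distribution.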
To repeat this difficult idea in slightly different words:
If one draws a sample (large enough to not worry about sample
size and dispersion), one can say in advance that there is a
probability p that the sample mean (xbar) will fall within z
standard deviations of the population mean (mu). One estimates
the population dispersion from the sample. If there is a
probability p that xbar is within z standard deviations of mu,
then with probability p, mu must then be within that same z
standard deviations of xbar. To repeat, this is, I believe, the
heart of the standard concept of the confidence interval, to the
extent that there is thought-through consensus on the matter.
So we can state for such populations the probability that
the distance between the population and sample means will be d or
less. Or with respect to a given distance, we can say that the
probability that the population and sample means will be that
close together is p.
That is, we start by focusing on how much the sample mean
diverges from the known population mean. But then - and to
repeat once more this key conceptual step - we re-focus our
attention to begin with the sample mean and then discuss the
probability that the population mean will be within a given
distance. The resulting distance is what we call the "confidence
interval".
Please notice that the distribution (universe) assumed at
the beginning of this approach did not include the assumption
that the distribution is centered on the sample mean or anywhere
else. It is true that the sample mean is used for purposes of
reporting the location of the estimated universe mean. But
despite how the subject is treated in the conventional approach,
the estimated population mean is not part of the work of
constructing confidence intervals. Rather, the calculations
apply in the same way to all universes in the neighborhood of the
sample (which are assumed, for the purpose of the work, to have
the same dispersion). And indeed, it must be so, because the
probability that the universe from which the sample was drawn is
centered exactly at the sample mean is very small.
This independence of the confidence-intervals construction
from the mean of the sample (and the mean of the estimated
universe) is surprising at first, but after a bit of thought it
makes sense.
In this first approach, as noted more generally above, we do
not make estimates of the confidence intervals on the basis of
any logical inference from any one particular sample to any one
particular universe, because this cannot be done in principle; it
is the futile search for this connection that for decades roiled
the brains of so many statisticians and now continues to trouble
the minds of so many students. Instead, we investigate the
behavior of (in this first approach) the universe that has a
higher probability of producing the observed sample than does any
other universe (in the absence of any additional evidence to the
contrary), and whose characteristics are chosen on the basis of
its resemblance to the sample. In this way the estimation of
confidence intervals is like all other statistical inference:
One investigates the probabilistic behavior of one or more
hypothesized universes, the universe(s) being implicitly
suggested by the sample evidence but not logically implied by
that evidence. And there are no grounds for dispute about
exactly what is being done - only about how to interpret the
results.
One difficulty with the above approach is that the estimate
of the population dispersion does not rest on sound foundations;
this matter will be discussed later, but it is not likely to lead
to a seriously misleading conclusion.
A second difficulty with this approach is in interpreting
the result. What is the justification for focusing our attention
on a universe centered on the sample mean? While this particular
universe may be more likely than any other, it undoubtedly has a
low probability. And indeed, the statement of the confidence
intervals refers to the probabilities that the sample has come
from universes other than the universe centered at the sample
mean, and quite a distance from it.
My answer to this question does not rest on a set of
meaningful mathematical axioms, and I assert that a meaningful
axiomatic answer is impossible in principle. Rather, I reason
that we should consider the behavior of this universe because
other universes near it will produce much the same results,
differing only in dispersion from this one, and this difference
is not likely to be crucial; this last assumption is all-
important, of course. True, we do not know what the dispersion
might be for the "true" universe. But elsewhere (Chapter 00 in
[Statphil]) I argue that the concept of the "true universe" is
not helpful - or maybe even worse than nothing - and should be
forsworn. And we can postulate a dispersion for any other
universe we choose to investigate. That is, for this postulation
we unabashedly bring in any other knowledge we may have. The
defense for such an almost-arbitrary move would be that this is a
second-order matter relative to the location of the estimated
universe mean, and therefore it is not likely to lead to serious
error. (This sort of approximative guessing sticks in the
throats of many trained mathematicians, of course, who want to
feel an unbroken logic leading backwards into the mists of axiom
formation. But the axioms themselves inevitably are chosen
arbitrarily just as there is arbitrariness in the practice at
hand, though the choice process for axioms is less obvious and
more hallowed by having been done by the masterminds of the past.
(See Chapter 00 in [Statphil] on the necessity for judgment.)
The absence of a sequence of equations leading from some first
principles to the procedure described in the paragraph above is
evidence of what is felt to be missing by those who crave logical
justification. The key equation in this approach is formally
unassailable, but it seems to come from nowhere.)
The examples in the following chapter present computations
for two population distributions - one binomial and one
quantitative - of the histograms of the sample means produced
with this procedure.
Operationally, we use the observed sample mean, together
with an estimate of the dispersion from the sample, to estimate a
mean and dispersion for the population. Then with reference to
the sample mean we state a combination of a distance (on each
side) and a probability pertaining to the population mean. The
computational examples will illustrate this procedure.
Once we have obtained a numerical answer, we must decide how
to interpret it. There is a natural and almost irresistible
tendency to talk about the probability that the mean of the
universe lies within the intervals, but this has proven confusing
and controversial. Interpretation in terms of a repeated process
is not very satisfying intuitively [1]. In my view, it is not
worth arguing about any "true" interpretation of these
computations. One could sensibly interpret the computations in
terms of the odds a decision-maker, given the evidence, would
reasonably offer about the relative likelihoods that the sample
came from one of two specified universes (one of them probably
being centered on the sample); this does provide some information
on reliability, but this procedure departs from the concept of
confidence intervals.
The reader may find it useful to read in the next chapter
examples of the actual practice of computing confidence intervals
in Approach 1, before proceeding to read about Approach 2.
Approach 2. A Relevant Method Though Not a Confidence Interval:
Likelihood of Various Universes Producing This Sample
There is another simple method for getting an impression of
the location of the sample with respect to the universe that
generated it; it is not the same as a confidence interval[1], but
it can be illuminating. We can simply pick any particular
location and state the probability that a given universe located
at that point would produce a sample with a mean as far or
farther away than the observed sample. This method does not
require any assumptions about the locations of universes. But it
clearly does not allow one to state a probability that the sample
came from any particular universe or set of universes within any
particular interval.
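This calculation is easy to sketch by simulation. The illustration
below assumes a Normal universe with a stated dispersion and counts
deviations in either direction; the name prob_mean_as_far and the
example numbers are hypothetical.

```python
import random
import statistics


def prob_mean_as_far(universe_mean, universe_sd, n, observed_mean,
                     trials=5000, seed=4):
    """Share of simulated sample means at least as far from the
    universe mean as the observed sample mean (either direction).

    Assumes a Normal universe located at `universe_mean`.
    """
    rng = random.Random(seed)
    gap = abs(observed_mean - universe_mean)
    count = 0
    for _ in range(trials):
        m = statistics.fmean([rng.gauss(universe_mean, universe_sd)
                              for _ in range(n)])
        if abs(m - universe_mean) >= gap:
            count += 1
    return count / trials


if __name__ == "__main__":
    # Hypothetical case: universe at 5.0 with spread 1.7, ten observations,
    # observed sample mean 6.15.
    print(prob_mean_as_far(5.0, 1.7, 10, 6.15))
```

The result is a statement about what the chosen universe would
produce, not about the chance that the sample came from it.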
The second approach to the general question of estimate
accuracy is to analyze the behavior of a variety of universes
centered at other points on the line, rather than the universe
centered on the sample mean. One can ask the probability that a
distribution centered away from the sample mean, with a given
dispersion, would produce (say) a 10-apple scatter having a mean
as far away from the given point as the observed sample mean. If
we assume the situation to be symmetric[2], we can find a point
at which we can say that a distribution centered there would have
only a (say) 5 percent chance of producing the observed sample.
And we can also say that a distribution even further away from
the sample mean would have an even lower probability of producing
the given sample. But we cannot turn the matter around and say
that there is any particular chance that the distribution that
actually produced the observed sample is between that point and
the center of the sample.
Imagine a situation where you are standing on one side of a
canyon, and you are hit by a baseball, the only ball in the
vicinity that day. Based on experiments, you can estimate that a
baseball thrower whom you see standing on the other side of the
canyon has only a 5 percent chance of hitting you with a single
throw[1]. But this does not imply that the source of the ball
that hit you was someone else standing in the middle of the
canyon, because that is patently impossible. That is, your
knowledge about the behavior of the "boundary" universe does not
logically imply anything about the existence and behavior of any
other universes. But just as in the discussion of testing
hypotheses, if you know that one possibility is unlikely, it is
reasonable that as a result you will draw conclusions about other
possibilities in the context of your general knowledge and
judgment.
We can find the "boundary" distribution(s) we seek if we a)
specify a measure of dispersion, and b) try every point along the
line leading away from the sample mean, until we find that
distribution that produces samples such as that observed with a
(say) 5 percent probability or less.
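The search just described can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not a
definitive procedure: the ten apple distances are hypothetical, each
candidate universe is taken to be the observed sample shifted so that
its mean sits at the candidate point (so the sample dispersion stands
in for the universe dispersion), and the names tail_probability and
find_boundary are invented for this sketch.

```python
import random
import statistics

def tail_probability(sample, center, trials=2000, rng=None):
    """Share of samples from a universe centered at `center` whose mean lies
    at least as far from `center`, on the same side, as the observed mean."""
    if rng is None:
        rng = random.Random(0)
    n = len(sample)
    observed_mean = statistics.fmean(sample)
    # Assumed universe: the observed sample shifted to have mean `center`.
    universe = [x + (center - observed_mean) for x in sample]
    gap = observed_mean - center
    hits = 0
    for _ in range(trials):
        deviation = statistics.fmean(rng.choices(universe, k=n)) - center
        if gap >= 0:
            hits += deviation >= gap
        else:
            hits += deviation <= gap
    return hits / trials

def find_boundary(sample, direction=1, alpha=0.05, trials=2000, seed=0):
    """Step away from the sample mean until the candidate universe produces
    a sample like the observed one with probability alpha or less."""
    step = statistics.stdev(sample) / (10 * len(sample) ** 0.5)
    if step == 0:
        raise ValueError("sample has no dispersion to work with")
    rng = random.Random(seed)
    center = statistics.fmean(sample)
    for _ in range(10_000):  # safety cap on the number of steps
        if tail_probability(sample, center, trials, rng) <= alpha:
            return center
        center += direction * step
    raise RuntimeError("no boundary found within the step limit")

# Hypothetical distances (in feet) of ten apples from a reference point.
sample = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 2.5, 3.8, 2.2, 3.0]
observed_mean = statistics.fmean(sample)
lower = find_boundary(sample, direction=-1)
upper = find_boundary(sample, direction=+1)
print(round(lower, 2), round(observed_mean, 2), round(upper, 2))
```

Because each probability is itself a simulation estimate, the
boundary moves slightly from run to run; a smaller step or more
trials tightens it at the cost of computing time.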
To estimate the dispersion, in many cases we can safely use
an estimate based on the sample dispersion, using either
resampling or Normal distribution theory. The hardest cases for
resampling are a) a proportion near 0 or 1.0, and b) a very
small sample of data. In such situations one should use
additional outside information, or Normal distribution theory, or
both.
We can also create a confidence interval in the following
fashion: We can first estimate the dispersion for a universe in
the general neighborhood of the sample mean, using various
devices to be "conservative", if we like.[1] Given the estimated
dispersion, we then estimate the probability distribution of
various amounts of error between observed sample means and the
population mean. We can do this with resampling simulation as
follows: a) Create other universes at various distances from the
sample mean, but with other characteristics similar to the
universe that we postulate for the immediate neighborhood of the
sample, and b) experiment with those universes. One can also
apply the same logic with a more conventional parametric
approach, using general knowledge of the sampling distribution of
the mean, based on Normal distribution theory or previous
experience with resampling. We shall not discuss the latter
method here.
As with Approach 1, we do not make any probability
statements about where the population mean may be found. Rather,
we discuss only what various hypothetical universes might
produce, and make inferences about the "actual" population's
characteristics by comparison with those hypothesized universes.
If we are interested in (say) a 95 percent confidence
interval, we want to find the distribution on each side of the
sample mean that would produce a sample with a mean that far away
only 2.5 percent of the time (2 * .025 = 1 - .95). A shortcut to
find these "border distributions" is to plot the sampling
distribution of the mean at the center of the sample, as in
Approach 1. Then find the (say) 2.5 percent cut-offs at each end
of that distribution. On the assumption of equal dispersion at
the two points along the line, we now reproduce the previously-
plotted distribution with its centroid (mean) at those 2.5
percent points on the line. The new distributions will have 2.5
percent of their areas on the other side of the mean of the
sample.
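A rough sketch of this shortcut follows, again with hypothetical
apple data and invented helper names (bootstrap_means, percentile).
It assumes, as the text does, equal dispersion at the different
points on the line and a roughly symmetric sampling distribution;
with a skewed distribution the re-centered tail area would not come
out near 2.5 percent.

```python
import random
import statistics

def bootstrap_means(sample, trials=4000, seed=1):
    """Sorted means of resamples drawn from the sample itself (Approach 1)."""
    rng = random.Random(seed)
    n = len(sample)
    return sorted(statistics.fmean(rng.choices(sample, k=n))
                  for _ in range(trials))

def percentile(sorted_values, fraction):
    """Value at the given fraction of the way through a sorted list."""
    index = round(fraction * (len(sorted_values) - 1))
    return sorted_values[index]

# Hypothetical distances (in feet) of ten apples from a reference point.
sample = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 2.5, 3.8, 2.2, 3.0]
observed_mean = statistics.fmean(sample)

means = bootstrap_means(sample)
lower_cut = percentile(means, 0.025)   # center of the left border universe
upper_cut = percentile(means, 0.975)   # center of the right border universe

# Re-center the same distribution at the upper cut-off and measure how much
# of its area lies on the far side of the observed sample mean.
shift = upper_cut - observed_mean
area_beyond = sum(m + shift <= observed_mean for m in means) / len(means)
print(round(lower_cut, 2), round(upper_cut, 2), round(area_beyond, 3))
```

If the symmetry assumption holds, area_beyond should come out close
to 0.025, which is the sense in which the border distribution has
2.5 percent of its area on the other side of the sample mean.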
So from the standpoint of Approach 2, the conventional
sample formula (e.g., Wonnacott and Wonnacott, 1990, p. 5) which
is centered at the mean can be considered a shortcut to
estimating the boundary distributions. We say that the boundary
is at the point that centers a distribution which has only a
(say) 2.5 percent chance of producing the observed sample; it is
that distribution which is the subject of the discussion - that
is, one of the distributions at the endpoints of the vertical
line in Figure II-3-1 - and not the distribution which is
centered at mu = xbar. [1]
Figure II-3-1
To restate, then: moving progressively farther away from
the sample mean, we can eventually find a universe that has only
some (any) specified small probability of producing a sample like
the one observed. One can then say that this point represents a
"limit" or "boundary" so that the interval between it and the
sample mean may be called a confidence interval.
Interpretation of Approach 2
Now to interpret the results of the second approach:
Assuming that the sample is not drawn in a biased fashion (such
as the wind blowing all the apples in the same direction), and
assuming that the population has the same dispersion as the
sample, we can say that distributions centered at the 95 percent
confidence points (each of them including a tail with 2.5 percent
of the area), or even further away from the sample mean, will
produce the observed sample only 5 percent of the time or less.
The result of the second approach is more in the spirit of a
hypothesis test than of the usual interpretation of confidence
intervals. Another statement of the result of the second
approach is: We postulate a given universe - say, a universe at
(say) the two-tailed 95 percent boundary line. We then say: The
probability that the observed sample would be produced by a
universe with a mean as far (or further) from the observed
sample's mean as the universe under investigation is only 2.5
percent. This is similar to the prob-value interpretation of a
hypothesis-test framework. It is not a direct statement about
the location of the mean of the universe from which the sample
has been drawn. But it is certainly reasonable to derive a
betting-odds interpretation of the statement just above, to wit:
the chances are 2 1/2 in 100 (or, the odds are 2 1/2 to 97 1/2)
that a population located here would generate a sample with a
mean as far away as the observed sample. And it would seem
legitimate to proceed to the further betting-odds statement that
(assuming we have no additional information) the odds are 97 1/2
to 2 1/2 that the mean of the universe that generated this sample
is no farther away from the sample mean than the mean of the
boundary universe under discussion. About this statement there
is nothing slippery, and its meaning should not be controversial.
Here again the tactic for interpreting the statistical
procedure is to restate the facts of the behavior of the universe
that we are manipulating and examining at that moment. We use a
heuristic device to find a particular distribution - the one that
is at (say) the 97 1/2 - 2 1/2 percent boundary - and simply
state explicitly what the distribution tells us implicitly: The
probability of this distribution generating the observed sample
(or a sample even further removed) is 2 1/2 percent. We could go
on to say (if it were of interest to us at the moment) that
because the probability of this universe generating the observed
sample is as low as it is, we "reject" the "hypothesis" that the
sample came from a universe this far away or further. Or in
other words, we could say that because we would be very surprised
if the sample were to have come from this universe, we instead
believe that another hypothesis is true. The "other" hypothesis
often is that the universe that generated the sample has a mean
located at the sample mean or closer to it than the boundary
universe.
The behavior of the universe at the 97 1/2 - 2 1/2 percent
boundary line can also be interpreted in terms of our
"confidence" about the location of the mean of the universe that
generated the observed sample. We can say: At this boundary
point lies the end of the region within which we would bet 97 1/2
to 2 1/2 that the mean of the universe that generated this sample
lies to the (say) right of it.
As noted in the preview to this chapter, we do not learn
about the reliability of sample estimates of the population mean
(and other parameters) by logical inference from any one
particular sample to any one particular universe, because in
principle this cannot be done. Instead, in this second approach
we investigate the behavior of various universes at the
borderline of the neighborhood of the sample, the characteristics
of those universes being chosen on the basis of their
resemblances to the sample. We seek, for example, to find the
universes that would produce samples with the mean of the
observed sample less than (say) 5 percent of the time. In this
way the estimation of confidence intervals is like all other
statistical inference: One investigates the probabilistic
behavior of hypothesized universes, the hypotheses being
implicitly suggested by the sample evidence but not logically
implied by that evidence.
Approaches 1 and 2 may (if one chooses) be seen as identical
conceptually as well as (in many cases) computationally. But as
I see it, the interpretation of them is rather different, and
distinguishing them helps one's intuitive understanding.
Approach 3: A Simulation Method
Here is another new method: We can simulate the behavior of
a variety of universes at different distances from us.
As one thinks about the concept of confidence interval, it
turns out to be either very hard or impossible to get a clear
idea of what others are talking about or the meaning of the
mathematical operations they perform in connection with that
concept - as shown in the quotes from various skeptical
statisticians above. To clarify the matter, and also as a
practical expedient, I propose a way of defining confidence
intervals - or a concept that grasps some of that idea, which we
might call an accuracy interval - that has clarified many other
difficult concepts (e.g. relativity), but so far as I can tell,
has not been employed with confidence intervals: operational
definition.
To use the physical example of estimating the accuracy of
the estimate of the location of the trunk of an apple tree to
illustrate the logic: We may base our estimate of the spread of
the fall of apples from apple trees on the actual sample that we
have, and then examine how often a sample of, say, ten apples
from such a tree would have a mean as far to the right as we are
standing. And this is indeed how we can proceed -- trying out
simulated trees at differing distances.
These are the operational steps I suggest that one would
perform to compute a confidence-like accuracy interval in a
particular simple case:
1. Mark off the narrowest area of the observed sample
distribution that contains 10 percent of the probability density;
in the case of a symmetrical distribution, this would be on both
sides of the mean. Let us call this area the "target zone".
2. Locate points of width similar to the target zone on the
horizontal axis extending to the left and right of the target
zone without bound (assuming that the distribution is two-
dimensional). At each of these points, including the middle
point of the target zone, locate a bootstrap universe constructed
on the model of the observed sample.
3. By simulation (or some analytic device), produce from
the middle (target zone) universe, (say) 100 means of samples
of size n (n being the observed sample size).
4. Mark those means that fall within and those outside the
target zone.
5. Repeat steps 3 and 4 for the first such universes to the
left and right of the sample, and then for other universes to the
left and right until they are so far away that they are not
putting any noticeable number of means in the target zone.
6. Count the total means in the target zone; ignore all
others.
7. Array all these means according to the universes from
which they came.
8. Start at the middle point, and continue outward until
the universes between the center and that point account for (say)
95 percent of the means within the target zone. Mark that point.
9. The marked points on either side bound the interval containing
95 percent of the universes that might have given rise to the
observed sample. One can then say that there is a 95 percent
probability that the observed sample came from a universe with a
mean within that interval.
It is all-important for this procedure, of course, that the
distribution of universes is assumed to be horizontal. But we
have not had to make any assumptions about the shapes of the
universe(s), not even that they (it) be symmetrical.
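The nine steps above might be sketched as follows. Several choices
here are assumptions made only for illustration: the apple data are
hypothetical, the target zone is read as a symmetric band around the
sample mean holding about 10 percent of the means produced by the
middle universe, each universe is the observed sample shifted to a
new center, and 1,000 means per universe are drawn (rather than 100)
simply to steady the counts. The name accuracy_interval is invented.

```python
import random
import statistics

def accuracy_interval(sample, coverage=0.95, zone_share=0.10,
                      draws=1000, seed=2):
    rng = random.Random(seed)
    n = len(sample)
    center = statistics.fmean(sample)

    def means_from(universe_center):
        # Means of samples from the bootstrap universe moved to a new center.
        shift = universe_center - center
        return [statistics.fmean(rng.choices(sample, k=n)) + shift
                for _ in range(draws)]

    # Step 1 (assumed reading): half-width of the band around the sample
    # mean that holds zone_share of the middle universe's means.
    deviations = sorted(abs(m - center) for m in means_from(center))
    half_width = deviations[int(zone_share * draws)]
    if half_width == 0:
        raise ValueError("target zone has zero width")
    width = 2 * half_width

    def hits(universe_center):
        # Steps 3, 4 and 6: count the means landing inside the target zone.
        return sum(abs(m - center) <= half_width
                   for m in means_from(universe_center))

    # Steps 2, 5 and 7: universes spaced one zone-width apart, moving
    # outward until neither side puts any mean in the target zone.
    counts = {0: hits(center)}
    for k in range(1, 10_000):
        left, right = hits(center - k * width), hits(center + k * width)
        if left == 0 and right == 0:
            break
        counts[-k], counts[k] = left, right

    # Step 8: widen symmetrically until the universes inside account for
    # the requested share of all means in the target zone.
    total = sum(counts.values())
    covered, reach = counts[0], 0
    while covered < coverage * total:
        reach += 1
        covered += counts[-reach] + counts[reach]

    # Step 9: the marked points.
    return center - reach * width, center + reach * width

# Hypothetical distances (in feet) of ten apples from a reference point.
sample = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 2.5, 3.8, 2.2, 3.0]
low, high = accuracy_interval(sample)
print(round(low, 2), round(high, 2))
```

Giving every universe the same number of draws is what encodes the
"horizontal" assumption; an unequal prior could be represented by
weighting the counts before step 8.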
This third approach has not yet been developed in practice.
But the very exercise of thinking it through illuminates the
issues involved in constructing conventional confidence intervals
or the boundary intervals described in Approach 2.
CONFIDENCE INTERVALS AND BAYESIAN ANALYSIS
Bayesian thinking can often be valuable in constructing
confidence intervals. If one states one's prior beliefs about
the distribution of the parameter in question, and then combines
that distribution with the observed data, there is nothing
mysterious or ambiguous about stating the posterior distribution
of belief, which can then be considered as the stuff of a
confidence interval. Therefore, Bayesian analysis can serve well
to shine clear sunlight on this murky concept. And even if one
wishes to state an extremely "uninformative" prior distribution -
that is, a state of affairs when one asserts close to no
knowledge at all - the Bayesian procedure is admirably clear and
consistent, pulling no rabbits from a hat. An illustration
(using data from Box and Tiao) may be found in Chapter 00.
One need not even do anything differently than standard
confidence-interval calculations, to get the benefit of Bayesian
analysis. One may simply interpret the results in the Bayesian
fashion so as to obtain meaningful statements.
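As a small illustration of combining a prior with the data, the
sketch below computes a posterior on a grid of candidate universe
means. It rests on assumptions not stated in the text: the data are
hypothetical, the sampling distribution of the mean is taken to be
Normal with the usual estimated standard error, and the function
name grid_posterior_interval is invented. With the default flat
prior the interval should land very close to the conventional one.

```python
from statistics import NormalDist, fmean, stdev

def grid_posterior_interval(sample, prior=None, level=0.95,
                            points=2001, span=6.0):
    """Central posterior interval for the universe mean on a grid."""
    n = len(sample)
    center = fmean(sample)
    se = stdev(sample) / n ** 0.5
    # Candidate universe means from center - span*se to center + span*se.
    grid = [center + span * se * (2 * i / (points - 1) - 1)
            for i in range(points)]
    if prior is None:
        prior = lambda mu: 1.0          # flat ("uninformative") prior
    # Prior weight times the likelihood of the observed mean under each
    # candidate universe (Normal approximation, an assumption).
    weights = [prior(mu) * NormalDist(mu, se).pdf(center) for mu in grid]
    total = sum(weights)
    tail = (1 - level) / 2
    cumulative, lower, upper = 0.0, None, grid[-1]
    for mu, w in zip(grid, weights):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = mu
        if cumulative >= 1 - tail:
            upper = mu
            break
    return lower, upper

# Hypothetical distances (in feet) of ten apples from a reference point.
sample = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 2.5, 3.8, 2.2, 3.0]
lower, upper = grid_posterior_interval(sample)
print(round(lower, 2), round(upper, 2))
```

A non-flat prior can be passed in as a function, for example
prior=lambda mu: NormalDist(2.0, 0.5).pdf(mu), in which case the
interval is pulled toward the prior's center.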
CONCLUSION
It is not possible in principle to derive a probability
statement about the location of the mean or any other parameter
of a distribution from a set of data alone, without additional
assumptions.
One can make unambiguous statements about the probability
that any specified distribution, at any given distance from the
mean of a sample, would produce a sample of the observed size
with a mean located as far or further away from the hypothetical
universe's mean as is the observed sample's mean.
With various Bayesian-type assumptions, one can make
probability statements about the location of the mean of the
universe that produced the sample.
One can make a simulation with a linear Bayesian prior
distribution (or some other prior) that will allow one to make
probability statements about the location of the mean of the
universe that produced the sample.
Whether one wishes to refer to either of the above two
procedures as a "confidence interval" is a matter of choice.
AFTERNOTE: ABOUT THE INFINITE REGRESS PROBLEM
This afternote expands on an earlier footnote about Savage's
objection to confidence intervals on the grounds that they
constitute an infinite regress.
Even the next level regression in the sequence that Savage
mentions cannot be an important difficulty in practice. If one
cares to do so, one may estimate the accuracy of the confidence
limits by (in the resampling approach) repeating the overall
simulation, and observing the variation in the confidence bounds.
If one does this and looks at the 95 percent bounds around the
confidence bounds, they are huge - so large as to be without
meaning in the cases of proportions that Peter Bruce and I have
looked at. But this is only surprising until one thinks about
it; such large variation is inevitable given that the result is
something like a .052 probability.
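One way to see this for oneself is sketched below. It is only one
assumed reading of "repeating the overall simulation": the observed
sample is itself resampled many times, a percentile interval is
computed for each resample, and the spread of the resulting bounds
is examined. The data (9 successes in 30 trials) and the function
names are hypothetical.

```python
import random

def percentile_bounds(sample, rng, trials=400, level=0.95):
    """Percentile-style bounds for a proportion from resampled proportions."""
    n = len(sample)
    props = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(trials))
    tail = (1 - level) / 2
    return props[int(tail * trials)], props[int((1 - tail) * trials) - 1]

def bounds_on_bounds(sample, outer=200, seed=3):
    """Repeat the whole interval calculation on resamples of the data and
    report the central 95 percent range of each bound."""
    rng = random.Random(seed)
    n = len(sample)
    lowers, uppers = [], []
    for _ in range(outer):
        resample = rng.choices(sample, k=n)
        low, high = percentile_bounds(resample, rng)
        lowers.append(low)
        uppers.append(high)
    lowers.sort()
    uppers.sort()
    cut = int(0.025 * outer)
    return (lowers[cut], lowers[-cut - 1]), (uppers[cut], uppers[-cut - 1])

# Hypothetical data: 9 successes (1) and 21 failures (0) in 30 trials.
sample = [1] * 9 + [0] * 21
lower_range, upper_range = bounds_on_bounds(sample)
print("lower bound ranges over", lower_range)
print("upper bound ranges over", upper_range)
```

With a sample this small, the range of each bound can be expected to
be wide relative to the interval itself.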
The exploration in the paragraph above leads back to the
question of why confidence limits tend to focus on the same 95
percent and 99.7 percent values as found in classical hypothesis
testing. Those values were selected long ago for hypothesis
testing because they seem to be intuitive measures of the
relevant psychological surprise. And for purposes other than
measures of surprise - that is, more directly related to
decision-making - hypothesis testing now more frequently (and more
sensibly, in my view) looks at the prob-value result itself. But
this more flexible prob-value concept does not fit comfortably
with confidence intervals.
When thought through from scratch, perhaps more sensible
confidence values would be 50 percent, or 75 percent, rather
than 95 percent - which would be closer to the concept
traditionally used in physical experiments as a rough plus-or-
minus index of reliability and error. The 50 percent bounds on
50 percent confidence limits might then be a meaningful second
order measure.
As to further regressions - any sensible person stops being
concerned with a further order of smalls at some point; one could
never live through a day without such approximations. To worry
about it is to seek impossible perfection.
**FOOTNOTES**
[1]:Savage is troubled by the infinite regress in connection
with the estimate of dispersion. "Taking the doctrine literally,
it evidently leads to endless regression, for an estimate of the
accuracy of an estimate should presumably be accompanied by an
estimate of its own accuracy, and so on forever" (p. 257). But
if we simply define "accuracy" operationally as the calculations
in the approaches discussed below, this difficulty disappears.
Savage might say that I have just defined away the difficulty.
I'd answer: Yes indeed. It is the highest function of
operational definitions such as this one to get us around logical
traps and enable us to function with usable tools.
This issue is discussed further in the Afternote to the
chapter.
[1]: Though the logic of confidence intervals is not
only subtle but also rests on shakier ground, in my judgment,
than that of hypothesis testing, there are thoughtful and
respected statisticians - for example, Thomas Wonnacott - who
argue that the logic of confidence intervals is better
grounded and leads less often to error.
[1]: An example of this sort of interpretation is as follows:
... Although on average X-bar is on target, the
specific sample mean X-bar that we happen to observe is
almost certain to be a bit high or a bit low.
Accordingly, if we want to be reasonably confident that
our inference is correct, we cannot claim that mu is
precisely equal to the observed X-bar. Instead, we
must construct an interval estimate or confidence
interval of the form:
mu = X-bar +/- sampling error
The crucial question is: How wide must this allowance for
sampling error be? The answer, of course, will depend on
how much X-bar fluctuates...
Constructing 95% confidence intervals is like
pitching horseshoes. In each case there is a fixed
target, either the population mu or the stake. We are
trying to bracket it with some chancy device, either
the random interval or the horseshoe...
There are two important ways, however, that confidence
intervals differ from pitching horseshoes. First, only
one confidence interval is customarily constructed.
Second, the target mu is not visible like a horseshoe
stake. Thus, whereas the horseshoe player always knows
the score (and specifically, whether or not the last
toss bracketed the stake), the statistician does not.
He continues to "throw in the dark," without knowing
whether or not a specific interval estimate has
bracketed mu. All he has to go on is the statistical
theory that assures him that, in the long run, he will
succeed 95% of the time. (Wonnacott and Wonnacott,
1990, p. 258).
Savage refers to this type of interpretation as follows:
...is a sort of fiction; for it will be found that
whenever its advocates talk of making assertions that
have high probability, whether in connection with
testing or estimation, they do not actually make such
assertions themselves, but endlessly pass the buck,
saying in effect, "This assertion has arisen according
to a system that will seldom lead you to make false
assertions, if you adopt it. As for myself, I assert
nothing but the properties of the system." (1972, pp.
260-261)
Lee writes at greater length:
[T]he statement that a 95% confidence interval for an
unknown parameter ran from -2 to +2 sounded as if the
parameter lay in that interval with 95% probability and
yet I was warned that all I could say was that if I
carried out similar procedures time after time then the
unknown parameters would lie in the confidence
intervals I constructed 95% of the time.
Subsequently, I discovered that the whole theory
had been worked out in very considerable detail in such
books as Lehmann (1959, 1986). But attempts such as
those that Lehmann describes to put everything on a
firm foundation raised even more questions. (Lee,
1989, p. vii)
[1]: Efron and Tibshirani (1993, p. 157) suggest an
approach that is computationally like Approach 2, but they
interpret the computation differently and refer to it as a
confidence interval. They also say that the approach applies
only to a Normal distribution, whereas I see no reason for such
a restriction.
[2]: Peter Bruce has convinced me that a goodly number of
distributions would result in asymmetric confidence intervals.
This can cause considerable complications for the conventional
formulaic calculations, though resampling handles them nicely.
The interpretation requires a longer statement than otherwise,
however.
[1]: You can consider this one throw as a sample of one,
with that throw as the mean observation, if the prior discussion
of sample means would otherwise lead you to question this
example.
[1]: More about this later; it is, as I said earlier, not of
primary importance in estimating the accuracy of the confidence
intervals; note, please, that as we talk about the accuracy of
statements about accuracy, we are moving down the ladder of
sizes of causes of error.
[1]: When working with proportions, the conventional
method must obtain these points from prepared ellipses and
binomial tables, not from the sort of geometric trick used in
the previous paragraphs, and hence showing the distribution
centered at xbar = mu is quite misleading.
**ENDNOTES**
<1>: Peter Bruce's help in clarifying the ideas in this
chapter by discussing them with me, along with teaching them
jointly with me, has been especially great.
page 2 \statphil Chapter II-3 statconf 4-23-9623