CHAPTER II-3

POINT ESTIMATION AND CONFIDENCE INTERVALS I: THE LOGIC<1>

...we can make probability statements about X; e.g.,

Pr[mu - 1.96 sigma <= X <= mu + 1.96 sigma] = 0.95.   (1)

We could rewrite this as

Pr[X - 1.96 sigma <= mu <= X + 1.96 sigma] = 0.95   (2)

or

Pr[mu ∈ (X - 1.96 sigma, X + 1.96 sigma)] = 0.95.   (3)

Although mu may appear to be the subject of statements (2) and (3), the probability distribution referred to is that of X, as was more obvious in statement (1). If X is observed to be x, we say that we have 95% confidence that x - 1.96 sigma <= mu <= x + 1.96 sigma, or say that (x - 1.96 sigma, x + 1.96 sigma) is a 95% confidence interval for mu. No probability statement is made about the proposition

x - 1.96 sigma <= mu <= x + 1.96 sigma   (4)

involving the observed value, x, since neither x nor mu has a probability distribution. The proposition (4) will be either true or false, but we do not know which. If confidence intervals with confidence coefficient p were computed on a large number of occasions, then, in the long run, the fraction p of these confidence intervals would contain the true parameter value. (This is provided that the occasions are independent and that there is no selection of cases.)

"Confidence Intervals and Regions," Encyclopedia of Statistics, pp. 120-121.

This chapter discusses how to assess the accuracy of a point estimate of the mean, median, or other statistic of a sample. We want to know: How close is our estimate of (say) the sample mean likely to be to the population mean? It is all very well to say that on average the sample mean (or other point estimator) equals a population parameter. But what about the result of any particular sample? How accurate or inaccurate an estimate is it likely to produce?
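In the resampling spirit of this book, statement (1) in the quotation at the head of this chapter - and its re-reading as statements (2) and (3) - can be checked by brute simulation. Here is a minimal sketch in Python; the particular values of mu and sigma are arbitrary choices for illustration, not taken from the quotation:

# A check by simulation of statements (1) and (2); mu and sigma here
# are arbitrary illustrative values.
import random

mu, sigma = 100.0, 10.0
trials = 100_000
covered = 0
for _ in range(trials):
    x = random.gauss(mu, sigma)   # one observation of X
    # The event in (1), |X - mu| <= 1.96 sigma, is the same event as
    # the interval in (2) and (3) containing mu:
    if x - 1.96 * sigma <= mu <= x + 1.96 * sigma:
        covered += 1
print(covered / trials)           # comes out close to 0.95

The fraction comes out near .95 whatever mu and sigma one chooses - which is just the long-run reading of proposition (4) given in the quotation.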
Early in the history of statistical inference this question arose in the practice of astronomy (see Stigler, 1986; Hald, 1990). The accuracy of an estimate is a hard intellectual nut to crack - so hard that for hundreds of years statisticians and scientists wrestled with the problem with little success; it was not until the last century or two that much progress was made. The kernel of the problem is learning the extent of the variation in the population. But whereas the sample mean can be used straightforwardly to estimate the population mean, the extent of variation in the sample does not directly estimate the extent of the variation in the population, because the variation differs at different places in the distribution, and there is no reason to expect it to be symmetrical around the estimate or the mean.

The intellectual difficulty of confidence intervals may be one reason why they are less prominent in statistics literature and practice than are tests of hypotheses (though statisticians often favor confidence intervals). Another reason is that tests of hypotheses are more fundamental for pure science because they address the question that is at the heart of all knowledge-getting: "Should these groups be considered different or the same?" The statistical inference represented by confidence limits addresses what seems to be a secondary question in most sciences (though not in astronomy or perhaps physics): "How reliable is the estimate?" Still, confidence intervals are very important in some applied sciences such as geology - estimating the variation in grades of ores, for example - and in some parts of business and industry.

Confidence intervals and hypothesis tests are not disjoint ideas. Indeed, hypothesis testing of a single sample against a benchmark value is (in all schools of thought, I believe) operationally identical with the most common way (Approach 1 below) of constructing a confidence interval and checking whether it includes that benchmark value. But the underlying reasoning is different for confidence limits and hypothesis tests. The logic of confidence intervals is on shakier ground, in my judgment, than that of hypothesis testing, though there are many thoughtful and respected statisticians who argue that the logic of confidence intervals is better grounded and leads less often to error.

Confidence intervals are considered by many to be part of the same topic as estimation - being an estimation of accuracy, in their view. And confidence intervals and hypothesis testing are seen as sub-cases of each other by some people. Whatever the importance of these distinctions among these intellectual tasks in other contexts, they need not concern us here.

Confidence intervals - even if they are meaningful - certainly are controversial. The Encyclopedia of Statistics says: "Confidence intervals are widely used in practice, although not as widely supported by people interested in the foundations of statistics" (Vol. 2, p. 126). Some statisticians will not even discuss the topic. For example, the index to the well-respected book Basic Concepts of Probability and Statistics by Hodges and Lehmann (1970) does not even have a listing for confidence intervals. And Savage in his The Foundations of Statistics (1954/1972) first says that "The doctrine of accuracy estimation is vague" (p. 257) and then later writes the same phrase except with "erroneous" instead of "vague" (p. 260).
He goes on to say that "not being convinced myself, I am in no position to present convincing evidence for the usefulness of interval estimation" (p. 261). He describes Fisher's approach to the matter, fiducial probability, as the "most disputed technical concept of modern statistics" (p. 262), and says no more. He also refers to the supposedly-related idea of tolerance intervals as "slippery," so it is not surprising if the layperson finds the entire matter slippery. [1]

One thing is undeniable, however: Despite the difficulty and subtlety of the topic, the accuracy of estimates must be dealt with, one way or another. Philosophers seldom write about the subject, with the notable exception of Braithwaite (1953), who bases his treatment on Neyman and Pearson; his treatment is adventurous, yet sufficiently obscure to tax anyone's understanding.

Because the logic of confidence intervals is subtle, most statistics texts skim right past the conceptual difficulties and go directly to computation. And when the concept is combined with the conventional algebraic treatment, the composite is truly baffling; the formal mathematics makes impossible any intuitive understanding. For students, "pluginski" is the only viable option.

With the resampling method, however, the mathematics of confidence intervals is easy. The statistical interpretation of the calculations then becomes a challenging and even pleasurable subject; even beginning undergraduates can enjoy the subtlety and find that it feels good to stretch the brain and get down to fundamentals, once the calculations become transparent.

To preview the treatment of confidence intervals presented below, which I hope dissolves the confusion of the topic: We do not learn about the reliability of sample estimates of the mean (and other parameters) by logical inference from any one particular sample to any one particular universe, because this cannot be done in principle. Instead, we investigate the behavior of various universes in the neighborhood of the sample, universes whose characteristics are chosen on the basis of their resemblances to the sample. In this way the estimation of confidence intervals is like all other statistical inference: One investigates the probabilistic behavior of one or more hypothesized universes, the hypotheses being implicitly suggested by the sample evidence but not logically implied by that evidence.

The examples worked through below show why statistics is as difficult a subject as it is. The procedure required to transit successfully from the original question to the statistical probability, and then to the interpretation of that probability, involves a great many choices about the appropriate model based on analysis of the problem at hand; a wrong choice at any point dooms the procedure. The actual computation of the probability - whether done with formulaic probability theory or with resampling - is only a very small part of the procedure, and it is the least difficult part if one proceeds with resampling. The difficulties in the statistical process are not mathematical but rather stem from the hard clear thinking needed to understand the nature of the situation and to ascertain the appropriate way to model it.

In comparison with the logic of hypothesis testing, the logic of confidence limits is more subtle, though it need not be as opaque as one would think from reading the philosophic and statistical literature (e.g. Braithwaite, 1953; Gigerenzer et al., 1989).
The difference, I think, is that in hypothesis-testing situations we find it relatively easy to decide which universes we wish to analyze, and therefore the deductive chain from the statistical question to the probabilistic study is short, clear, and strong. But when inquiring into the accuracy of estimations we find it much harder to decide which universes we wish to analyze. This is the core of the problem, and climbing up the deductive chain is therefore fraught with difficulty.[1]

THE LOGIC OF CONFIDENCE INTERVALS

The purpose of a confidence interval is to help us assess the reliability of one or more statistics of the sample - most often its mean or median - as an estimator of the parameter of the universe. If one draws a sample that is very, very large - large enough so that one need not worry about sample size and dispersion in the case at hand - from a universe whose characteristics one knows, one then can deduce the probability that the sample mean will fall within a given distance of the population mean. Intuitively, it seems as if one should also be able to reverse the process - to infer something about the location of the population mean from the sample mean. But this inverse inference turns out to be a slippery business indeed.

Let's put it differently: It is all very well to say - as one logically may - that on average the sample mean (or other point estimator) equals a population parameter in most situations. But what about the result of any particular sample? How accurate or inaccurate an estimate of the population mean is the sample likely to produce?

The line of thought runs as follows: It is possible to map the distribution of the means (or other such statistic) of samples of any given size (the sample size of interest in any investigation usually being the size of the observed sample) and of any given pattern of dispersion (which we will assume for now can be estimated from the sample) that a universe in the neighborhood of the sample will produce. For example, we can compute how big an interval to the right of a postulated universe's mean will include 45 percent of the samples on one side of the mean, and 45 percent on the other side (a simulation sketch of this computation appears below).

What cannot be done is to draw conclusions from sample evidence about the nature of the universe from which it was drawn, in the absence of some information about the set of universes from which it might have been drawn. That is, one can investigate the behavior of one or more specified universes, and discover the absolute and relative likelihoods that the given specified universe(s) might produce such a sample. But the universe(s) to be so investigated must be specified in advance (which is consistent with the Bayesian view of statistics). To put it differently, we can employ probability theory to learn the pattern(s) of results produced by samples drawn from a particular specified universe, and then compare that pattern to the observed sample. But we cannot infer the probability that that sample was drawn from any given universe in the absence of knowledge of the other possible sources of the sample. That is a subtle difference, but hopefully the following discussion makes it understandable.

COMPUTING CONFIDENCE INTERVALS

In the first part of the discussion we shall leave aside the issue of estimating the extent of the dispersion - a troublesome matter, but one which seldom will result in unsound conclusions even if handled crudely.
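To make concrete the mapping of sample means described above, here is a minimal simulation sketch in Python. The Normal shape of the postulated universe, its mean of 50, its standard deviation of 10, and the sample size of 25 are all hypothetical choices for illustration:

# Mapping the sample means that a postulated universe would produce.
# The Normal shape, mean 50, standard deviation 10, and sample size 25
# are all hypothetical choices.
import random

universe_mean, universe_sd, n = 50.0, 10.0, 25
means = []
for _ in range(10_000):
    sample = [random.gauss(universe_mean, universe_sd) for _ in range(n)]
    means.append(sum(sample) / n)
means.sort()
lo = means[int(0.05 * len(means))]   # 5 percent of sample means lie below here
hi = means[int(0.95 * len(means))]   # 5 percent lie above here
print("inner 90 percent of sample means runs from", lo, "to", hi)

The same mapping could be done by resampling from observed data rather than from a theoretical Normal universe, as in the sketches later in this chapter.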
To start from scratch again: The first - and seemingly straightforward - step is to estimate the mean of the population based on the sample data. The next and more complex step is to ask about the range of values (and their probabilities) that the estimate of the mean might take - that is, the construction of confidence intervals. It seems natural to assume that if our best guess about the population mean is the value of the sample mean, then our best guesses about the various values that the population mean might take - if unbiased sampling error causes discrepancies between population parameters and sample statistics - should be values clustering around the sample mean in a symmetrical fashion (assuming that asymmetry is not forced by the distribution - as, for example, the binomial is close to symmetric near its middle values). But how far away from the sample mean might the population mean be?

Let's walk slowly through the logic, going back to basics to enhance intuition. Let's start with the familiar saying, "The apple doesn't fall far from the tree." Imagine that you are in a very hypothetical place where an apple tree is above you, and you are not allowed to look up at the tree, whose trunk has an infinitely thin diameter. You see an apple on the ground. You must now guess where the trunk (center) of the tree is. The obvious guess for the location of the trunk is right above the apple. But the trunk is not likely to be exactly above the apple; because of sampling dispersion, there is only a small probability of the trunk being at any particular location.

Though you find it easy to make a best guess about where the mean is (the true trunk), with the given information alone you have no way of making an estimate of the probability that the mean is in one place or another, other than that the probability is the same that the tree is to the north or south, east or west, of you. You have no idea about how far the center of the tree is from you. You cannot even put a maximum on the distance it is from you, and without a maximum you could not even reasonably assume a rectangular distribution, or a Normal distribution, or any other.

Next you see two apples. What guesses do you make now? The midpoint between the two obviously is your best guess about the location of the center of the tree. But still there is no way to estimate the probability distribution of the location of the center of the tree.

Now assume you are given still another piece of information: The outermost spread of the tree's branches (the range) equals the distance between the two apples you see. With this information, you could immediately locate the boundaries of the location of the center of the tree. But this is only because the answer you sought was given to you in disguised form.

You could, however, come up with some statements of relative probabilities. In the absence of prior information on where the tree might be, you would offer higher odds that the center (the trunk) is in any unit of area close to the center of your two apples than in a unit of area far from the center. That is, if you are told that either one apple or two apples came from one of two specified trees whose locations are given, with no reason to believe it is one tree or the other (later, we can put other prior probabilities on the two trees), and you are also told the dispersions, you now can put relative probabilities on one tree or the other being the source.
(This is like the Neyman-Pearson procedure, and it is easily reconciled with the Bayesian point of view to be explored later. One can also connect this concept of relative probability to the Fisherian concept of maximum likelihood - which is a likelihood relative to all others.) And you could list from high to low the probabilities for each unit of area in the neighborhood of your apple sample. But this procedure is quite different from making any single absolute numerical probability estimate of the location of the mean.

Now let's say you see 10 apples on the ground. Of course your best estimate is that the trunk of the tree is at their arithmetic center. But how close to the actual tree trunk (the population mean) is your estimate likely to be? This is the question involved in confidence intervals. We want to estimate a range (around the center, which we estimate with the mean of the sample, as we said) within which we are pretty sure that the trunk lies. To simplify, we consider variation along only one dimension - that is, along (say) a north-south line rather than over two dimensions (the entire surface).

We first note that you have no reason to estimate the trunk's location to be outside the sample pattern, or at its edge, though it could be so in principle. If the pattern of the 10 apples is tight, you imagine the pattern of the likely locations of the population mean to be tight; if not, not. That is, it is intuitively clear that there is some connection between how spread out the sample observations are and your confidence about the location of the population mean. For example, consider two patterns of a thousand apples, one with twice the spread of the other, where we measure spread by (say) the diameter of the circle that holds the inner half of the apples for each tree, or by the standard deviation. It makes sense that if the two patterns have the same center point (mean), you would put higher odds on the trunk of the tree with the smaller spread being within some given distance - say, a foot - of the estimated mean. But what odds would you give on that bet?

THE TWO APPROACHES TO ESTIMATING CONFIDENCE INTERVALS

There are two broad conceptual approaches to the question at hand: 1) Study the probability of various distances between the sample mean and the likeliest population mean; and 2) study the behavior of particular border universes. Computationally, both approaches often yield the same result, but their interpretations differ. Approach 1 follows the conventional logic, although it carries out the calculations with resampling simulation.

Approach 1: The Conventional Logic for a Confidence Interval: The Distance Between Sample and Population Mean

If the study of probability can tell us the likelihood that a given population will produce a sample with a mean at a given distance x from the population mean, and if a sample is an unbiased estimator of the population, then it seems natural to turn the matter around and interpret the same sort of data as telling us the probability that the estimate of the population mean is that far from the "actual" population mean. A fly in the ointment is our lack of knowledge of the dispersion, but we can safely put that aside for now. (See below, however.)

This first approach begins by assuming that the universe that actually produced the sample has the same amount of dispersion (but not necessarily the same mean) that one would estimate from the sample.
One then produces (either with resampling or with Normal distribution theory) the distribution of sample means that would occur with repeated sampling from that designated universe, with samples the size of the observed sample. One can then compute the distance between the (assumed) population mean and (say) the inner 45 percent of sample means on each side of it.

The crucial step is to shift vantage points. We look from the sample to the universe, instead of from a hypothesized universe to simulated samples (as we have done so far). The same interval as computed above must be the relevant distance when one looks from the sample to the universe. Putting this algebraically, we can state (on the basis of either simulation or formal calculation) that for any given population S, and for any given distance d from its mean mu,

P[ |mu - xbar| < d ] = alpha,

where xbar is a randomly-generated sample mean and alpha is the probability resulting from the simulation or calculation.

The above equation focuses on the deviation of various sample means (xbar) from a stated population mean (mu). But we are logically entitled to read the algebra in another fashion, focusing on the deviation of mu from a randomly-generated sample mean. This implies that for any given randomly-generated sample mean we observe, the same probability (alpha) describes the probability that mu will be at a distance d or less from the observed xbar. (I believe that this is the logic underlying the conventional view of confidence intervals, but I have yet to find a clear-cut statement of it; in any case, it appears to be logically correct.)

To repeat this difficult idea in slightly different words: If one draws a sample (large enough that one need not worry about sample size and dispersion), one can say in advance that there is a probability p that the sample mean (xbar) will fall within z standard deviations of the population mean (mu). One estimates the population dispersion from the sample. If there is a probability p that xbar is within z standard deviations of mu, then with probability p, mu must be within that same z standard deviations of xbar. To repeat, this is, I believe, the heart of the standard concept of the confidence interval, to the extent that there is a thought-through consensus on the matter.

So we can state for such populations the probability that the distance between the population and sample means will be d or less. Or, with respect to a given distance, we can say that the probability that the population and sample means will be that close together is p. That is, we start by focusing on how much the sample mean diverges from the known population mean. But then - and to repeat once more this key conceptual step - we re-focus our attention to begin with the sample mean, and then discuss the probability that the population mean will be within a given distance. The resulting distance is what we call the "confidence interval."

Please notice that the distribution (universe) assumed at the beginning of this approach did not include the assumption that the distribution is centered on the sample mean or anywhere else. It is true that the sample mean is used for purposes of reporting the location of the estimated universe mean. But despite how the subject is treated in the conventional approach, the estimated population mean is not part of the work of constructing confidence intervals.
Rather, the calculations apply in the same way to all universes in the neighborhood of the sample (which are assumed, for the purpose of the work, to have the same dispersion). And indeed it must be so, because the probability that the universe from which the sample was drawn is centered exactly at the sample mean is very small. This independence of the confidence-interval construction from the mean of the sample (and the mean of the estimated universe) is surprising at first, but after a bit of thought it makes sense.

In this first approach, as noted more generally above, we do not make estimates of the confidence intervals on the basis of any logical inference from any one particular sample to any one particular universe, because this cannot be done in principle; it is the futile search for this connection that for decades roiled the brains of so many statisticians and now continues to trouble the minds of so many students. Instead, we investigate the behavior of (in this first approach) the universe that has a higher probability of producing the observed sample than does any other universe (in the absence of any additional evidence to the contrary), and whose characteristics are chosen on the basis of its resemblance to the sample. In this way the estimation of confidence intervals is like all other statistical inference: One investigates the probabilistic behavior of one or more hypothesized universes, the universe(s) being implicitly suggested by the sample evidence but not logically implied by that evidence. And there are no grounds for dispute about exactly what is being done - only about how to interpret the results.

One difficulty with the above approach is that the estimate of the population dispersion does not rest on sound foundations; this matter will be discussed later, but it is not likely to lead to a seriously misleading conclusion.

A second difficulty with this approach is in interpreting the result. What is the justification for focusing our attention on a universe centered on the sample mean? While this particular universe may be more likely than any other, it undoubtedly has a low probability. And indeed, the statement of the confidence intervals refers to the probabilities that the sample has come from universes other than the universe centered at the sample mean, and quite a distance from it.

My answer to this question does not rest on a set of meaningful mathematical axioms, and I assert that a meaningful axiomatic answer is impossible in principle. Rather, I reason that we should consider the behavior of this universe because other universes near it will produce much the same results, differing only in dispersion from this one, and this difference is not likely to be crucial; this last assumption is all-important, of course. True, we do not know what the dispersion might be for the "true" universe. But elsewhere (Chapter 00 in [Statphil]) I argue that the concept of the "true universe" is not helpful - or maybe even worse than nothing - and should be forsworn. And we can postulate a dispersion for any other universe we choose to investigate. That is, for this postulation we unabashedly bring in any other knowledge we may have. The defense for such an almost-arbitrary move would be that this is a second-order matter relative to the location of the estimated universe mean, and therefore it is not likely to lead to serious error.
(This sort of approximative guessing sticks in the throats of many trained mathematicians, of course, who want to feel an unbroken logic leading backwards into the mists of axiom formation. But the axioms themselves inevitably are chosen arbitrarily, just as there is arbitrariness in the practice at hand, though the choice process for axioms is less obvious and more hallowed by having been done by the masterminds of the past. (See Chapter 00 in [Statphil] on the necessity for judgment.) The absence of a sequence of equations leading from some first principles to the procedure described in the paragraph above is evidence of what is felt to be missing by those who crave logical justification. The key equation in this approach is formally unassailable, but it seems to come from nowhere.)

In the examples in the following chapter may be found computations, for two population distributions - one binomial and one quantitative - of the histograms of the sample means produced with this procedure.

Operationally, we use the observed sample mean, together with an estimate of the dispersion from the sample, to estimate a mean and dispersion for the population. Then, with reference to the sample mean, we state a combination of a distance (on each side) and a probability pertaining to the population mean. The computational examples will illustrate this procedure.

Once we have obtained a numerical answer, we must decide how to interpret it. There is a natural and almost irresistible tendency to talk about the probability that the mean of the universe lies within the interval, but this has proven confusing and controversial. Interpretation in terms of a repeated process is not very satisfying intuitively [1]. In my view, it is not worth arguing about any "true" interpretation of these computations. One could sensibly interpret the computations in terms of the odds a decision-maker, given the evidence, would reasonably offer about the relative likelihoods that the sample came from one of two specified universes (one of them probably being centered on the sample); this does provide some information on reliability, but this procedure departs from the concept of confidence intervals.

The reader may find it useful to read in the next chapter examples of the actual practice of computing confidence intervals in Approach 1, before proceeding to read about Approach 2.
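In the meantime, a minimal resampling sketch may help fix the procedure of Approach 1 in mind. It is written in Python, and the ten observations are invented purely for illustration: the sample itself serves as the estimate of the universe's dispersion, the deviations of resampled means are mapped, and then the vantage point is shifted to read an interval around the observed mean.

# A bootstrap rendering of Approach 1; the ten observations are invented.
import random

sample = [52, 49, 55, 60, 47, 51, 58, 50, 53, 56]
n = len(sample)
obs_mean = sum(sample) / n

# Use the sample itself as the estimate of the universe's dispersion,
# and map the deviations of resampled means from the observed mean.
deviations = []
for _ in range(10_000):
    resample = [random.choice(sample) for _ in range(n)]
    deviations.append(sum(resample) / n - obs_mean)
deviations.sort()

# The inner 90 percent of the deviations estimates the distance d for
# which P[|mu - xbar| < d] is about .90; shifting vantage points, we
# read the interval around the observed sample mean.
lo = obs_mean + deviations[int(0.05 * len(deviations))]
hi = obs_mean + deviations[int(0.95 * len(deviations))]
print("observed mean", obs_mean, "- interval", lo, "to", hi)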
Approach 2: A Relevant Method Though Not a Confidence Interval: Likelihood of Various Universes Producing This Sample

There is another simple method for getting an impression of the location of the sample with respect to the universe that generated it; it is not the same as a confidence interval[1], but it can be illuminating. We can simply pick any particular location and state the probability that a given universe located at that point would produce a sample with a mean as far away as, or farther away than, the observed sample's mean. This method does not require any assumptions about the locations of universes. But it clearly does not allow one to state a probability that the sample came from any particular universe, or set of universes, within any particular interval.

The second approach to the general question of estimate accuracy, then, is to analyze the behavior of a variety of universes centered at other points on the line, rather than the universe centered on the sample mean. One can ask the probability that a distribution centered away from the sample mean, with a given dispersion, would produce (say) a 10-apple scatter having a mean as far away from the given point as the observed sample mean is. If we assume the situation to be symmetric [2], we can find a point at which we can say that a distribution centered there would have only a (say) 5 percent chance of producing the observed sample. And we can also say that a distribution even farther away from the sample mean would have an even lower probability of producing the given sample. But we cannot turn the matter around and say that there is any particular chance that the distribution that actually produced the observed sample is between that point and the center of the sample.

Imagine a situation where you are standing on one side of a canyon, and you are hit by a baseball - the only ball in the vicinity that day. Based on experiments, you can estimate that a baseball thrower whom you see standing on the other side of the canyon has only a 5 percent chance of hitting you with a single throw [1]. But this does not imply that the source of the ball that hit you was someone else standing in the middle of the canyon, because that is patently impossible. That is, your knowledge about the behavior of the "boundary" universe does not logically imply anything about the existence and behavior of any other universes. But just as in the discussion of testing hypotheses, if you know that one possibility is unlikely, it is reasonable that as a result you will draw conclusions about other possibilities in the context of your general knowledge and judgment.

We can find the "boundary" distribution(s) we seek if we a) specify a measure of dispersion, and b) try every point along the line leading away from the sample mean, until we find the distribution that produces samples such as the one observed with a (say) 5 percent probability or less.

To estimate the dispersion, in many cases we can safely use an estimate based on the sample dispersion, using either resampling or Normal distribution theory. The hardest cases for resampling are a) a proportion near 0 or 1.0, and b) a very small sample of data. In such situations one should use additional outside information, or Normal distribution theory, or both.

We can also create a confidence interval in the following fashion: We can first estimate the dispersion for a universe in the general neighborhood of the sample mean, using various devices to be "conservative," if we like.[1] Given the estimated dispersion, we then estimate the probability distribution of various amounts of error between observed sample means and the population mean. We can do this with resampling simulation as follows: a) Create other universes at various distances from the sample mean, but with other characteristics similar to the universe that we postulate for the immediate neighborhood of the sample, and b) experiment with those universes. One can also apply the same logic with a more conventional parametric approach, using general knowledge of the sampling distribution of the mean, based on Normal distribution theory or previous experience with resampling. We shall not discuss the latter method here.

As with Approach 1, we do not make any probability statements about where the population mean may be found. Rather, we discuss only what various hypothetical universes might produce, and make inferences about the "actual" population's characteristics by comparison with those hypothesized universes.
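Here is one way the search for a boundary universe might be sketched (in Python; the data, the 5 percent criterion, and the step size of the search are all hypothetical choices). Each trial universe is simply the observed sample shifted to a candidate center, so that every candidate shares the sample's dispersion:

# A sketch of the search for a "boundary" universe; the data, the
# 5 percent criterion, and the step size are hypothetical choices.
import random

sample = [52, 49, 55, 60, 47, 51, 58, 50, 53, 56]
n = len(sample)
obs_mean = sum(sample) / n

def chance_of_observed(center, trials=5_000):
    # The chance that a universe centered at `center` - the sample
    # shifted there, so it keeps the sample's dispersion - produces a
    # sample mean as far back toward (and beyond) the observed mean.
    shifted = [x - obs_mean + center for x in sample]
    hits = 0
    for _ in range(trials):
        m = sum(random.choice(shifted) for _ in range(n)) / n
        if m <= obs_mean:   # for centers above the observed mean
            hits += 1
    return hits / trials

# Try points farther and farther above the sample mean until samples
# such as the one observed occur only 5 percent of the time.
center = obs_mean
while chance_of_observed(center) > 0.05:
    center += 0.1
print("upper boundary universe centered near", center)

The same walk in the other direction locates the lower boundary universe.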
If we are interested in (say) a 95 percent confidence interval, we want to find the distribution on each side of the sample mean that would produce a sample with a mean that far away only 2.5 percent of the time (2 * .025 = 1 - .95). A shortcut to find these "border distributions" is to plot the sampling distribution of the mean at the center of the sample, as in Approach 1, and then find the (say) 2.5 percent cut-offs at each end of that distribution. On the assumption of equal dispersion at the two points along the line, we now reproduce the previously-plotted distribution with its centroid (mean) at those 2.5 percent points on the line. The new distributions will have 2.5 percent of their areas on the other side of the mean of the sample. (A simulation check of this shortcut appears at the end of this section.)

So, from the standpoint of Approach 2, the conventional sample formula (e.g. Wonnacott and Wonnacott, 1990, p. 5), which is centered at the mean, can be considered a shortcut to estimating the boundary distributions. We say that the boundary is at the point that centers a distribution which has only a (say) 2.5 percent chance of producing the observed sample; it is that distribution which is the subject of the discussion - that is, one of the distributions at the endpoints of the vertical line in Figure II-3-1 - and not the distribution which is centered at mu = xbar. [1]

Figure II-3-1 (Wonnacott and Wonnacott, fig. 8-4, from Clopper and Pearson)

To restate, then: Moving progressively farther away from the sample mean, we can eventually find a universe that has only some (any) specified small probability of producing a sample like the one observed. One can then say that this point represents a "limit" or "boundary," so that the interval between it and the sample mean may be called a confidence interval.

Interpretation of Approach 2

Now to interpret the results of the second approach: Assuming that the sample is not drawn in a biased fashion (such as the wind blowing all the apples in the same direction), and assuming that the population has the same dispersion as the sample, we can say that distributions centered at the 95 percent confidence points (each of them including a tail with 2.5 percent of the area), or even farther away from the sample mean, will produce the observed sample only 5 percent of the time or less.

The result of the second approach is more in the spirit of a hypothesis test than of the usual interpretation of confidence intervals. Another statement of the result of the second approach is: We postulate a given universe - say, a universe at the (say) two-tailed 95 percent boundary line. We then say: The probability that the observed sample would be produced by a universe with a mean as far (or farther) from the observed sample's mean as the universe under investigation is only 2.5 percent. This is similar to the prob-value interpretation of a hypothesis-test framework. It is not a direct statement about the location of the mean of the universe from which the sample has been drawn.

But it is certainly reasonable to derive a betting-odds interpretation of the statement just above, to wit: The chances are 2 1/2 in 100 (or, the odds are 2 1/2 to 97 1/2) that a population located here would generate a sample with a mean as far away as the observed sample's. And it would seem legitimate to proceed to the further betting-odds statement that (assuming we have no additional information) the odds are 97 1/2 to 2 1/2 that the mean of the universe that generated this sample is no farther away from the sample mean than the mean of the boundary universe under discussion. About this statement there is nothing slippery, and its meaning should not be controversial.
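The geometric shortcut described above is easy to check by simulation. A minimal sketch (in Python, with the same invented data as in the earlier sketches): plot the sampling distribution of the mean centered at the sample, and read off its 2.5 percent cut-offs, which under the equal-dispersion assumption are the centers of the two boundary universes:

# Checking the shortcut: the 2.5 percent cut-offs of the sampling
# distribution plotted at the sample mean are, under the equal-
# dispersion assumption, the centers of the two boundary universes.
# Same invented data as in the earlier sketches.
import random

sample = [52, 49, 55, 60, 47, 51, 58, 50, 53, 56]
n = len(sample)
means = sorted(
    sum(random.choice(sample) for _ in range(n)) / n
    for _ in range(10_000)
)
print("boundary centers near", means[int(0.025 * len(means))],
      "and", means[int(0.975 * len(means))])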
Here again, the tactic for interpreting the statistical procedure is to restate the facts of the behavior of the universe that we are manipulating and examining at that moment. We use a heuristic device to find a particular distribution - the one that is at the (say) 97 1/2 - 2 1/2 percent boundary - and simply state explicitly what the distribution tells us implicitly: The probability of this distribution generating the observed sample (or a sample even further removed) is 2 1/2 percent. We could go on to say (if it were of interest to us at the moment) that because the probability of this universe generating the observed sample is as low as it is, we "reject" the "hypothesis" that the sample came from a universe this far away or farther. Or, in other words, we could say that because we would be very surprised if the sample were to have come from this universe, we instead believe that another hypothesis is true. The "other" hypothesis often is that the universe that generated the sample has a mean located at the sample mean or closer to it than the boundary universe.

The behavior of the universe at the 97 1/2 - 2 1/2 percent boundary line can also be interpreted in terms of our "confidence" about the location of the mean of the universe that generated the observed sample. We can say: At this boundary point lies the end of the region within which we would bet 97 1/2 to 2 1/2 that the mean of the universe that generated this sample lies to the (say) right of it.

As noted in the preview to this chapter, we do not learn about the reliability of sample estimates of the population mean (and other parameters) by logical inference from any one particular sample to any one particular universe, because in principle this cannot be done. Instead, in this second approach we investigate the behavior of various universes at the borderline of the neighborhood of the sample, the characteristics of those universes being chosen on the basis of their resemblances to the sample. We seek, for example, to find the universes that would produce samples with the mean of the observed sample less than (say) 5 percent of the time. In this way the estimation of confidence intervals is like all other statistical inference: One investigates the probabilistic behavior of hypothesized universes, the hypotheses being implicitly suggested by the sample evidence but not logically implied by that evidence.

Approaches 1 and 2 may (if one chooses) be seen as identical conceptually as well as (in many cases) computationally. But as I see it, the interpretation of them is rather different, and distinguishing them helps one's intuitive understanding.

Approach 3: A Simulation Method

Here is another new method: We can simulate the behavior of a variety of universes at different distances from us. As one thinks about the concept of a confidence interval, it turns out to be either very hard or impossible to get a clear idea of what others are talking about, or of the meaning of the mathematical operations they perform in connection with that concept - as shown in the quotes from various skeptical statisticians above. To clarify the matter, and also as a practical expedient, I propose a way of defining confidence intervals - or a concept that grasps some of that idea, which we might call an accuracy interval - by means of a device that has clarified many other difficult concepts (e.g. relativity) but that, so far as I can tell, has not been employed with confidence intervals: operational definition.
To use the physical example of estimating the accuracy of the estimate of the location of the trunk of an apple tree to illustrate the logic: We may base our estimate of the spread of the fall of apples from apple trees on the actual sample that we have, and then examine how often a sample of, say, ten apples from such a tree would have a mean as far to the right as we are standing. And this is indeed how we can proceed - trying out simulated trees at differing distances.

These are the operational steps I suggest one would perform to compute a confidence-like accuracy interval in a particular simple case:

1. Mark off the narrowest area of the observed sample distribution that contains 10 percent of the probability density; in the case of a symmetrical distribution, this would be on both sides of the mean. Let us call this area the "target zone."

2. Locate points of width similar to the target zone on the horizontal axis, extending to the left and right of the target zone without bound (assuming that the distribution is two-dimensional). At each of these points, including the middle point of the target zone, locate a bootstrap universe constructed on the model of the observed sample.

3. By simulation (or some analytic device), produce from the middle (target zone) universe (say) 100 means of samples of size n (n being the observed sample size).

4. Mark those means that fall within the target zone and those that fall outside it.

5. Repeat steps 3 and 4 for the first such universes to the left and right of the sample, and then for other universes to the left and right, until they are so far away that they put no noticeable number of means in the target zone.

6. Count the total means in the target zone; ignore all others.

7. Array all these means according to the universes from which they came.

8. Start at the middle point, and continue outward until the universes between the center and that point account for (say) 95 percent of the means within the target zone. Mark that point.

9. The marked point constitutes the boundary of the interval containing 95 percent of the universes that might have given rise to the observed sample.

One can then say that there is a 95 percent probability that the observed sample came from a universe with a mean within that interval. It is all-important for this procedure, of course, that the distribution of universes is assumed to be horizontal - that is, that the prior over the universes' locations is uniform. But we have not had to make any assumptions about the shapes of the universe(s), not even that they (it) be symmetrical.

This third approach has not yet been developed in practice. But the very exercise of thinking it through illuminates the issues involved in constructing conventional confidence intervals, or the boundary intervals described in Approach 2.

CONFIDENCE INTERVALS AND BAYESIAN ANALYSIS

Bayesian thinking can often be valuable in constructing confidence intervals. If one states one's prior beliefs about the distribution of the parameter in question, and then combines that distribution with the observed data, there is nothing mysterious or ambiguous about stating the posterior distribution of belief, which can then be considered as the stuff of a confidence interval. Therefore, Bayesian analysis can serve well to shine clear sunlight on this murky concept.

Indeed, Approach 2 is in the Bayesian spirit in that it asks about probabilities of the observed data conditional upon one or more particular universes. This has the virtue of being quite unambiguous in interpretation. But one can go even further in this direction, as follows: Mark off a set of universes on each side of the sample mean, with centroids at equal distances from each other. Then perform for each of them the same operations that are specified for the universes mentioned in Approaches 1 and 2, and normalize the results. If the prior distribution is assumed to be uniform, the results will be the same as the standard confidence interval. But the interpretation will be different. There will be no attempt to make any statement about the unconditional probability of the mean of the universe. Rather, the result will be a statement about the mean of the universe conditional upon a uniform prior distribution and the sample evidence, which is unchallengeable logically. If the prior distribution is not uniform, appropriate adjustment can be made when normalizing, and again the interpretation is not subject to question, though the assumption about the prior may be questioned.

The problem of computation has always been a barrier to this sort of Bayesian interpretation. But if one simulates Bayesian probabilities, the difficulty disappears; a confidence interval may immediately be read from the posterior distribution in standard Bayesian fashion. Here follows an example of such a simulation.

[INSERT FROM STATSWRK3 BAYESNRM]
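The program listing referred to just above is not reproduced here; in its place, here is a minimal sketch of such a simulation in Python. The ten observations, the grid of candidate universes, and the "hit" tolerance are all hypothetical choices, not taken from the original program:

# A Bayesian simulation with a uniform prior over universe locations.
# Data, grid, and tolerance are hypothetical choices.
import random

sample = [52, 49, 55, 60, 47, 51, 58, 50, 53, 56]
n = len(sample)
obs_mean = sum(sample) / n

# Universes at equally spaced centers: a uniform prior over location.
centers = [obs_mean + step / 10.0 for step in range(-80, 81)]
tolerance = 0.5   # a resampled mean this close to the observed mean "hits"
hits = []
for c in centers:
    shifted = [x - obs_mean + c for x in sample]   # bootstrap universe at c
    count = 0
    for _ in range(1_000):
        m = sum(random.choice(shifted) for _ in range(n)) / n
        if abs(m - obs_mean) <= tolerance:
            count += 1
    hits.append(count)

# Normalize the counts into a posterior over the centers.
total = sum(hits)
posterior = [h / total for h in hits]

# Read a central 95 percent interval straight from the posterior.
cum, lo, hi = 0.0, None, None
for c, p in zip(centers, posterior):
    cum += p
    if lo is None and cum >= 0.025:
        lo = c
    if hi is None and cum >= 0.975:
        hi = c
print("95 percent interval for the universe mean:", lo, "to", hi)

With a non-uniform prior, each universe's count would simply be weighted by its prior probability before normalizing, as described above.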
And even if one wishes to state an extremely "uninformative" prior distribution - that is, a state of affairs in which one asserts close to no knowledge at all - the Bayesian procedure is admirably clear and consistent, pulling no rabbits from a hat. An illustration (using data from Box and Tiao) may be found in Chapter 00. One need not even do anything differently from standard confidence-interval calculations to get the benefit of Bayesian analysis. One may simply interpret the results in the Bayesian fashion so as to obtain meaningful statements.

CONCLUSION

It is not possible in principle to derive a probability statement about the location of the mean or any other parameter of a distribution from a set of data alone, without additional assumptions. One can make unambiguous statements about the probability that any specified distribution, at any given distance from the mean of a sample, would produce a sample of the observed size with a mean located as far or farther from the hypothetical universe's mean as is the observed sample's mean. With various Bayesian-type assumptions, one can make probability statements about the location of the mean of the universe that produced the sample. One can make a simulation with a flat ("linear") Bayesian prior distribution (or some other prior) that will allow one to make probability statements about the location of the mean of the universe that produced the sample. Whether one wishes to refer to either of the above two procedures as a "confidence interval" is a matter of choice.

AFTERNOTE: ABOUT THE INFINITE REGRESS PROBLEM

This afternote expands on an earlier footnote about Savage's objection to confidence intervals on the grounds that they constitute an infinite regress. Even the next level of regression in the sequence that Savage mentions cannot be an important difficulty in practice. If one cares to do so, one may estimate the accuracy of the confidence limits by (in the resampling approach) repeating the overall simulation and observing the variation in the confidence bounds. If one does this and looks at the 95 percent bounds around the confidence bounds, they are huge - so large as to be without meaning in the cases of proportions that Peter Bruce and I have looked at. But this is only surprising until one thinks about it; such large variation is inevitable given that the result is something like a .052 probability.

The exploration in the paragraph above leads back to the question of why confidence limits tend to focus on the same 95 percent and 99.7 percent values as found in classical hypothesis testing. Those values were selected long ago for hypothesis testing because they seem to be intuitive measures of the relevant psychological surprise. And for purposes other than measures of surprise - that is, purposes more directly related to decision-making - hypothesis testing now more frequently (and more sensibly, in my view) looks at the prob-value result itself. But this more flexible prob-value concept does not fit comfortably with confidence intervals. When thought through from scratch, perhaps more sensible confidence values would be 50 percent, or 75 percent, rather than 95 percent - which would be closer to the concept traditionally used in physical experiments as a rough plus-or-minus index of reliability and error. The 50 percent bounds on 50 percent confidence limits might then be a meaningful second-order measure.
As to further regressions - any sensible person stops being concerned with a further order of smalls at some point; one could never live through a day without such approximations. To worry about it is to seek impossible perfection.

ENDNOTES

**FOOTNOTES**

[1]: Savage is troubled by the infinite regress in connection with the estimate of dispersion. "Taking the doctrine literally, it evidently leads to endless regression, for an estimate of the accuracy of an estimate should presumably be accompanied by an estimate of its own accuracy, and so on forever" (p. 257). But if we simply define "accuracy" operationally as the calculations in the approaches discussed below, this difficulty disappears. Savage might say that I have just defined away the difficulty. I'd answer: Yes indeed. It is the highest function of operational definitions such as this one to get us around logical traps and enable us to function with usable tools. This issue is discussed further in the Afternote to the chapter.

[1]: Though the logic of confidence intervals is not only subtle but also rests on shakier ground, in my judgment, than that of hypothesis testing, there are thoughtful and respected statisticians - for example, Thomas Wonnacott - who argue that the logic of confidence intervals is better grounded and leads less often to error.

[1]: An example of this sort of interpretation is as follows:

...Although on average X-bar is on target, the specific sample mean X-bar that we happen to observe is almost certain to be a bit high or a bit low. Accordingly, if we want to be reasonably confident that our inference is correct, we cannot claim that mu is precisely equal to the observed X-bar. Instead, we must construct an interval estimate or confidence interval of the form:

mu = X-bar + sampling error

The crucial question is: How wide must this allowance for sampling error be? The answer, of course, will depend on how much X-bar fluctuates...

Constructing 95% confidence intervals is like pitching horseshoes. In each case there is a fixed target, either the population mu or the stake. We are trying to bracket it with some chancy device, either the random interval or the horseshoe...

There are two important ways, however, that confidence intervals differ from pitching horseshoes. First, only one confidence interval is customarily constructed. Second, the target mu is not visible like a horseshoe stake. Thus, whereas the horseshoe player always knows the score (and specifically, whether or not the last toss bracketed the stake), the statistician does not. He continues to "throw in the dark," without knowing whether or not a specific interval estimate has bracketed mu. All he has to go on is the statistical theory that assures him that, in the long run, he will succeed 95% of the time. (Wonnacott and Wonnacott, 1990, p. 258)

Savage refers to this type of interpretation as follows:

...is a sort of fiction; for it will be found that whenever its advocates talk of making assertions that have high probability, whether in connection with testing or estimation, they do not actually make such assertions themselves, but endlessly pass the buck, saying in effect, "This assertion has arisen according to a system that will seldom lead you to make false assertions, if you adopt it. As for myself, I assert nothing but the properties of the system." (1972, pp. 260-261)

Lee writes at greater length: [where else is quote below?]
[T]he statement that a 95% confidence interval for an unknown parameter ran from -2 to +2 sounded as if the parameter lay in that interval with 95% probability, and yet I was warned that all I could say was that if I carried out similar procedures time after time then the unknown parameters would lie in the confidence intervals I constructed 95% of the time. Subsequently, I discovered that the whole theory had been worked out in very considerable detail in such books as Lehmann (1959, 1986). But attempts such as those that Lehmann describes to put everything on a firm foundation raised even more questions. (Lee, 1989, p. vii)

[1]: Efron and Tibshirani (1993, p. 157) suggest an approach that is computationally like Approach 2, but they interpret the computation differently and refer to it as a confidence interval. They also say that the approach applies only to a Normal distribution, whereas I see no reason for such a restriction.

[2]: Peter Bruce has convinced me that a goodly number of distributions would result in asymmetric confidence intervals. This can cause considerable complications for the conventional formulaic calculations, though resampling handles them nicely. The interpretation requires a longer statement than otherwise, however.

[1]: You can consider this one throw as a sample of one, with that throw as the mean observation, if the prior discussion of sample means would otherwise lead you to question this example.

[1]: More about this later; it is, as I said earlier, not of primary importance in estimating the accuracy of the confidence intervals; note, please, that as we talk about the accuracy of statements about accuracy, we are moving down the ladder of sizes of causes of error.

[1]: When working with proportions, the conventional method must obtain these points from prepared ellipses and binomial tables, not from the sort of geometric trick used in the previous paragraphs, and hence showing the distribution centered at xbar = mu is quite misleading.

**ENDNOTES**

<1>: Peter Bruce's help in clarifying the ideas in this chapter by discussing them with me, along with teaching them jointly with me, has been especially great.