CHAPTER II-4 CONFIDENCE INTERVALS II: PROCEDURE WITH EXAMPLES

Here is a checklist for the canonical procedure for confidence intervals. It follows much the same logic as presented for testing hypotheses in an earlier chapter. We shall begin with the binomial example of a political poll, and then present the "continuous" multi-valued example of tree diameters.

The Accuracy of Political Polls

Consider the reliability of a randomly-selected 1988 presidential election poll, showing 840 intended votes for Bush and 660 intended votes for Dukakis out of 1500 (Wonnacott and Wonnacott, 1990, p. 5). Let us work through the logic of this example.

What is the question? Stated technically, what are the 95% confidence limits for the proportion of Bush supporters in the population? (The proportion is the mean of a binomial population or sample, of course.) More broadly, within which bounds could one confidently believe that the population proportion is likely to lie? At this stage of the work, we must already have translated the conceptual question (in this case, a decision-making question from the point of view of the candidates) into a statistical question. (See Chapter II-1 on translating questions into statistical form.)

What is the purpose to be served by answering this question? There is no sharp and clear answer in this case. The goal could be to satisfy public curiosity, or strategy planning for a candidate (though a national proportion is not as helpful for planning strategy as state data would be).

Is this a "probability" or a "probability-statistics" question? The latter; we wish to infer from sample to population rather than the converse.

Given that this is a statistics question: What is the form of the statistics question - confidence limits or hypothesis testing? Confidence limits.

Given that the question is about confidence limits: What is the description of the sample that has been observed?
a) The raw sample data - the observed numbers of interviewees, 840 for Bush and 660 for Dukakis - constitute the best description of the universe. The statistics of the sample are the given proportions - 56 percent for Bush, 44 percent for Dukakis.

Which universe? (Assuming that the observed sample is representative of the universe from which it is drawn, what is your best guess about the properties of the universe about whose parameter you wish to make statements?) The best guess is that the population proportion is the sample proportion - that is, the population contains 56 percent Bush votes, 44 percent Dukakis votes.

Possibilities for Bayesian analysis? Not in this case, unless you believe that the sample was biased somehow.

Which parameter(s) do you wish to make statements about? Mean, median, standard deviation, range, interquartile range, other? We wish to estimate the proportion in favor of Bush (or Dukakis).

Which symbols for the observed entities? Perhaps 56 green and 44 yellow balls, if an urn is used, or "0" and "1" if the computer is used.

Discrete or continuous distribution? In principle, discrete. (All distributions must be discrete in practice.)

What values or ranges of values? 0-1.

Finite or infinite? Infinite - the sample is small relative to the population.

If the universe is what you guess it to be, what variation among which samples do you wish to estimate? A sample the same size as the observed poll.

Here one may continue either with resampling or with the conventional method. Everything done up to now would be the same whether continuing with resampling or with a standard parametric test.

Conventional Calculational Methods

Estimating the Distribution of Differences Between Sample and Population Means With the Normal Distribution

In the conventional approach, one could in principle work from first principles with lists and sample space, but that would surely be too cumbersome.
One could work with binomial proportions, but this problem has too big a sample for tree-drawing and quincunx techniques; even the ordinary textbook table of binomial coefficients is too small for this job, and calculating binomial coefficients is itself a big job. So instead one would use the Normal approximation to the binomial formula.

(Note to the non-statistician: The distribution of means that we manipulate has the Normal shape because of the operation of the Central Limit Theorem. Sums and averages, when the sample is reasonably large, take on this shape even if the underlying distribution is not Normal. This is a truly astonishing property of randomly-drawn samples - the distribution of their means quickly comes to resemble a "Normal" distribution, no matter the shape of the underlying distribution. We then standardize it with the standard deviation or other device so that we can state the probability distribution of the sampling error of the mean for any sample of reasonable size.)

(The exercise of creating the Normal shape empirically is simply a generalization of particular cases such as we will later create here for the poll by resampling simulation. One can also go one step further and use the formula of de Moivre-Laplace-Gauss to describe the empirical distributions, and use it instead of them. Looking ahead now, the difference between resampling and the conventional approach can be said to be that in the conventional approach we simply plot the Gaussian distribution very carefully, and use a formula instead of the empirical histograms, afterwards putting the results in a standardized table so that we can read them quickly without having to re-create the curve each time we use it. More about the nature of the Normal distribution may be found in Chapter 00 [Statphil].)

All the work done above uses the information specified previously - the sample size of 1500, the drawing with replacement, the observed proportion as the criterion.
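The conventional Normal-approximation computation just described can be sketched as follows. (This is a minimal illustrative sketch, not part of the original text; the 1.96 multiplier is the standard two-tailed 95 percent z value.)

```python
import math

# Observed poll: 840 of 1500 respondents intend to vote for Bush.
n = 1500
p_hat = 840 / n                      # sample proportion, 0.56

# Standard error of a sample proportion under the Normal approximation.
se = math.sqrt(p_hat * (1 - p_hat) / n)

# 95% limits: roughly 1.96 standard errors on each side of the mean.
z = 1.96
lo, hi = p_hat - z * se, p_hat + z * se
print(f"95% confidence limits for the Bush proportion: ({lo:.3f}, {hi:.3f})")
```

The resulting limits, roughly .535 and .585, match the boundary universes discussed later in this chapter.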
Confidence Intervals Empirically - With Resampling

Estimating the Distribution of Differences Between Sample and Population Means By Resampling

What procedure to produce entities? Random selection from urn or computer.

Simple (single step) or complex (multiple "if" drawings)? Simple.

What procedure to produce re-samples? That is, with or without replacement? With replacement.

Number of drawings? The number of observations in the actual sample, and hence the number of drawings in each re-sample: 1500.

What to record as result of each re-sample drawing? Mean, median, or whatever of the re-sample? The proportion is what we seek.

Stating the distribution of results: The distribution of proportions for the trial samples.

Choice of confidence bounds? 95%, two tails (choice made by the textbook that posed the problem).

Computation of probabilities within chosen bounds: Read the probabilistic result from the histogram of results.

Because the theory of confidence intervals is so abstract (even with the resampling method of computation), let us now walk through this resampling demonstration slowly, using the conventional Approach 1 described previously. We first produce a sample, and then see how the process works in reverse to estimate the reliability of the sample, using the Bush-Dukakis poll as an example. The computer program and output may be found in Chapter 00 [Howteach].

Step 1: Draw a sample of 1500 voters from a universe that, based on the observed sample, is 56 percent for Bush, 44 percent for Dukakis. The first such sample produced by the computer happens to be 53 percent for Bush; it might have been 58 percent, or 55 percent, or, very rarely, 49 percent for Bush.

Step 2: Repeat step 1 perhaps 400 or 1000 times.
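Steps 1 and 2 can be sketched by simulation as follows. (A hypothetical sketch using Python with numpy, not the program referred to in the text; the seed and the 1000 trials are arbitrary choices.)

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for a repeatable run

n = 1500       # poll size
p = 0.56       # benchmark universe: the observed Bush proportion
trials = 1000

# Steps 1-2: draw many samples of 1500 from the 56-44 universe,
# recording the Bush proportion of each.
proportions = rng.binomial(n, p, size=trials) / n

# The middle 95 percent of these proportions shows how far such
# samples stray from the universe proportion.
lo, hi = np.percentile(proportions, [2.5, 97.5])
print(f"95% of sample proportions fall in ({lo:.3f}, {hi:.3f})")
```

With the seed shown, the middle 95 percent of the resampled proportions comes out near .535-.585, in line with the conventional calculation.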
Step 3: Estimate the distribution of means (proportions) of samples of size 1500 drawn from this 56-44 percent Bush-Dukakis universe; the resampling result is shown in Figure II-4-1.

Figure II-4-1

Step 4: In a fashion similar to what was done in steps 1-3, now compute the 95 percent confidence intervals for some other postulated universe mean - say 53% for Bush, 47% for Dukakis. This step produces a confidence interval that is not centered on the sample mean and the estimated universe mean, and hence it shows the independence of our procedure from that magnitude. And we now compare the estimated confidence bounds - the 5th and 95th percentiles - generated with the 53-47 percent universe against the corresponding distribution of sample means generated by the "true" Bush-Dukakis population of 56 percent - 44 percent. If the procedure works well, the results of the two procedures should be similar.

Now we interpret the results using this first approach. The histogram shows the probability that the difference between the sample mean and the population mean - the error in the sample result - will be (say) 4 percentage points too low. It follows that about 47.5 percent (half of 95 percent) of the time, a sample like this one will be between the population mean and 4 percent too low. We do not know the actual population mean. But for any observed sample like this one, we can say that there is a 47.5 percent chance that the distance between it and the mean of the population that generated it is minus four percent or less.

Now a crucial step: We turn around the statement just above, and say that there is a 47.5 percent chance that the population mean is less than four percentage points higher than the mean of a sample drawn like this one, but at or above the sample mean. (And we do the same for the other side of the sample mean.)

So to recapitulate: We observe a sample and its mean.
We estimate the error by experimenting with one or more universes in that neighborhood, and we then give the probability that the population mean is within that margin of error from the sample mean.

We can also use Approach 2, which computationally is simply a short-circuiting of Approach 1 (though the interpretations differ), as follows:

Step 1: As above.

Step 2: With a hypothetical distribution that is 56 percent for Bush (the sample estimate) - and, in a non-binomial case, with the dispersion estimated from the sample - generate perhaps 400 samples of size 1500.

Step 3: Find the 95th percentile of the samples in Step 2.

Step 4: Centered at that 95th percentile, generate a distribution of samples of size 1500 with the population dispersion assumed the same as in Step 2.

Step 5: Find the boundary which includes 95 percent of the samples. If this boundary is indeed the sample mean, then the point at which this distribution is centered is indeed the 95 percent confidence bound (as it must be as long as the dispersion used in all of the universes is the same; they are just set off from each other algebraically).

Approach 2 for Counted Data: The Bush-Dukakis Poll

Let's implement Approach 2 for counted data, using for comparison the Bush-Dukakis poll data discussed earlier in the context of Approach 1. We seek to state, for universes that we select on the basis that their results will interest us, the probability that they (or it, for a particular universe) would produce a sample as far or farther away from the mean of the universe in question as the mean of the observed sample - 56 percent for Bush. The most interesting universe is that which produces such a sample only about 5 percent of the time, simply because of the correspondence of this value to a conventional break-point in statistical inference. So we could experiment with various universes by trial and error to find this universe.
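The trial-and-error experiment just described might be sketched as follows. (A hypothetical sketch; the list of candidate universes is an illustrative choice, and the seed is arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, observed, trials = 1500, 0.56, 10_000

def tail_share(p):
    # Share of samples from a universe at proportion p whose Bush
    # proportion comes out at least as high as the observed 0.56.
    samples = rng.binomial(n, p, size=trials) / n
    return (samples >= observed).mean()

# Try universes below the sample mean until we find the one that
# produces so pro-Bush a sample only about 2.5 percent of the time.
candidates = [0.525, 0.530, 0.535, 0.540, 0.545]
shares = {p: tail_share(p) for p in candidates}
boundary = min(candidates, key=lambda p: abs(shares[p] - 0.025))
print("boundary universe:", boundary, "tail share:", shares[boundary])
```

With these settings the search settles on the universe at .535, whose upper tail beyond .56 holds roughly 2.5 percent of its samples.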
We can learn from our previous simulations of the Bush-Dukakis poll in Approach 1 that about 95 percent of the samples fall within .025 on either side of the sample mean (which we had been implicitly assuming is the location of the population mean). If we assume (and there seems no reason not to) that the dispersions of the universes we experiment with are the same, we will find (by symmetry) that the universes we seek are centered on those points .025 away from .56, that is, at .535 and .585.

From the standpoint of Approach 2, then, the conventional sample formula that is centered at the mean can be considered a shortcut to estimating the boundary distributions. We say that the boundary is at the point that centers a distribution which has only a (say) 2.5 percent chance of producing the observed sample; it is that distribution which is the subject of the discussion - that is, one of the distributions at the endpoints of the vertical line in Figure II-3-1 - and not the distribution which is centered at mu = xbar. [1] The results of these simulations are shown in Figure II-4-2.

Figure II-4-2

About these distributions centered at .535 and .585 - or more importantly for understanding an election situation, the universe centered at .535 - one can say: Even if the "true" value is as low as 53.5 percent for Bush, there is only a 2 1/2 percent chance that a sample as high as 56 percent pro-Bush would be observed. (That the 2 1/2 percent probability and the 2 1/2 percentage-point difference between 56 percent and 53.5 percent coincide is arithmetically a matter of chance in this case.) It would be even more revealing in an election situation to make a similar statement about the universe located at 50-50, but this would bring us almost entirely within the intellectual ambit of hypothesis testing.

The demonstrations above using both Approaches 1 and 2 shed light on the logic of interpretation of confidence intervals.
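Before turning to interpretation, the two endpoint universes can be checked numerically. (A hypothetical sketch; the .535 and .585 endpoints are taken from the discussion above, and the seed is arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, observed, trials = 1500, 0.56, 10_000

# Lower boundary universe: how often does it yield a sample at least
# as pro-Bush as the one observed?
low = rng.binomial(n, 0.535, size=trials) / n
share_low = (low >= observed).mean()

# Upper boundary universe: how often does it yield a sample at least
# as low (anti-Bush) as the one observed?
high = rng.binomial(n, 0.585, size=trials) / n
share_high = (high <= observed).mean()

print("lower-tail share:", share_low, "upper-tail share:", share_high)
```

Each endpoint universe produces a sample as extreme as the observed .56 only about 2.5 percent of the time, as the symmetry argument requires.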
We have no basis in the work so far to say that there is a 95 percent chance that the confidence interval computed from a particular sample captures the universe mean, or to make any other such statement about the universe mean. Even so, unless you have reason to believe that the probabilities of some universe means in the neighborhood of the sample mean are very different from others - and assuming they are not would seem safe in the case of the presidential poll - then it would seem reasonable to offer betting odds of 95 percent that the confidence interval computed from a particular sample captures the universe mean. If so, there would seem nothing objectionable in this "naive" interpretation for a particular sample.

Samples Whose Observations May Have More Than Two Values

So far we have discussed samples and universes that we can characterize as proportions of elements which can have only one of two characteristics - green or red, 1 or 0. Now let us consider observations that can be characterized by a wider variety of numbers; these cases are both simpler and more complex than proportional universes. These are problems with "continuous" (really multi-valued) data instead of the two-value election-poll problem above. The binomial case has a deceptively easy appearance; in many ways the present problem is easier to handle than most. (Incidentally, in contrast to the Bush-Dukakis poll example above, the 1992 U.S. presidential election was not binomial but trinomial, and therefore a much more difficult problem to deal with.)

A collection that contains only two sorts of elements (say, green and red chips) can be characterized by just the proportion (and the total number of elements). But a collection of (say) prices of farms sold in province Z in year t would be characterized by the numbers sold at each of many prices (and the total number of sales).
In the latter case, we notice at least two characteristics: a) some sort of average, and b) the extent to which the elements are spread out (and there may be yet other characteristics that interest us). The inferences that we make about the dispersion of such a collection are another important part of statistical inference, interesting both for the information in itself and for the light it throws on the certainty of our other inferences.

Consider, for instance, that we have just the one sale price of 13Q. We could estimate that the distribution is centered around 13Q, but we have no idea whether all the prices are 13Q, or whether the other prices tend to be far from 13Q. What if there are only two sale prices - 13Q and 15Q - but we have no other information, not even the meaning of a Q unit? What can we reasonably say about the distribution, given that we have been assured that the two observations are a representative sample of prices? We might immediately guess that half of the population is within, and half outside of, 13Q and 15Q. But what shape should we guess for the distribution? Should it be horizontal? Shaped like a Normal curve? Skewed to the right? Here we have no recourse but to use some additional experience and perhaps theory.

If we have some additional observations - say 10 more - we could estimate the dispersion of the population, perhaps calculating a standard deviation. That would give some guidance even without assuming a shape for the distribution. If we had some reason to assume that the distribution is shaped Normally - say, if it arose from observations of a planet, and the scatter could be assumed to be due to "error" - we could immediately do the sort of inference that led to the Normal distribution two centuries ago.
If one of the observations is quite far from the others - an apparent "outlier" - we could calculate the probability of its occurrence if it were part of the same distribution, using the standard deviation or other measure of the distribution's dispersion. This would throw some light on whether it probably was generated by the same universe as were the other observations.

Approach 1 for Measured Data Example: Estimating Tree Diameters

What is the question? A horticulturist is experimenting with a new type of tree. She plants 20 of them on a plot of land, and measures their trunk diameter after two years. She wants to establish a 90% confidence interval for the population average trunk diameter. For the data given below, calculate the mean of the sample and calculate (or describe a simulation procedure for calculating) a 90% confidence interval around the mean. Here are the 20 diameters (in no particular order):

8.5 7.6 9.3 5.5 11.4 6.9 6.5 12.9 8.7 4.8 4.2 8.1 6.5 5.8 6.7 2.4 11.1 7.1 8.8 7.2

What is the purpose to be served by answering the question? Either research and development, or pure science.

Is this a "probability" or a "statistics" question? Statistics.

What is the form of the statistics question? Confidence limits.

What is the description of the sample that has been observed? The raw data as shown above.

Statistics of the sample? Mean of the tree data.

Which universe? Assuming that the observed sample is representative of the universe from which it is drawn, what is your best guess about the properties of the universe whose parameter you wish to make statements about? Answer: That the universe is like the sample above, containing the numbers 8.5...7.2 - that is, the population of trees of this new type, as best estimated by the observations in the sample.

(Are there possibilities for Bayesian analysis?) No Bayesian prior information will be included.

Which parameter do you wish to make statements about? The mean.

Which symbols for the observed entities?
Cards or computer entries with the numbers 8.5...7.2, a sample from a universe of infinite size.

If the universe is as guessed at, the variation among which samples do you wish to estimate? Samples of size 20.

Here one may continue with the conventional method. Everything up to now is the same whether continuing with resampling or with a standard parametric test. The information listed above is the basis for a conventional test. Use perhaps a t test: calculate the standard deviation, and apply the t distribution (show the Normal first). Read the number of degrees of freedom from the sample size above. Show the formula for mu +- 2 s.d.

Continuing with resampling:

What procedure will be used to produce the trial entities? Random selection; simple (single step), not complex (multiple "if") sample drawings.

What procedure to produce re-samples? With replacement.

Number of drawings? 20 trees.

What to record as result of each re-sample drawing? The mean.

How to state the distribution of results? See histogram.

Choice of confidence bounds: 90%, two-tailed.

Computation of probabilities within chosen bounds: Read from histogram.

Approach 2 for Measured Data: The Diameters of Trees

To implement Approach 2 for measured data, one may proceed exactly as with Approach 1 above, except that the output of the simulation with the sample mean as midpoint will be used for guidance about where to locate trial universes for Approach 2. Working from the histogram in Figure II-3-?, we try universes located at 53.8 and 58.2. The results are shown in Figure II-4-3.
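Before looking at those results, the Approach 1 resampling checklist above can be sketched as follows. (A hypothetical sketch using the 20 diameters given earlier, not the program behind the figures; the seed and the 1000 trials are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(3)

diameters = np.array([8.5, 7.6, 9.3, 5.5, 11.4, 6.9, 6.5, 12.9, 8.7, 4.8,
                      4.2, 8.1, 6.5, 5.8, 6.7, 2.4, 11.1, 7.1, 8.8, 7.2])
sample_mean = diameters.mean()

# Draw 1000 re-samples of 20 trees each, with replacement,
# recording the mean of each re-sample.
means = rng.choice(diameters, size=(1000, 20), replace=True).mean(axis=1)

# The 5th and 95th percentiles of those means bound a 90% interval.
lo, hi = np.percentile(means, [5, 95])
print(f"sample mean {sample_mean:.2f}; 90% interval ({lo:.2f}, {hi:.2f})")
```

The sample mean of these 20 diameters is 7.5, and with the seed shown the 90 percent interval comes out at roughly 7.5 plus or minus 0.9.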
Figure II-4-3

Interpretation of Approach 2

Now to interpret the results of the second approach: Assuming that the sample is not drawn in a biased fashion (such as the wind blowing all the apples in the same direction), and assuming that the population has the same dispersion as the sample, we can say that distributions centered at the 95 percent confidence points (each of them including a tail with 2.5 percent of the area), or even further away from the sample mean, will produce the observed sample only 5 percent of the time or less.

The result of the second approach is more in the spirit of a hypothesis test than of the usual interpretation of confidence intervals. Another statement of the result of the second approach is: We postulate a given universe - say, a universe at the (say) two-tailed 95 percent boundary line. We then say: The probability that the observed sample would be produced by a universe with a mean as far (or farther) from the observed sample's mean as the universe under investigation is only 2.5 percent. This is similar to the prob-value interpretation of a hypothesis-test framework. It is not a direct statement about the location of the mean of the universe from which the sample has been drawn. But it is certainly reasonable to derive a betting-odds interpretation of the statement just above, to wit: The chances are 2 1/2 in 100 (or, the odds are 2 1/2 to 97 1/2) that a population located here would generate a sample with a mean as far away as the observed sample. And it would seem legitimate to proceed to the further betting-odds statement that (assuming we have no additional information) the odds are 97 1/2 to 2 1/2 that the mean of the universe that generated this sample is no farther away from the sample mean than the mean of the boundary universe under discussion. About this statement there is nothing slippery, and its meaning should not be controversial.
Here again the tactic for interpreting the statistical procedure is to restate the facts of the behavior of the universe that we are manipulating and examining at that moment. We use a heuristic device to find a particular distribution - the one that is at (say) the 97 1/2 - 2 1/2 percent boundary - and simply state explicitly what the distribution tells us implicitly: The probability of this distribution generating the observed sample (or a sample even further removed) is 2 1/2 percent. We could go on to say (if it were of interest to us at the moment) that because the probability of this universe generating the observed sample is as low as it is, we "reject" the "hypothesis" that the sample came from a universe this far away or further. Or in other words, we could say that because we would be very surprised if the sample were to have come from this universe, we instead believe that another hypothesis is true. The "other" hypothesis often is that the universe that generated the sample has a mean located at the sample mean or closer to it than the boundary universe. The behavior of the universe at the 97 1/2 - 2 1/2 percent boundary line can also be interpreted in terms of our "confidence" about the location of the mean of the universe that generated the observed sample. We can say: At this boundary point lies the end of the region within which we would bet 97 1/2 to 2 1/2 that the mean of the universe that generated this sample lies to the (say) right of it. As noted in the preview to this chapter, we do not learn about the reliability of sample estimates of the population mean (and other parameters) by logical inference from any one particular sample to any one particular universe, because in principle this cannot be done. Instead, in this second approach we investigate the behavior of various universes at the borderline of the neighborhood of the sample, the characteristics of those universes being chosen on the basis of their resemblances to the sample. 
We seek, for example, to find the universes that would produce samples with the mean of the observed sample less than (say) 5 percent of the time. In this way the estimation of confidence intervals is like all other statistical inference: One investigates the probabilistic behavior of hypothesized universes, the hypotheses being implicitly suggested by the sample evidence but not logically implied by that evidence.

Approaches 1 and 2 may (if one chooses) be seen as identical conceptually as well as (in many cases) computationally. But as I see it, the interpretation of them is rather different, and distinguishing them helps one's intuitive understanding.

THE PROBLEM OF UNCERTAINTY ABOUT THE DISPERSION

The inescapable difficulty of estimating the amount of dispersion in the population has greatly exercised statisticians over the years. Hence I must try to clarify the matter. Yet in practice this issue turns out not to be a likely source of much error even if one is somewhat wrong about the extent of dispersion, and therefore we should not let it be a stumbling block in the way of our producing estimates of the accuracy of samples in estimating population parameters.

Student's t test was designed to get around the problem of the lack of knowledge of the population dispersion. But Wallis and Roberts wrote about the t test: "[F]ar-reaching as have been the consequences of the t distribution for technical statistics, in elementary applications it does not differ enough from the normal distribution... to justify giving beginners this added complexity" (1956, p. x). "Although Student's t and the F ratio are explained... the student ... is advised not ordinarily to use them himself but to use the shortcut methods... These, being non-parametric and involving simpler computations, are more nearly foolproof in the hands of the beginner - and, ordinarily, only a little less powerful" (p.
xi).<1> If we knew the population parameter - the proportion, in the case we will discuss - we could easily determine how inaccurate the sample proportion is likely to be. If, for example, we wanted to know about the likely inaccuracy of the proportion of a sample of 100 voters drawn from a population of a million that is 60% Democratic, we could simply simulate drawing (say) 200 samples of 100 voters from such a universe, and examine the average inaccuracy of the 200 sample proportions. But in fact we do not know the characteristics of the actual universe. Rather, the nature of the actual universe is what we seek to learn about. Of course, if the amount of variation among samples were the same no matter what the Republican-Democrat proportions in the universe, the issue would still be simple, because we could then estimate the average inaccuracy of the sample proportion for any universe and then assume that it would hold for our universe. But it is reasonable to suppose that the amount of variation among samples will be different for different Democrat-Republican proportions in the universe. Let us first see why the amount of variation among samples drawn from a given universe is different with different relative proportions of the events in the universe. Consider a universe of 999,999 Democrats and one Republican. Most samples of 100 taken from this universe will contain 100 Democrats. A few (and only a very very few) samples will contain 99 Democrats and one Republican. So the biggest possible difference between the sample proportion and the population proportion (99.9999%) is less than one percent (for the very few samples of 99% Democrats). And most of the time the difference will only be the tiny difference between a sample of 100 Democrats (sample proportion = 100%), and the population proportion of 99.9999%. Compare the above to the possible difference between a sample of 100 from a universe of half a million Republicans and half a million Democrats. 
At worst a sample could be off by as much as 50% (if it got zero Republicans or zero Democrats), and at best it is unlikely to get exactly 50 of each. So it will almost always be off by 1% or more.

It seems, therefore, intuitively reasonable (and in fact it is true) that the likely difference between a sample proportion and the population proportion is greatest with a 50%-50% universe, least with a 0%-100% universe, and somewhere in between for probabilities between 50% and the endpoints, in the fashion of Figure II-4-4.

Figure II-4-4

Though one commonly estimates the variation of sample means (sample sizes the same as the observed sample) for proportions in the neighborhood of the estimated population mean - which implies a population dispersion (s.d.) appropriate for that neighborhood - one could also use a more "conservative" estimate of dispersion; Mosteller et al. (1970) suggest that if you work with the largest possible amount of variation (for example, the value at .5 in the case of a problem involving a proportion), you ensure that you cannot obtain too small a confidence interval by underestimating the variation. (Here again we see the role of judgment, as discussed in Chapter 00.)

Perhaps it will help to clarify the issue of estimating dispersion if we consider this: between an estimate of the dispersion for a second sample based on a) the population, or on b) the first sample, the former will be more accurate than the latter, because of the sampling variation in the first sample that affects the latter estimate. But we cannot estimate that sampling variation without knowing more about the population.

ARGUMENTS ABOUT INTERPRETATION OF CONFIDENCE INTERVALS

Discussions of confidence intervals often assert that one cannot make a probability statement about where the population mean may be, but one can make statements about the probability that a set of samples may bound it. For example: ...
Although on average X-bar is on target, the specific sample mean X-bar that we happen to observe is almost certain to be a bit high or a bit low. Accordingly, if we want to be reasonably confident that our inference is correct, we cannot claim that mu is precisely equal to the observed X-bar. Instead, we must construct an interval estimate or confidence interval of the form:

mu = X-bar + sampling error

The crucial question is: How wide must this allowance for sampling error be? The answer, of course, will depend on how much X-bar fluctuates...

Constructing 95% confidence intervals is like pitching horseshoes. In each case there is a fixed target, either the population mu or the stake. We are trying to bracket it with some chancy device, either the random interval or the horseshoe. This analogy is illustrated in Figure 8-3. There are two important ways, however, that confidence intervals differ from pitching horseshoes. First, only one confidence interval is customarily constructed. Second, the target mu is not visible like a horseshoe stake. Thus, whereas the horseshoe player always knows the score (and specifically, whether or not the last toss bracketed the stake), the statistician does not. He continues to "throw in the dark," without knowing whether or not a specific interval estimate has bracketed mu. All he has to go on is the statistical theory that assures him that, in the long run, he will succeed 95% of the time. (Wonnacott and Wonnacott, 1990, p. 258)

This criticism does not seem to me to fit Approach 1 above. The criticism apparently stems from objections by the frequentists. But if one takes the operational-definition point of view (see Chapter 00), and if we agree that our interest is upcoming events and probably decision-making, then we obviously are interested in putting betting odds on the location of the population mean (and subsequent samples). A statement about process will not help us with that; only a probability statement will.
Notice that in the earlier discussion it was never necessary to use the notion of the "true" population mean that such writers as Wonnacott and Wonnacott employ (see their appendix). As discussed in Chapter 00, the notion of a "true parameter" tends to confuse the issue, and is out of keeping with Einstein's device of the operational definition. Rather than having in mind some "true" value, we should instead ask: "What will happen if I...", or "...if I again..."

Bayesians, too, complain of the process point of view. Savage writes that the process

...is a sort of fiction; for it will be found that whenever its advocates talk of making assertions that have high probability, whether in connection with testing or estimation, they do not actually make such assertions themselves, but endlessly pass the buck, saying in effect, "This assertion has arisen according to a system that will seldom lead you to make false assertions, if you adopt it. As for myself, I assert nothing but the properties of the system." (1972, pp. 260-261)

Lee writes at greater length:

[T]he statement that a 95% confidence interval for an unknown parameter ran from -2 to +2 sounded as if the parameter lay in that interval with 95% probability and yet I was warned that all I could say was that if I carried out similar procedures time after time then the unknown parameters would lie in the confidence intervals I constructed 95% of the time. Subsequently, I discovered that the whole theory had been worked out in very considerable detail in such books as Lehmann (1959, 1986). But attempts such as those that Lehmann describes to put everything on a firm foundation raised even more questions. (Lee, 1989, p. vii)

NOTES ON THE USE OF CONFIDENCE INTERVALS

1. Confidence intervals are used more frequently in the physical sciences - indeed, the concept was developed for use in astronomy - than in biostatistics and the social sciences; in these latter fields measurement is less often the main problem, and the distinction between hypotheses often is difficult to draw.

2. Some statisticians suggest that one can do hypothesis tests with the confidence-interval concept. But that seems to me equivalent to suggesting that one can get from New York to Chicago by flying first to Los Angeles. Additionally, the logic of hypothesis tests is much clearer than the logic of confidence intervals, and it corresponds much more easily to our intuitions.

3. Discussions of confidence intervals sometimes assert that one cannot make a probability statement about where the population mean may be, yet can make statements about the probability that a particular set of samples may bound that mean. If one takes the operational-definition point of view (see the discussion of that concept in connection with the concept of probability), and we agree that our interest is in upcoming events and probably in decision-making, then we obviously are interested in putting betting odds on the location of the population mean (and of subsequent samples). A statement about process will not help us with that; only a probability statement will.

Moving progressively farther away from the sample mean, we can find a universe that has only some (any) specified small probability of producing a sample like the one observed. One can say that this point represents a "limit" or "boundary," and that the region between it and the sample mean may be called a confidence interval, I suppose.

SUMMARY

Let's summarize what one can and cannot assert about confidence intervals:

1. One can always state the probability that a given population S will produce a given sample s (or, more precisely, a sample with a given mean xbar or other statistic).
This is a straightforward deduction which can be performed either theoretically, with formal probability theory, or with a Monte Carlo resampling technique. Indeed, such statements are the core of all statistics problems; all the rest of statistics is interpretation.

2. Derived from (1), one can state the relative probabilities of two given S's producing a given s, and the ratio of those probabilities.

3. One cannot ever estimate the probability that a particular sample came from any particular population - or even put probabilistic bounds (confidence limits) around its mean - on the basis of the sample evidence alone. This is the problem of induction that mathematicians and philosophers have been struggling with for more than two centuries, and undoubtedly before that, too. Even if one knows that a given population would produce the observed sample (or a sample even further away) only (say) 5 percent of the time, one cannot say anything about the probability that that particular population produced the observed sample on the basis of the sample evidence alone. The probability of any given population depends on the probabilities of the other candidate populations.

To see that this is so, postulate that we have been told that a given sample of green and red balls was produced by one of two universes - urn A with a proportion X of green balls, and urn B with a proportion Y of green balls - and that beforehand it was equally likely that the sample was drawn from either urn. Assume we are then able to state (using Bayesian reasoning) that it is twice as likely that the sample came from urn A as from urn B. Now notice: if the alternative urn B had some proportion other than Y, our conclusion about urn A would differ from "twice as likely," even though urn A and the sample are unchanged. This demonstrates that without some assumption about the alternatives to a stated population, no meaningful statement can be made about the probability that a sample came from that population.
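The urn argument can be put in numbers. The sketch below is a hypothetical illustration of my own (the sample of 7 green balls in 10 draws and the proportions 0.7, 0.5, and 0.6 are invented): with equal prior odds on the two urns, Bayes' rule makes the posterior odds on urn A equal to the likelihood ratio, and changing only the assumed alternative urn B changes the odds on urn A:

```python
from math import comb

def posterior_odds_A_vs_B(green, n, pA, pB):
    """With equal prior odds on urns A and B, the posterior odds that a
    sample of n balls containing `green` green ones came from A rather
    than B equal the likelihood ratio (the binomial coefficients cancel)."""
    like_A = comb(n, green) * pA**green * (1 - pA) ** (n - green)
    like_B = comb(n, green) * pB**green * (1 - pB) ** (n - green)
    return like_A / like_B

# 7 green balls in 10 draws; urn A is 70% green, urn B is 50% green:
print(posterior_odds_A_vs_B(7, 10, 0.7, 0.5))  # roughly 2.3 to 1 in favor of A
# Change only the alternative urn B to 60% green; the odds on A shrink:
print(posterior_odds_A_vs_B(7, 10, 0.7, 0.6))  # roughly 1.2 to 1
```

The point is exactly the one in the text: the verdict on urn A (here about 2.3 to 1) depends on what the alternative population is assumed to be, not on the sample evidence alone.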
Here again I repeat the crucial distinction between discussing the probability that a sample could come from a given universe, and the probability that a sample came from a given universe. The former is straightforward, as in (1) above; the latter cannot be stated meaningfully without additional assumptions. Not distinguishing between these two statements may be at the heart of most muddles about the fundamentals of statistics.

With the first approach described in this chapter, we can sensibly say something about the probability that the mean of the population that produced a particular sample is within some distance of the sample mean, or that a particular population has only an X percent chance of producing a sample like this one. Those statements are entirely different from speaking about the probability that the sample came from a given population.

With the second approach described in this chapter, one can say that the confidence interval includes all the means of populations that have a greater than 5 percent chance of producing the observed sample. This crucial statement may be cumbersome, but it is logically airtight. On the other hand, it does not imply - so far as I can now see - anything about the mean of the population from which this sample actually came, or more precisely, the population that produced this sample.

The oft-denounced statement that the confidence interval includes the population mean, or that the population mean lies within those bounds, with probability of (say) 95 percent is loose but not too bad if we include implicit assumptions about non-bias and about the dispersion of the population and the sample. Or, as some would prefer: this procedure will lead to those points bracketing the population mean 95 percent of the time you do this sort of thing. Such statements probably are not very inaccurate, given that the world around us is well-behaved in such respects most of the time (see Chapter I-1).
And such statements should be generally acceptable. But they are not logically implied. Nor can any of this be proven empirically in any way, so far as I know. (It might be tested on assumptions of equality of dispersion along the continuum, assuming a continuum of some sort. But this may not be a profitable avenue of thought.)

FOOTNOTES

[1]: When working with proportions, the conventional method must obtain these points from prepared ellipses and binomial tables, not from the sort of geometric trick used in the previous paragraphs. Hence showing the distribution centered at xbar = mu, as in the conventional approach, is quite misleading.

There seems to me to be no basis for this, either. After all, a single sample may be regarded as n samples of size one. Why should one be able to draw different sorts of conclusions from a set of samples of size one than from the evidence of all those samples aggregated into a single large sample? The principle is the same.

ENDNOTES

<1>: They go on to say, "Techniques and details, beyond a comparatively small range of fairly basic methods, are likely to do more harm than good in the hands of beginners...The great ideas...are lost...nonparametric [methods] involving simpler computations, are more nearly foolproof in the hands of the beginner" (1956, viii, xi). Their stance is very much in contrast to that of Fisher, who wrote somewhere of the t test as a "revolution."