Contents

Sampling from Worksheet

Examples:

Data Size: Different versions of XLMiner™  have varying limits on size of data. The size of data depicted in the example below may not be supported by your version. Refer to Data Handling Specifications for details.

Open the dataset Sampling.xls. It contains a variable ID for the record identification and seven variables, v1, v2, v7, v8, v9, v10, v11.

Let us work out different sampling options on this dataset and see the outputs.

For all the sampling examples, first activate XLMiner -->Data Utilities --> Sample from worksheet. We get the dialog for sampling after that. Make the changes as mentioned in each example and see output.

  1. We will use the default options first. 
  2. Select all the variables. Let us see the output with these default settings.

    • This is a simple random sample without replacement, with default random seed 12345. The desired sample size is 87 records as shown above.

    • Note that XLMiner™ has introduced one more variable, Row ID in the sample. Every dataset may not have a number allocated to each record. The variable, Row ID  gives IDs to records before sampling, then sorts the output on those IDs before displaying it. Row ID proves handy when the user wants to go back to the dataset for crosschecking some information. 
  3. Let us try Sampling with replacement.
    • Select all the variables.

    • Check the box "Sample with Replacement."

    • Choose the desired sample size to be 300. Since we are choosing sampling with replacement, XLMiner™ can very well generate a sample with a larger no. of records than the dataset.

    • Click OK.

    See the output.

    • The output indicates "True" for Sampling with replacement. The desired sample size is greater than #records in input data.

    • The sampling with replacement is apparent in the second and third entries which are the same record, record#3.

  4. Try stratified random sampling as follows.

    • Select all the variables.

    • Let us select the Desired sample size to be 100 and the Set seed 45689.

    • Select Stratified random sampling.

    • Click on the pull down menu for the stratum variable. Choose v8. (Remember, the variable you choose should not have more than 30 distinct values.) The #strata is automatically displayed once you select v8.

    • Select "Proportionate to stratum size."

    • Click "OK".

    • In order to maintain the proportions of the strata intact XLMiner™ has increased the sample size a little on the higher side.  This is apparent  in the entry for #records actually sampled.

    • Under stratum wise details XLMiner™ has listed all the stratum values v8 assumes with #records in input data for each stratum.

    • On this basis XLMiner™ calculated the percentage representation of that value in the dataset and maintains it in the sample.

    • You can see the #records in sampled data having the same proportion as in the dataset.

    • XLMiner™ has attached Row Id to each record before sampling. The output is displayed after sorting the sample on these Row Ids. 

  5. Select the options for stratified random sampling with equal records from each stratum.
    • Select all the variables.

    • Select Stratified random sampling. Choose v8 as the Stratum variable. The #strata is displayed automatically.

    • Select "Equal from each stratum, please specify #records".

    • Enter the #records. Remember, this number should not be greater than the smallest stratum size. In this case the smallest stratum size is 8. (If you already noticed, the smallest stratum size appears automatically in a box as you select the stratum variable). Enter 7 , which is less than the limit of 8.

    • Click OK.

    In the output,

    • The #records in sampled data are 56, ie 7 records per stratum for 8 strata. 
  6. If we want a sample with equal no. of records of each stratum but of bigger size we can use the same options above for sampling with replacement.
    • Check "Sample with replacement". 

    • Specify #records as 20. Though the smallest stratum size is 8 in this dataset, we can get more records as we are sampling with replacement.

    • Choose all the other options just as in ex. 4 above

    The output is as follows

    • The output sample has 20 records per stratum. So the #records in sampling is 160.
  7. Try the following combination of options-
    • Select Stratified random sampling. Choose v8 as the Stratum variable . The #strata is displayed automatically.

    • Select "Equal from each stratum, #records = smallest stratum size".

    • The smallest stratum size is displayed automatically as you select the stratum variable.

    • Click OK.

    The output is quite self explanatory.

    • #records in sampled data = 8 per stratum, the smallest stratum size.

     

     

    See also: