Sampling from a Database
Introduction to Sampling
A statistician often encounters a huge volume of data from which to draw inferences. Time and cost limitations make it impractical to examine every entry in such enormous datasets, so statisticians turn to sampling: they choose a sample of the dataset and apply the statistical procedures to that sample.
Let us first review a few statistical terms. The entire dataset we want information about is called the population. A sample is the part of the population that we actually examine in order to draw conclusions. A good sample should be a true representation of the population: as far as possible, the cases chosen for the sample should be like the cases that are not chosen. A poorly designed sample can produce misleading conclusions. Various methods have been developed to obtain representative samples, and a few of them are discussed here.
Simple Random Sampling: This is probably the simplest method for obtaining a good sample. A simple random sample of size n is chosen from the population in such a way that every possible set of n items from the population has an equal chance of being chosen as the sample. Thus simple random sampling not only avoids bias in the choice of individual items but also gives every possible sample the same chance of selection. When doing simple random sampling, the Data Sampling utility of XLMiner™ lets the user choose the sample size, the seed for randomization, and whether to sample with or without replacement.
Stratified Random Sampling: In this technique the population is first divided into groups of similar items, called strata. Each stratum is then sampled using simple random sampling, and these samples are combined to form a stratified random sample. When doing stratified random sampling, the Data Sampling utility of XLMiner™ lets the user choose the seed for randomization and whether to sample with or without replacement.
The desired sample size can be specified by the user, depending on which method is chosen for stratified random sampling. Thus, using this utility, the user can obtain a good sample of the dataset according to their own specifications.
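The simple random sampling procedure described above can be sketched with Python's standard random module. This is a minimal illustration, not XLMiner™'s implementation: the function name simple_random_sample and its parameters are hypothetical, chosen to mirror the options the text mentions (sample size, seed, with or without replacement).

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def simple_random_sample(
    population: Sequence[T],
    n: int,
    seed: Optional[int] = None,
    with_replacement: bool = False,
) -> List[T]:
    """Draw a simple random sample of size n (hypothetical helper, for illustration).

    Without replacement, every set of n distinct items is equally likely.
    With replacement, each of the n draws picks any item with probability 1/N,
    so the same item can appear more than once.
    """
    # A dedicated generator seeded explicitly: the same seed gives the same sample.
    rng = random.Random(seed)
    if with_replacement:
        return rng.choices(population, k=n)
    if n > len(population):
        raise ValueError("sample size exceeds population size (without replacement)")
    return rng.sample(population, n)


if __name__ == "__main__":
    rows = list(range(1, 101))  # stand-in for 100 database records
    print(simple_random_sample(rows, 5, seed=12345))
    print(simple_random_sample(rows, 5, seed=12345, with_replacement=True))
```

Fixing the seed makes a sample reproducible, which is useful when a later analysis must be rerun on exactly the same records; leaving it as None draws a fresh sample each time.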
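The stratified procedure (divide into strata, sample each stratum with simple random sampling, combine) can be sketched in the same spirit. The source does not say which allocation methods XLMiner™ offers or how it rounds, so the two rules here are assumptions for illustration: proportional allocation (stratum h gets about n·N_h/N items) and equal allocation (about n/k items per stratum), with leftover units from integer division handed to the largest remainders so the counts sum to n. All names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def allocate(sizes: List[int], n: int, proportional: bool = True) -> List[int]:
    """Split total sample size n across strata (assumed rounding rule: largest remainder)."""
    weights = sizes if proportional else [1] * len(sizes)
    wsum = sum(weights)
    counts = [n * w // wsum for w in weights]      # integer floor of each quota
    remainders = [n * w % wsum for w in weights]   # leftover fraction, scaled by wsum
    leftover = n - sum(counts)
    # Stable sort: ties go to the earlier stratum.
    by_remainder = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def stratified_random_sample(
    population: Sequence[T],
    stratum_of: Callable[[T], Hashable],
    n: int,
    seed: Optional[int] = None,
    with_replacement: bool = False,
    proportional: bool = True,
) -> List[T]:
    """Hypothetical sketch: stratify, simple-random-sample each stratum, combine."""
    rng = random.Random(seed)

    # Step 1: group similar items into strata (dicts keep first-seen order).
    strata: Dict[Hashable, List[T]] = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    groups = list(strata.values())

    # Step 2: decide how many items to take from each stratum.
    counts = allocate([len(g) for g in groups], n, proportional)

    # Step 3: simple random sampling inside each stratum, then combine.
    sample: List[T] = []
    for group, k in zip(groups, counts):
        if with_replacement:
            sample.extend(rng.choices(group, k=k))
        else:
            if k > len(group):
                raise ValueError("stratum too small to sample without replacement")
            sample.extend(rng.sample(group, k))
    return sample


if __name__ == "__main__":
    records = [("A", i) for i in range(60)] + [("B", i) for i in range(30)] + [("C", i) for i in range(10)]
    s = stratified_random_sample(records, stratum_of=lambda r: r[0], n=10, seed=1)
    print(Counter(r[0] for r in s))  # Counter({'A': 6, 'B': 3, 'C': 1})
```

With proportional allocation each stratum keeps its share of the population in the sample; equal allocation over-represents small strata, which can help when a rare group must appear in the sample at all. Sampling without replacement fails when a stratum is smaller than its allocated count, so a real tool has to either cap the count or allow replacement.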