9. SAMPLING AND STATISTICAL INFERENCE
We often need to know something about a large population. For example: what is the average number of hours per week devoted to online social networking for all US residents? It’s often infeasible to examine the entire population. Instead, we choose a small random sample and use the methods of statistical inference to draw conclusions about the population. But how can any small sample be completely representative?
We can’t act as if statistics based on small samples are exactly representative of the entire population. Why not just use the sample mean x̄ in place of μ? For example, suppose that the average hours for 100 randomly selected US residents was x̄ = 6.34. Can ...
• Our conclusions will not always be correct. This problem is inevitable, unless we examine the entire population.
• We can, however, control the probability of making an error.
If we focus completely on what happened to us in our given sample, without putting it into the context of what might have happened, we can’t do statistical inference. The success of statistical inference depends critically on our ability to understand sampling variability.
The Sampling Distribution of X̄
Different samples lead to different values of x̄. But the sample was randomly selected! Therefore, X̄ is a random variable, taking different values depending on chance. So X̄ has its own distribution, called the sampling distribution.
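A small simulation can make the idea of a sampling distribution concrete. The sketch below is only an illustration under stated assumptions: the population is made up (10,000 exponentially distributed "weekly hours" values with mean about 6), the sample size of 5 and the number of repetitions are arbitrary choices, and the helper name simulate_sample_means is hypothetical. It repeatedly draws random samples, records each sample mean, and then summarizes the collection of sample means next to the population itself.

```python
import random
import statistics


def simulate_sample_means(population, n, reps, seed=0):
    """Draw `reps` random samples of size n (without replacement)
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]


# Hypothetical population (assumption, not data from the notes):
# 10,000 made-up "weekly hours" values with mean roughly 6.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1 / 6.0) for _ in range(10_000)]

# Each entry of `means` is one value that the sample mean could have taken.
means = simulate_sample_means(population, n=5, reps=20_000)

pop_mean = statistics.fmean(population)
pop_var = statistics.pvariance(population)
mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

print(f"population mean          = {pop_mean:.3f}")
print(f"average of sample means  = {mean_of_means:.3f}")
print(f"population variance      = {pop_var:.3f}")
print(f"variance of sample means = {var_of_means:.3f}")
```

Changing the seed changes the individual sample means, which is exactly the sampling variability discussed above; the summary of many sample means is far more stable than any single one.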
[Sampling Lab Results]
The sampling lab results indicate that the sampling distribution of X̄ is different from the distribution of the population. The sampling distribution has its own mean, variance, and shape, distinct from those of the population. The sampling lab results show that the variance of X̄ based on a sample of size 5 seems to be less than the variance of the population. The average of the x̄ values obtained seems quite close to the population mean. We now give some precise definitions.
A random sample (of size n) from a finite population (of size N) is a sample chosen without replacement so that each of the $\binom{N}{n}$ possible samples is equally likely to be selected. If the population is infinite, or, equivalently, if the sampling is done with replacement, a random sample consists of n observations drawn independently, with replacement, from the population. Hereafter, we assume that either the population is infinite, or else that N is sufficiently large compared to n that we can ignore the effects of having a finite population.
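For a very small finite population, the definition can be checked by listing every possible sample. The sketch below assumes a made-up population of N = 6 values and a sample size of n = 2 (both chosen only for illustration). It enumerates all "N choose n" samples, picks one uniformly at random, and also computes the mean of every possible sample, which is the exact sampling distribution of the sample mean for this toy case.

```python
import random
from itertools import combinations
from math import comb
from statistics import fmean

# Hypothetical finite population (assumption): N = 6 made-up values.
population = [2, 4, 6, 8, 10, 12]
n = 2

# Every possible sample of size n drawn without replacement.
all_samples = list(combinations(population, n))
num_samples = comb(len(population), n)  # "N choose n"

# A random sample: each possible sample is equally likely to be chosen.
rng = random.Random(0)
chosen = rng.choice(all_samples)

# random.sample draws without replacement directly, without enumeration.
direct = rng.sample(population, n)

# The exact sampling distribution of the sample mean: one mean per sample.
sample_means = [fmean(s) for s in all_samples]
mean_of_means = fmean(sample_means)

print(f"number of possible samples = {num_samples}")
print(f"one random sample          = {chosen}")
print(f"population mean            = {fmean(population)}")
print(f"mean of all sample means   = {mean_of_means}")
```

Full enumeration is only practical for tiny populations, since the number of possible samples grows very quickly with N and n; for realistic sizes one would draw samples directly, as random.sample does here.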
• Statistics (such as the sample mean x̄) obtained from random samples can be thought of as random variables, and hence they have distributions, called theoretical sampling distributions.
• In order for our inferences to be valid, it is critical that we get a random sample, as defined above.
Suppose that a random sample, of size n, is taken from a population having mean μ and standard deviation σ. Although μ and σ are fixed numbers, their values are not known to us.
The Mean and Variance of X̄
• Even though we will only take one sample in practice, we must remember that the sample was selected by a random mechanism.
• Therefore, X̄ is a random variable! Its randomness is induced by the sampling procedure. If we had taken a different random sample, we might have gotten a different value for x̄.
• Since X̄ is a random variable, it must have a distribution. To draw valid inferences, we must take account of this sampling distribution, that is, we must think about all of the values that x̄ might have taken (but didn’t).
• Since all distributions have means and variances, the distribution of X̄ must also have a mean and a variance, denoted by $\mu_{\bar{x}}$ and $\sigma^2_{\bar{x}}$. These quantities are given by the...