Standard Deviation (1 of 3)
So far, we have introduced two measures of spread: the range (covered by all the data) and the interquartile range (IQR), which looks at the range covered by the middle 50% of the distribution. We also noted that the IQR should be paired as a measure of spread with the median as a measure of center. We now move on to another measure of spread, the standard deviation, which quantifies the spread of a distribution in a completely different way.
The idea behind the standard deviation is to quantify the spread of a distribution by measuring how far the observations are from their mean, x̄. The standard deviation gives the average (or typical ...view middle of the document...
This is always the case, and is the reason why we have to do a more complicated calculation to determine the standard deviation:
3. Square each of the deviations:
The first few are (-2)² = 4, (0)² = 0, (-4)² = 16, and the rest are 16, 36, 4, 36, 0.
4. Average the squared deviations by adding them up and dividing by n - 1 (one less than the sample size):
(4 + 0 + 16 + 16 + 36 + 4 + 36 + 0) / (8 - 1) = 112 / 7 = 16
* The reason why we "sort of" average the squared deviations (divide by n - 1) rather than take the actual average (divide by n) is beyond the scope of the course at this point, but will be addressed later.
* This average of the squared deviations is called the variance of the data.
5. The SD of the data is the square root of the variance: SD = √16 = 4
* Why do we take the square root? Note that 16 is an average of the squared deviations, and therefore has different units of measurement. In this case 16 is measured in "squared customers," which obviously cannot be interpreted. We therefore take the square root in order to compensate for the fact that we squared our deviations, and in order to go back to the original unit of measurement.
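The calculation in steps 3 to 5 can be checked with a short Python sketch. It assumes the eight hourly customer counts of the video store example, written here in sorted order, and a mean of 9; the variable names are illustrative only. The last line compares the hand calculation with `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
import math
import statistics

# Hourly customer counts from the video store example (sorted order).
customers = [3, 5, 7, 9, 9, 11, 13, 15]

n = len(customers)
mean = sum(customers) / n                    # 9.0

# Deviation of each observation from the mean.
deviations = [x - mean for x in customers]

# Step 3: square each deviation.
squared_deviations = [d ** 2 for d in deviations]

# Step 4: "sort of" average them by dividing by n - 1; this is the variance.
variance = sum(squared_deviations) / (n - 1)

# Step 5: the SD is the square root of the variance.
sd = math.sqrt(variance)

print(variance, sd)                          # 16.0 4.0
print(math.isclose(sd, statistics.stdev(customers)))  # True
```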
Recall that the average number of customers who enter the store in an hour is 9. The interpretation of SD = 4 is that on average, the actual number of customers that enter the store each hour is 4 away from 9.
The importance of the variance, the figure we found in step 4 above (16 in our example), will be discussed much later in the course, when we get to the inference part.
Properties of the Standard Deviation
1. It should be clear from the discussion thus far that the SD should be paired as a measure of spread with the mean as a measure of center.
2. Note that the only way, mathematically, in which the SD = 0 is when all the observations have the same value (for example: 5, 5, 5, ..., 5), in which case the deviations from the mean (which is also 5) are all 0. This is intuitive, since if all the data points have the same value, we have no variability (spread) in the data, and expect the measure of spread (like the SD) to be 0. Indeed, in this case, not only is the SD equal to 0, but the range and the IQR are also equal to 0. Do you understand why?
3. Like the mean, the SD is strongly influenced by outliers in the data. Consider the example concerning video store customers: 3, 5, 7, 9, 9, 11, 13, 15 (data ordered). If the largest observation were wrongly recorded as 150, then the average would jump up to x̄ = 25.9, and the standard deviation would jump up to SD = 50.3. Note that in this simple example, it is easy to see that while the standard deviation is strongly influenced by outliers, the IQR is not. The IQR would be the same in both cases, since, like the median, the calculation of the quartiles depends only on the order of the data rather than the actual values.
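Properties 2 and 3 can be illustrated with a small Python sketch. The helper functions `iqr` and `spread_summary` are hypothetical names introduced here, and `iqr` assumes the quartiles are the medians of the lower and upper halves of the ordered data; other quartile conventions exist and may give slightly different values.

```python
import statistics

def iqr(data):
    """IQR with quartiles taken as the medians of the lower and upper halves
    (an assumed convention; software packages may use other rules)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd sample size, leave the middle observation out of both halves.
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return statistics.median(upper) - statistics.median(lower)

def spread_summary(data):
    """Mean, SD (dividing by n - 1), range, and IQR of a data set."""
    return {
        "mean": round(statistics.mean(data), 1),
        "sd": round(statistics.stdev(data), 1),
        "range": max(data) - min(data),
        "iqr": iqr(data),
    }

constant = [5, 5, 5, 5, 5, 5]              # property 2: no variability
original = [3, 5, 7, 9, 9, 11, 13, 15]     # video store data
miscoded = [3, 5, 7, 9, 9, 11, 13, 150]    # largest value recorded as 150

for name, data in [("constant", constant),
                   ("original", original),
                   ("miscoded", miscoded)]:
    print(name, spread_summary(data))
```

Under these assumptions, the constant data set has SD, range, and IQR all equal to 0, while the miscoded data set has a mean of 25.9 and an SD of 50.3 but the same IQR of 6 as the original data.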