# Understanding the Standard Deviation

Many of my tutoring clients first come to me because they need help with the standard deviation. In truth, the standard deviation is computationally complicated but conceptually very simple. It is, roughly speaking, the average amount of difference between a score and the mean in any given set of scores (called a "distribution").

That's all a "deviation" is: the difference between a score and the mean of the distribution of which it's part. Thus, for instance, the deviation of a score of 2 from the mean of 5 is -3, while the deviation of 8 from the mean of 5 is (positive) 3. If we subtract the mean from every score in a distribution (so as to find the deviations), add up these deviation scores, and divide by the number of scores, we should have the "average deviation." But this is where it gets computationally complicated. We can't just add up the deviations directly because the sum of the deviations from the mean is always 0 -- this is what makes the mean such a good measure of central tendency! If the sum of the deviations is always 0, we can't calculate the average deviation directly, because the average would be equal to the sum of the deviations (0) divided by the number of scores, and 0 divided by any non-zero number is always 0.

To get around this limitation, we need to find a way to make the deviations add up to something other than 0. One useful way to do this is to square them (that is, multiply each deviation by itself). Because a squared number is never negative, this makes every non-zero deviation positive in value, meaning they will always add up to a positive, non-zero number (except in cases where every score is equal to the mean, in which case there is no deviation to measure!).

If we square the deviations and then add them up, we get what's called the sum of squares (literally the sum of the squared deviations from the mean). We can then divide the sum of squares by the number of scores to get an "average" -- but the average we get is the average squared deviation (this is called the variance). Because we squared the deviations to get to this point, we now need to un-square them. By taking the square root of the variance, we get the standard deviation.
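For readers who like to see the recipe in symbols, here is one way to write it down. The notation is my own choice, not something fixed by the discussion above: I'm assuming x_i stands for an individual score, x-bar for the mean, n for the number of scores, and SS for the sum of squares.

$$
SS = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
\text{variance} = \frac{SS}{n}, \qquad
\text{standard deviation} = \sqrt{\frac{SS}{n}}
$$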

The standard deviation, then, can be read as the typical difference between a score and the mean, but we calculate it by taking the square root of the average squared difference between the scores and the mean.

Here's a quick example. Suppose I have a distribution of five scores: 1, 2, 3, 4, and 5. The mean of the distribution is equal to the sum of the scores (1 + 2 + 3 + 4 + 5 = 15) divided by the number of scores (5), so (15 divided by 5 = ) 3.

We can then subtract the mean (3) from every score in the distribution to find the deviations: (1 - 3) = -2; (2 - 3) = -1; (3 - 3) = 0; (4 - 3) = 1; (5 - 3) = 2. So our deviations are -2, -1, 0, 1, and 2. As you can see, these add up to zero: the negative deviations cancel out the positive ones. So we can't calculate an average deviation using them; because they add up to 0, the "average" will always be 0 (as 0 divided by anything is 0).
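If you'd like to check the cancellation for yourself, here is a minimal Python sketch using the same five scores. The variable names are just illustrative.

```python
# The five scores from the example
scores = [1, 2, 3, 4, 5]

# Mean: sum of the scores divided by the number of scores
mean = sum(scores) / len(scores)

# Deviation: each score minus the mean
deviations = [x - mean for x in scores]

print(deviations)       # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(deviations))  # 0.0
```

Swapping in any other list of numbers should give a sum of 0 as well, apart from tiny rounding errors when the mean isn't a tidy number.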

If we square the deviations, though, they add up to something positive, because a squared number can't be negative. If we square them, we get: (-2)^2 = 4; (-1)^2 = 1; 0^2 = 0; 1^2 = 1; 2^2 = 4. So our squared deviations are 4, 1, 0, 1, and 4. These add up to (4 + 1 + 0 + 1 + 4 = ) 10.

So our sum of squares is 10. If we divide this by the number of scores (5), we get the variance (the average squared deviation): in this case, (10 divided by 5 = ) 2. We can then undo the earlier step we took of squaring the deviations by taking the square root of ("un-squaring") the variance. The square root of 2 is approximately 1.41. So the typical difference between the scores of our distribution and the mean of the distribution is about 1.41.

Not so bad, huh? Subtract the mean from every score in the distribution to get the deviations; square the deviations; add them up; divide by the number of scores; then take the square root of the result.
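The five steps translate almost word for word into code. Here is a rough Python sketch; the function name `descriptive_sd` is my own invention for illustration, not a standard library function.

```python
import math


def descriptive_sd(scores):
    """Descriptive standard deviation: divide the sum of squares by n."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # subtract the mean from every score
    squared = [d ** 2 for d in deviations]    # square the deviations
    sum_of_squares = sum(squared)             # add them up
    variance = sum_of_squares / n             # divide by the number of scores
    return math.sqrt(variance)                # take the square root


print(round(descriptive_sd([1, 2, 3, 4, 5]), 2))  # 1.41
```

Each line of the function lines up with one step of the recipe, which makes it easy to print the intermediate values if you want to follow along by hand.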

Now, the above applies only to what we might call the "descriptive standard deviation." By that, I mean that it's accurate only for descriptive purposes, for telling us something about the distribution we have in front of us. Oftentimes, however, that's not the reason we're calculating the standard deviation at all. More often than not (in research settings, for instance), we're calculating the standard deviation using sample data in order to estimate the variability in the population. When that's the case, the "descriptive standard deviation" isn't going to work as well: it's what we call a biased estimator. It will systematically underestimate the amount of variability in the population. The reason for this is intuitive: samples are always smaller than populations (indeed, we prefer to study samples because they're smaller and easier to manage than populations), so samples always have less opportunity to vary than the population does.

When we want to calculate the standard deviation for inferential purposes, then, it's better to use the "inferential" formula. This means applying a very simple correction called Bessel's correction to our variance estimate. Bessel's correction is a simple upward adjustment whose size depends on the sample size, and is applied by multiplying the variance by n/(n-1), where n is the sample size. In the above example, n/(n-1) = 5/(5-1) = 5/4. This is a multiplier of 1.25 (125%), meaning we're increasing our variance estimate by 25% to account for the fact that we're generalizing the sample variance to the population. If we multiply the variance we got above (2) by 5/4 or 1.25, we get 2.5, a larger variance estimate. We can then take the square root of it to get an inferential standard deviation of approximately 1.58. Notice that's noticeably higher than our descriptive standard deviation of 1.41. Again, that stands to reason: we're adjusting our estimate of the population variability upwards to account for the fact that our sample is less variable than the population.
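Here is the same arithmetic as a short Python sketch, in case seeing it step by step helps. The variable names are illustrative, and the starting variance of 2 is simply carried over from the five-score example.

```python
import math

n = 5                        # sample size from the example
descriptive_variance = 2.0   # sum of squares (10) divided by n (5)

# Bessel's correction: multiply the variance by n / (n - 1)
correction = n / (n - 1)
inferential_variance = descriptive_variance * correction
inferential_sd = math.sqrt(inferential_variance)

print(correction)                # 1.25
print(inferential_variance)      # 2.5
print(round(inferential_sd, 2))  # 1.58
```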

Notice that the size of Bessel's correction shrinks as the sample size grows -- which also stands to reason, because as our sample size grows, we're capturing more of the variability in the population and so we have less need of a correction. When the sample size was 5, our correction was 25%. When the sample size is 20, the multiplier is 20 divided by 19, so the correction is about 5.26%. When the sample size is 50, the correction is about 2.04%. When the sample size is 200, the correction is a mere 0.5%. And so on.
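To see the correction shrink for yourself, here is a small Python sketch. The helper name `percent_increase` is hypothetical, made up just for this illustration.

```python
def percent_increase(n):
    """Percentage by which Bessel's correction inflates the variance."""
    return (n / (n - 1) - 1) * 100


for n in (5, 20, 50, 200):
    print(n, round(percent_increase(n), 2))
# 5 25.0
# 20 5.26
# 50 2.04
# 200 0.5
```

Trying a few larger values of n should show the percentage creeping ever closer to zero.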

We can also build Bessel's correction directly into how we calculate the standard deviation, by dividing the sum of squares by the sample size minus one (n-1) rather than the sample size (n). In our above example, we had a sum of squares of 10. If we divide 10 by one less than the number of scores, or (5 - 1 = ) 4, we get a variance of 2.5 -- exactly the estimate we got by multiplying the (descriptively-calculated) variance by Bessel's correction.
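A quick Python sketch can confirm that the two routes land in the same place. The variable names are my own and purely illustrative.

```python
import math

scores = [1, 2, 3, 4, 5]
n = len(scores)
mean = sum(scores) / n
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# Route 1: descriptive variance, then multiply by n / (n - 1)
via_correction = (sum_of_squares / n) * (n / (n - 1))

# Route 2: divide the sum of squares by (n - 1) directly
direct = sum_of_squares / (n - 1)

print(via_correction, direct)                # 2.5 2.5
print(math.isclose(via_correction, direct))  # True
```

The comparison uses `math.isclose` rather than `==` because, with less tidy data, the two routes can differ by a tiny rounding error even though they are mathematically identical.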

Most statistics programs (e.g., SPSS) calculate the inferential standard deviation and don't give you the option of calculating the descriptive version. In Excel, you can calculate the descriptive standard deviation using the formula =STDEV.P( ) and the inferential standard deviation using the formula =STDEV.S( ), where the cells containing the data you want to calculate the standard deviation for are specified inside the parentheses. (Older versions of Excel use =STDEVP( ) and =STDEV( ) for the same two calculations.) You can calculate the variance a similar way, with =VAR.P( ) and =VAR.S( ) (or =VAR( ) in older versions) corresponding to the uncorrected (descriptive) and corrected (inferential) variances, respectively.
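If you happen to work in Python rather than Excel or SPSS (that's an assumption on my part, not something covered above), the standard library's `statistics` module offers the same pairing: `pstdev` and `pvariance` divide by n, while `stdev` and `variance` divide by n-1. A short sketch with the five example scores:

```python
import statistics

scores = [1.0, 2.0, 3.0, 4.0, 5.0]

# Descriptive (divide by n): the "p" stands for population
print(round(statistics.pvariance(scores), 2))  # 2.0
print(round(statistics.pstdev(scores), 2))     # 1.41

# Inferential (divide by n - 1): Bessel's correction built in
print(round(statistics.variance(scores), 2))   # 2.5
print(round(statistics.stdev(scores), 2))      # 1.58
```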