What is the difference between N and N-1 in calculating population variance?

I don't understand why both N and N-1 appear when calculating population variance. When do we use N, and when do we use N-1?

[Image not reproduced here]

It says that when the population is very large there is no difference between N and N-1, but it does not explain why N-1 is there in the first place.

Edit: Please don’t confuse this with the n and n-1 that are used in estimation.

Edit2: I’m not talking about population estimation.

5 Responses to “What is the difference between N and N-1 in calculating population variance?”

  1. ttnphns says:

    Instead of going into the maths, I’ll try to put it in plain words. If you have the whole population at your disposal, then its variance (the population variance) is computed with the denominator N. Likewise, if you have only a sample and want to compute the variance of that sample itself, you use the denominator N. Note that in both cases you don’t estimate anything: the mean that you measured is the true mean, and the variance you computed from that mean is the true variance.
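
    In symbols, with $x_1, \dots, x_N$ standing for all the values you have and $\mu$ for their mean (notation assumed here for illustration, it is not from the question), the denominator-N computation is:

    $$
    \mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
    $$

    Nothing is estimated here: every $x_i$ that enters $\mu$ and $\sigma^2$ is in hand.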

    Now suppose you have only a sample and want to infer the unknown mean and variance of the population. In other words, you want estimates. You take your sample mean as the estimate of the population mean (because your sample is representative), OK. To obtain an estimate of the population variance, you have to pretend that this mean really is the population mean, and therefore that it no longer depends on your sample once you have computed it. To “show” that you now take it as fixed, you reserve one (any) observation from your sample to “support” the mean’s value: whatever your sample might have turned out to be, the one reserved observation could always bring the mean to the value that you got and that you believe is insensitive to sampling contingencies. That one reserved observation is the “-1”, and so you have N-1 in the denominator when computing the variance estimate.

    Imagine that you somehow know the true population mean but want to estimate the variance from the sample. Then you substitute that true mean into the variance formula and use the denominator N: no “-1” is needed here, since you know the true mean and did not estimate it from this same sample.
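
    The cases above can be checked with a small simulation. What follows is only a sketch under assumptions that are not in the original: the data are drawn with rand (uniform on (0,1)), so the true mean is 0.5 and the true variance is 1/12, and all variable names are made up for illustration.

    ```octave
    % Sketch: average three variance estimators over many small samples.
    % Assumption: uniform(0,1) data, so the true mean is 0.5 and the true variance 1/12.
    n = 5;                 % size of each sample
    trials = 100000;       % number of repeated samples
    mu = 0.5;              % true population mean
    sigma2 = 1/12;         % true population variance

    X = rand(n, trials);                                % each column is one sample
    xbar = mean(X, 1);                                  % sample mean of each column
    ss_sample = sum((X - repmat(xbar, n, 1)).^2, 1);    % squared deviations from the sample mean
    ss_true = sum((X - mu).^2, 1);                      % squared deviations from the true mean

    est_n = mean(ss_sample / n);          % sample mean, denominator n
    est_nm1 = mean(ss_sample / (n - 1));  % sample mean, denominator n-1
    est_true = mean(ss_true / n);         % true mean, denominator n

    fprintf('true variance            : %.4f\n', sigma2);
    fprintf('sample mean, divide by n  : %.4f\n', est_n);
    fprintf('sample mean, divide by n-1: %.4f\n', est_nm1);
    fprintf('true mean, divide by n    : %.4f\n', est_true);
    ```

    On average, the second and third estimates should land close to 1/12, while the first should land close to (n-1)/n times 1/12, which is the shortfall that the “-1” corrects.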

  2. Michael Lew says:

    The population variance is the sum of the squared deviations of all of the values in the population from the population mean, divided by the number of values in the population. When we are estimating the variance of a population from a sample, though, we encounter the problem that the deviations of the sample values from the mean of the sample are, on average, a little less than the deviations of those sample values from the (unknown) true population mean. As a result, a variance calculated from the sample with an n divisor is, on average, a little less than the true population variance. Using an n-1 divisor instead of n corrects for that underestimation.
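
    The size of that underestimation can be worked out directly. The following is a sketch that assumes the sample values $x_1, \dots, x_n$ are independent draws from a population with mean $\mu$ and variance $\sigma^2$, with $\bar{x}$ denoting the sample mean (the symbols are chosen here for illustration). Splitting each deviation from $\mu$ into a deviation from $\bar{x}$ plus the offset $\bar{x} - \mu$ gives:

    $$
    \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2
    $$

    Taking expectations, and using $E[(x_i - \mu)^2] = \sigma^2$ and $E[(\bar{x} - \mu)^2] = \sigma^2 / n$:

    $$
    n\sigma^2 = E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] + \sigma^2
    \quad\Longrightarrow\quad
    E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = (n-1)\sigma^2
    $$

    So dividing by n gives $\frac{n-1}{n}\sigma^2$ on average, while dividing by n-1 gives $\sigma^2$.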

  3. Nick Cox says:

    You may get a better feel for this question by experimenting in Octave or MATLAB:

    x = rand(10,1);                                  % a small sample of 10 uniform random values
    var1 = sum((x - mean(x)).^2) / (length(x));      % divide by n
    var2 = sum((x - mean(x)).^2) / (length(x)-1);    % divide by n-1

    You will see a noticeable difference between var1 and var2 (var1 is exactly 9/10 of var2), since the sample size is very small. Now repeat with a much larger sample size:

    x = rand(1e6,1);                                 % a large sample of one million values
    var1 = sum((x - mean(x)).^2) / (length(x));      % divide by n
    var2 = sum((x - mean(x)).^2) / (length(x)-1);    % divide by n-1

    you will verify that var1 $\approx$ var2.
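
    As a cross-check, the same two quantities are available from the built-in var function. This is a minimal sketch relying on the documented behaviour in Octave and MATLAB that var(x) divides by n-1 and var(x, 1) divides by n; the variable names ratio, diff_n and diff_nm1 are introduced here for illustration.

    ```octave
    % Sketch: compare the hand-written formulas with the built-in var().
    x = rand(10, 1);
    n = length(x);

    var1 = sum((x - mean(x)).^2) / n;        % denominator n
    var2 = sum((x - mean(x)).^2) / (n - 1);  % denominator n-1

    diff_n = abs(var(x, 1) - var1);          % var(x, 1) normalises by n
    diff_nm1 = abs(var(x) - var2);           % var(x) normalises by n-1

    ratio = var1 / var2;                     % equals (n-1)/n whatever the data are

    fprintf('difference from var(x, 1): %g\n', diff_n);
    fprintf('difference from var(x)   : %g\n', diff_nm1);
    fprintf('var1 / var2 = %.6f, (n-1)/n = %.6f\n', ratio, (n - 1) / n);
    ```

    Because the ratio var1 / var2 is always (n-1)/n, it is 0.9 for ten values and 0.999999 for a million, which is why the two results become indistinguishable for large samples.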
