Monday, August 07, 2006

A little math humor

When it comes to sample variance ...

The two estimators differ only slightly, and for larger values of the sample size n the difference is negligible. The second one is an unbiased estimator of the population variance, meaning that over a large number of repetitions its average value tends to the true value of the population variance. The first one may be seen as the variance of the sample considered as a population.
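The formulas for the two estimators did not survive in the excerpt. Assuming the passage means the usual definitions (divide by n for the first, by n − 1 for the second, with y_1, ..., y_n as the sample and \overline{y} as its mean), they would read roughly as follows. The index notation is my assumption; only the symbols σ2, s2 and \overline{y} appear in the quoted text.

```latex
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \overline{y} \right)^2
\qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( y_i - \overline{y} \right)^2
\qquad\Longrightarrow\qquad
\sigma^2 = \frac{n-1}{n}\, s^2
```

The last relation is what makes the difference shrink as n grows: the ratio (n − 1)/n tends to 1.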

One common source of confusion is that the term sample variance may refer to either the unbiased estimator s² of the population variance or the variance σ² of the sample viewed as a finite population. Both can be used to estimate the true population variance. Apart from theoretical considerations, it doesn't really matter which one is used, as for small sample sizes both are inaccurate and for large values of n they are practically the same. Intuitively, computing the variance by dividing by n instead of n − 1 underestimates the population variance. This is because we are using the sample mean ȳ as an estimate of the unknown population mean μ, and the raw counts of repeated elements in the sample instead of the unknown true probabilities.
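A quick simulation is one way to see how the two divisors behave on average. This is only a sketch under stated assumptions: the samples come from a standard normal distribution (true variance 1), and the function name `average_estimates`, the sample size of 5 and the trial count are my own illustrative choices, not anything from the quoted text.

```python
import random
import statistics


def average_estimates(n, trials, seed=0):
    """Average both variance estimates over many samples of size n.

    Samples are drawn from a standard normal, so the true variance is 1.
    (Hypothetical helper, written only for this illustration.)
    """
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total_div_n += statistics.pvariance(sample)          # divides by n
        total_div_n_minus_1 += statistics.variance(sample)   # divides by n - 1
    return total_div_n / trials, total_div_n_minus_1 / trials


biased, unbiased = average_estimates(n=5, trials=10000)
print(f"divide by n:     {biased:.3f}")
print(f"divide by n - 1: {unbiased:.3f}")
```

With n = 5 the divide-by-n average should land near (n − 1)/n = 0.8 and the divide-by-(n − 1) average near 1. Raising n pulls the two together, which is the practical point of the passage.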

In practice, for large n, the distinction is often a minor one. In the course of statistical measurements, sample sizes so small as to warrant the use of the unbiased variance virtually never occur. In this context Press et al.[1] commented that if the difference between n and n−1 ever matters to you, then you are probably up to no good anyway - e.g., trying to substantiate a questionable hypothesis with marginal data.


http://en.wikipedia.org/wiki/Variance

2 comments:

Genki na Pengin said...

Funny stuff! I had heard a variation on that one (involving a priest, a rabbi and a spatula), but never that exact version!!

Louan said...

This is the first I have heard an avocado referred to as an unbiased estimator.