Bias of an estimator
From Wikipedia, the free encyclopedia
This article is about bias of statistical estimators. For other uses, see Bias (disambiguation).
In statistics, the difference between an estimator's expected value and the true value of the parameter being estimated is called the bias. An estimator or decision rule having nonzero bias is said to be biased. Although the term bias sounds pejorative, it is not necessarily used in that way in statistics. Biased estimators may have desirable properties. Not only do they sometimes have a smaller mean squared error than any unbiased estimator, but in some cases the only unbiased estimators are not even within the convex hull of the parameter space, so their use is absurd.
Definition
Suppose we are trying to estimate the parameter $\theta$ using an estimator $\widehat{\theta}$ (that is, some function of the observed data). Then the bias of $\widehat{\theta}$ is defined to be
$$\operatorname{E}(\widehat{\theta}) - \theta.$$
In words, this is "the expected value of the estimator $\widehat{\theta}$ minus the true value $\theta$." This may be rewritten as
$$\operatorname{E}(\widehat{\theta} - \theta),$$
which reads "the expected value of the difference between the estimator and the true value" (the expected value of the constant $\theta$ is precisely $\theta$).

Examples
Estimating variance
Suppose $X_1, \dots, X_n$ are independent and identically distributed normal random variables with expectation $\mu$ and variance $\sigma^2$. Let
$$\overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
be the sample mean, and let
$$S^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \overline{X})^2$$
be a sample variance, formed by analogy with the variance of a finite population,
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,$$
where $N$ is the population size and $x_i$ represents the $i$-th member of the whole population. Then $S^2$ is a "biased estimator" of $\sigma^2$ because
$$\operatorname{E}(S^2) = \frac{n-1}{n}\sigma^2 \neq \sigma^2.$$
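The expectation of $S^2$ can be checked by hand. The following is a sketch of the standard argument, assuming $S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \overline{X})^2$ with $\overline{X}$ the sample mean. It uses only independence, a common mean $\mu$, and a common variance $\sigma^2$, so normality is not needed for this step.

$$
\begin{aligned}
\sum_{i=1}^{n}(X_i-\overline{X})^2 &= \sum_{i=1}^{n}(X_i-\mu)^2 - n(\overline{X}-\mu)^2, \\
\operatorname{E}\Bigl[\sum_{i=1}^{n}(X_i-\mu)^2\Bigr] &= n\sigma^2, \qquad \operatorname{E}\bigl[n(\overline{X}-\mu)^2\bigr] = n\operatorname{Var}(\overline{X}) = \sigma^2, \\
\operatorname{E}(S^2) &= \frac{1}{n}\bigl(n\sigma^2 - \sigma^2\bigr) = \frac{n-1}{n}\sigma^2.
\end{aligned}
$$

The shortfall $\sigma^2/n$ equals the variance of $\overline{X}$: it is the price of measuring deviations from the sample mean rather than from $\mu$.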
Common sense would suggest applying the population formula to the sample as well. The resulting estimator is biased because the sample mean is generally somewhat closer to the observations in the sample than the population mean is. The sample mean lies, by definition, in the middle of the sample, while the population mean may even lie outside the range of the sample. The deviations from the sample mean will therefore often be smaller than the deviations from the population mean, so if the same formula is applied to both, the variance estimate will on average be somewhat smaller in the sample than in the population.

Note that when a transformation is applied to an unbiased estimator, the result is not necessarily an unbiased estimate of the corresponding population quantity. That is, for a non-linear function $f$ and an unbiased estimator $U$ of a parameter $p$, $f(U)$ is usually not an unbiased estimator of $f(p)$. For example, the square root of the unbiased estimator of the population variance is not an unbiased estimator of the population standard deviation.

Bias, however, is not the only consideration when choosing a statistic. Bias refers to the central tendency of the sampling distribution of a statistic, but the variance of the sampling distribution can also be an important consideration. Specifically, other things being equal, statistics with smaller sampling variances will yield greater statistical power. For example, while $S^2$ above is more biased than the traditional sample calculation
$$\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \overline{X})^2,$$
it has a smaller sampling variance.
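The points above about bias, non-linear transformations, and sampling variance can be illustrated with a small Monte Carlo experiment. This is only a sketch: the function name `simulate`, the sample size `n=5`, the standard deviation `sigma=2.0`, the number of trials, and the seed are all illustrative assumptions, not values from the text.

```python
import random
import statistics


def simulate(n=5, sigma=2.0, trials=20000, seed=0):
    """Monte Carlo summary of two variance estimators on normal samples.

    Returns the average and the sampling variance of the divide-by-n and
    divide-by-(n - 1) estimators, plus the average square root of the latter.
    """
    rng = random.Random(seed)
    biased, unbiased, root = [], [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        biased.append(ss / n)                # divides by n
        unbiased.append(ss / (n - 1))        # divides by n - 1
        root.append((ss / (n - 1)) ** 0.5)   # square root of the unbiased estimate
    return {
        "mean_biased": statistics.fmean(biased),
        "mean_unbiased": statistics.fmean(unbiased),
        "mean_root": statistics.fmean(root),
        "var_biased": statistics.pvariance(biased),
        "var_unbiased": statistics.pvariance(unbiased),
    }


result = simulate()
print("true variance:", 2.0 ** 2, "true standard deviation:", 2.0)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the divide-by-n average should settle near (n - 1)/n times the true variance, the divide-by-(n - 1) average near the true variance, and the average square root below the true standard deviation, while the divide-by-n estimator shows the smaller sampling variance.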
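Estimating a Poisson probability
A far more extreme case of a biased estimator being better than any unbiased estimator is well known. Suppose $X$ has a Poisson distribution with expectation $\lambda$. It is desired to estimate
$$\operatorname{P}(X=0)^2 = e^{-2\lambda}.$$
The only function of the data constituting an unbiased estimator is
$$\delta(X) = (-1)^X.$$
This estimator takes only the values $1$ and $-1$, and a value of $-1$ is absurd because the quantity being estimated is positive. The (biased) maximum likelihood estimator
$$e^{-2X}$$
is always positive and has a smaller mean squared error than the unbiased estimator for every $\lambda > 0$.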
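The comparison between the two Poisson estimators can be checked numerically. The sketch below assumes the quantity being estimated is $e^{-2\lambda}$, the unbiased estimator is $(-1)^X$, and the maximum likelihood estimator is $e^{-2X}$. The helper names `poisson_pmf`, `bias_and_mse`, `unbiased`, and `mle` are hypothetical, and expectations are approximated by truncating the sum over the Poisson distribution.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def bias_and_mse(estimator, lam, terms=200):
    """Bias and mean squared error of estimator(X) as an estimate of exp(-2*lam).

    The expectations are truncated sums over k = 0, ..., terms - 1, which is
    accurate when lam is small relative to terms.
    """
    target = math.exp(-2.0 * lam)
    weights = [poisson_pmf(k, lam) for k in range(terms)]
    mean = sum(estimator(k) * w for k, w in enumerate(weights))
    mse = sum((estimator(k) - target) ** 2 * w for k, w in enumerate(weights))
    return mean - target, mse


def unbiased(k):
    """The estimator (-1)^X."""
    return (-1.0) ** k


def mle(k):
    """The maximum likelihood estimator exp(-2X)."""
    return math.exp(-2.0 * k)


for lam in (0.5, 1.0, 3.0):
    b_u, mse_u = bias_and_mse(unbiased, lam)
    b_m, mse_m = bias_and_mse(mle, lam)
    print(f"lambda={lam}: unbiased bias={b_u:+.4f}, mse={mse_u:.4f}; "
          f"MLE bias={b_m:+.4f}, mse={mse_m:.4f}")
```

For each value of `lam` tried here, the unbiased estimator's bias should vanish up to rounding, while the maximum likelihood estimator trades a positive bias for a much smaller mean squared error.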

Maximum of a discrete uniform distribution
The bias of maximum-likelihood estimators can be substantial. Consider a case where $n$ tickets numbered from 1 through $n$ are placed in a box and one is selected at random, giving a value $X$. If $n$ is unknown, then the maximum-likelihood estimator of $n$ is $X$, even though the expectation of $X$ is only $(n+1)/2$; we can only be certain that $n$ is at least $X$ and is probably more. In this case, the natural unbiased estimator is $2X - 1$.
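For the ticket example, the expectations of $X$ and $2X - 1$ can be computed exactly by averaging over the $n$ equally likely tickets. The following is a minimal sketch using exact rational arithmetic; the helper name `expectation` and the values of `n` tried are illustrative assumptions.

```python
from fractions import Fraction


def expectation(estimator, n):
    """Exact expected value of estimator(X) when X is uniform on 1, ..., n."""
    return sum(Fraction(estimator(x), n) for x in range(1, n + 1))


for n in (1, 10, 100):
    e_mle = expectation(lambda x: x, n)               # maximum-likelihood estimator X
    e_unbiased = expectation(lambda x: 2 * x - 1, n)  # estimator 2X - 1
    print(f"n={n}: E[X]={e_mle}, bias of X={e_mle - n}, E[2X-1]={e_unbiased}")
```

The bias of $X$ works out to $(n+1)/2 - n = -(n-1)/2$, so it grows in magnitude with $n$, whereas the expectation of $2X - 1$ equals $n$.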


