
Wednesday, February 21, 2018


In statistics, effective sample size is a notion defined for a sample from a distribution when the observations in the sample are correlated or weighted.



Correlated observations

Suppose a sample of $n$ observations $y_i$ is drawn from a distribution with mean $\mu$ and standard deviation $\sigma$. Then the mean of this distribution is estimated by the mean of the sample:

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$

If the observations are uncorrelated, the variance of $\hat{\mu}$ is given by

$$\operatorname{Var}(\hat{\mu}) = \frac{\sigma^{2}}{n}$$

However, if the observations in the sample are positively correlated, then $\operatorname{Var}(\hat{\mu})$ is higher. For instance, if all observations in the sample are completely correlated ($\rho_{(i,j)} = 1$), then $\operatorname{Var}(\hat{\mu}) = \sigma^{2}$ regardless of $n$.

The effective sample size $n_{\text{eff}}$ is the unique value (not necessarily an integer) such that

$$\operatorname{Var}(\hat{\mu}) = \frac{\sigma^{2}}{n_{\text{eff}}}$$
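The definition can be made explicit for the equal-weight sample mean. The following is a sketch, assuming every observation has the same variance $\sigma^{2}$ and writing $\rho_{(i,j)}$ for the correlation between $y_i$ and $y_j$, with $\rho_{(i,i)} = 1$:

```latex
\begin{aligned}
\operatorname{Var}(\hat{\mu})
  &= \frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{Cov}(y_i, y_j)
   = \frac{\sigma^{2}}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n} \rho_{(i,j)}, \\
n_{\text{eff}}
  &= \frac{\sigma^{2}}{\operatorname{Var}(\hat{\mu})}
   = \frac{n^{2}}{\sum_{i=1}^{n} \sum_{j=1}^{n} \rho_{(i,j)}}.
\end{aligned}
```

With uncorrelated observations the double sum equals $n$, which recovers $n_{\text{eff}} = n$.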

$n_{\text{eff}}$ is a function of the correlation between observations in the sample. Suppose that all the correlations are the same and nonnegative, i.e. if $i \neq j$, then $\rho_{(i,j)} = \rho \geq 0$. In that case, if $\rho = 0$, then $n_{\text{eff}} = n$. Similarly, if $\rho = 1$ then $n_{\text{eff}} = 1$. More generally,

$$n_{\text{eff}} = \frac{n}{1 + (n-1)\rho}$$
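As a quick numerical check of the uniform-correlation formula, here is a minimal Python sketch. The function name `n_eff_uniform` is hypothetical and introduced only for illustration; the guard on the denominator is an added assumption that rejects values of $\rho$ for which the formula would give a non-positive variance.

```python
def n_eff_uniform(n: int, rho: float) -> float:
    """Effective sample size when every pair of observations has correlation rho."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Denominator of n / (1 + (n - 1) * rho); it must stay positive,
    # otherwise the implied variance of the sample mean would not be positive.
    denom = 1.0 + (n - 1) * rho
    if denom <= 0.0:
        raise ValueError("rho is too negative for a sample of this size")
    return n / denom


if __name__ == "__main__":
    # Effective size of a nominal sample of 100 for a few correlation levels.
    for rho in (0.0, 0.1, 0.5, 1.0):
        print(f"rho={rho}: n_eff={n_eff_uniform(100, rho):.3f}")
```

The two endpoints reproduce the special cases in the text: $\rho = 0$ gives $n_{\text{eff}} = n$ and $\rho = 1$ gives $n_{\text{eff}} = 1$.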

The case where the correlations are not uniform is somewhat more complicated. Note that if the correlation is negative, the effective sample size may be larger than the actual sample size. If the estimator is generalized to a weighted mean $\hat{\mu} = \sum_{i} a_i y_i$ with $\sum_{i} a_i = 1$, it is also possible to construct correlation matrices that have $n_{\text{eff}} > n$ even when all correlations are positive. Intuitively, $n_{\text{eff}}$ may be thought of as the information content of the observed data.
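For non-uniform correlations, the effective sample size of the equal-weight sample mean can be computed directly from the correlation matrix. The sketch below assumes all observations share the same variance, so that the variance of the mean is $\sigma^{2}/n^{2}$ times the sum of all entries of the correlation matrix. The function name and the two example matrices are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def n_eff_from_correlation(corr: Sequence[Sequence[float]]) -> float:
    """n_eff of the equal-weight sample mean for a given correlation matrix.

    Assumes every observation has the same variance, so that
    Var(mean) = sigma^2 / n^2 * (sum of all entries of corr).
    """
    n = len(corr)
    if n == 0 or any(len(row) != n for row in corr):
        raise ValueError("corr must be a non-empty square matrix")
    total = sum(sum(row) for row in corr)
    if total <= 0.0:
        raise ValueError("sum of correlations must be positive")
    return n * n / total


# Illustrative matrix with a negative correlation between observations 1 and 3.
negative = [
    [1.0, 0.0, -0.5],
    [0.0, 1.0, 0.0],
    [-0.5, 0.0, 1.0],
]

# Illustrative matrix with correlations that decay with distance, all positive.
positive = [
    [1.0, 0.5, 0.25],
    [0.5, 1.0, 0.5],
    [0.25, 0.5, 1.0],
]

if __name__ == "__main__":
    print("negative correlation:", n_eff_from_correlation(negative))
    print("positive correlation:", n_eff_from_correlation(positive))
```

In the first example the entries sum to 2, so the effective sample size is 9/2 = 4.5, larger than the actual sample size of 3. In the second the entries sum to 5.5, giving a value below 3.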



Weighted samples

If the data have been weighted, the sample behaves as though some of its observations had been drawn from the distribution with effectively 100% correlation with a previously drawn observation. The effective sample size in this case is known as Kish's effective sample size:

$$n_{\text{eff}} = \frac{\left(\sum_{i=1}^{n} w_i\right)^{2}}{\sum_{i=1}^{n} w_i^{2}}$$
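Kish's formula is simple to evaluate. A minimal Python sketch follows; the function name `kish_n_eff` and the example weights are hypothetical, used only to illustrate the arithmetic.

```python
from typing import Iterable


def kish_n_eff(weights: Iterable[float]) -> float:
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    w = list(weights)
    sum_w = sum(w)
    sum_w2 = sum(x * x for x in w)
    if sum_w2 == 0.0:
        raise ValueError("at least one weight must be non-zero")
    return sum_w * sum_w / sum_w2


if __name__ == "__main__":
    # Equal weights: the effective size equals the number of observations.
    print(kish_n_eff([2.0, 2.0, 2.0, 2.0]))
    # One observation weighted three times as heavily as the others.
    print(kish_n_eff([1.0, 1.0, 1.0, 3.0]))
```

With equal weights the result is the actual sample size (64/16 = 4). With weights 1, 1, 1, 3 it drops to 36/12 = 3, and the formula is unchanged if every weight is multiplied by the same constant.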



Further reading

  • Priestley, M. B. (1981), Spectral Analysis and Time Series 1, Academic Press, §5.3.

Source of article : Wikipedia