Saturday, March 5, 2011

Student's t, Normality, & the Slutsky Theorems

Often in an analysis we use s² instead of σ², because the true population variance is usually unknown. Yet we still want to construct confidence intervals or, equivalently, conduct hypothesis tests. The t distribution is the distribution of the ratio of a standard normal variable to the square root of an independent chi-square variable divided by its degrees of freedom (DeGroot, 2002). If our data are exactly normally distributed, we can rely on the t-table for constructing confidence intervals. These are exact results: the t-ratio is distributed exactly t when the underlying data are exactly normal.
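
As a quick illustration in Python (a minimal sketch; the sample values are made up for the example), the exact t-based 95% confidence interval for the mean uses the t critical value with n - 1 degrees of freedom:

# Exact t-based 95% confidence interval for the mean,
# valid when the underlying data are exactly normal.
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])   # hypothetical small sample
n = len(x)
xbar = x.mean()
s = x.std(ddof=1)                               # sample standard deviation

t_crit = stats.t.ppf(0.975, df=n - 1)           # t critical value, n - 1 df
half_width = t_crit * s / np.sqrt(n)
print(f"95% CI: ({xbar - half_width:.3f}, {xbar + half_width:.3f})")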

But what if we don't know the distribution of the data we are working with, or don't feel comfortable making assumptions of normality? Usually we still have to estimate σ² with s². Of course, if n is large enough, reliance on the t-table or the standard normal table will give similar results. But Goldberger offers the following remark:

“There is no good reason to rely routinely on a t-table rather than a normal table unless Y itself is normally distributed” (Goldberger, 1991).

So, how do you justify this?   In this case there are some powerful theorems regarding asymptotic properties of sample statistics known as the Slutsky Theorems. These are outlined in Goldberger, 1991. The following sequence of steps using these theorems is based on Goldberger and my lecture notes from ECO 603 Research Methods for Economics, which was actually a mathematical statistics course taught by Dr. Christopher Bollinger at the University of Kentucky. Any errors or mistakes are completely my own.

Given that Θ^ is an estimator for the population parameter Θ, and letting →p denote convergence in probability and →d denote convergence in distribution:

S1: If Θ^ →p Θ, then for any continuous function h, h(Θ^) →p h(Θ).

S2: If Θ1^ and Θ2^ converge in probability to (Θ1, Θ2), then for continuous h,
h(Θ1^, Θ2^) →p h(Θ1, Θ2).

S3: If Θ^ →p Θ and Zn →d N(0,1), then Θ^ + Zn →d N(Θ, 1).

S4: If Θ^ →p Θ and Zn →d N(0,1), then Θ^ Zn →d N(0, Θ²).

S5: If n^(1/2)(Θ^ - Θ)/σ ~A N(0,1), then for continuously differentiable functions h of Θ,

        n^(1/2)(h(Θ^) - h(Θ)) ~A N(0, h′(Θ)² σ²)

(Goldberger, 1991).
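
The S4 result is the one doing the heavy lifting below, and it is easy to check by simulation. The following sketch is my own illustration (the exponential distributions, seed, and sample sizes are arbitrary choices): a consistent estimator of Θ = 2 multiplied by an asymptotically standard normal statistic Zn should have an empirical variance near Θ² = 4.

# Monte Carlo check of S4: if theta_hat ->p theta and Zn ->d N(0,1),
# then theta_hat * Zn ->d N(0, theta^2).
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 2000, 5000
products = np.empty(reps)

for r in range(reps):
    y = rng.exponential(scale=theta, size=n)   # E[y] = theta
    theta_hat = y.mean()                       # ->p theta by the LLN
    x = rng.exponential(scale=1.0, size=n)     # any finite-variance data
    zn = np.sqrt(n) * (x.mean() - 1.0) / 1.0   # CLT: ->d N(0,1)
    products[r] = theta_hat * zn

print("empirical variance of theta_hat * Zn:", products.var())  # near 4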

So now, if we want to use s² to estimate σ², we form the statistic

Z^ = (Xbar - μ) / (s²/n)^(1/2) = n^(1/2)(Xbar - μ)/s    (1)

This looks like the t-statistic, but if we cannot assume normality, the exact results of the t-distribution do not apply. In this case we rely on results of both the CLT and the Slutsky theorems.
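
Computing (1) in Python is straightforward (a minimal sketch; the exponential data and the value of μ are hypothetical choices):

# Z^ = sqrt(n) * (Xbar - mu) / s: the t-type ratio with s in place of sigma
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=200)   # a deliberately non-normal sample
mu = 3.0                                   # hypothesized population mean
n = len(x)
z_hat = np.sqrt(n) * (x.mean() - mu) / x.std(ddof=1)
print(z_hat)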
Given the traditional  standardized normal variable formulation: 

Z = n^(1/2)(Xbar - μ)/σ

Algebraic manipulation shows that

(σ/s) n^(1/2)(Xbar - μ)/σ = n^(1/2)(Xbar - μ)/s

so (σ/s) Z = Z^, where Z^ is defined in (1) above.

By the CLT, Z →d N(0,1).

It can be shown that s² →p σ² (this follows from the law of large numbers applied to the sample moments, combined with S2).

If we define Θ^ as s, then we can view (σ/s) as a continuous function h(Θ^) = σ/Θ^.

Then by S1, h(Θ^) →p h(Θ), which implies that (σ/s) →p (σ/σ) = 1.

Given this result from S1, S4 yields Θ^ Zn →d N(0, Θ²), which implies that
(σ/s) Z →d N(0, (σ/σ)²) = N(0,1).

And therefore Z^ →d N(0,1).


Therefore, by the Central Limit Theorem and the Slutsky theorems (S1 and S4), one can use the asymptotic properties of the statistic Z^ = n^(1/2)(Xbar - μ)/s to form confidence intervals based on the standard normal distribution, using s² to estimate σ², without assuming normality of the sample data (only the i.i.d. and finite-variance conditions required by the CLT).
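
This asymptotic claim is easy to check by simulation. In the sketch below (my own illustration; the exponential(1) distribution, sample size, and number of replications are arbitrary choices), the empirical quantiles of Z^ computed from strongly skewed data line up closely with the standard normal quantiles:

# Simulate Z^ = sqrt(n)*(Xbar - mu)/s for skewed (exponential) data and
# compare its empirical quantiles with standard normal quantiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, n, reps = 1.0, 500, 20000          # exponential(1) has true mean 1
z_hat = np.empty(reps)

for r in range(reps):
    x = rng.exponential(scale=mu, size=n)
    z_hat[r] = np.sqrt(n) * (x.mean() - mu) / x.std(ddof=1)

for q in (0.025, 0.05, 0.5, 0.95, 0.975):
    print(q, round(np.quantile(z_hat, q), 3), round(stats.norm.ppf(q), 3))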

How large does n have to be before asymptotic properties apply?  From Kennedy, A Guide to Econometrics 5th Edition:

 "How large does the sample size have to be for estimators to display their asymptotic properties? The answer to this crucial question depends on the characteristics of the problem at hand. Goldfeld and Quandt (1972, p.277) report an example in which a sample size of 30 is sufficiently large and an example in which a sample of 200 is required."

An important note to remember: people often say that 'as n becomes large, the normal distribution approximates the t-distribution,' but in fact, as shown above, as n becomes large it is the statistic Z^ itself that becomes approximately normally distributed (again based on the CLT and the Slutsky theorems).

References:

Kennedy, P. (2003). A Guide to Econometrics, 5th Edition.
Goldberger, A. S. (1991). A Course in Econometrics.
Bollinger, C. (2002). Economics 603: Research Methods and Procedures in Economics, course notes, University of Kentucky.
