# Should I use a t- or a Z-Distribution?

We have all heard the rule of thumb: use a t-distribution when the sample size is small (n<30) and a Z-distribution when n is large (n>30). Indeed, as n increases, the distribution of the sample mean converges to the normal distribution thanks to the Central Limit Theorem (CLT). But what really happens when n is not large enough?

The requirements for a Z-distribution are:
1) The sample mean is normally distributed
AND
2) The population standard deviation σ is known (so that σ can be used in the test statistic)
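
As a minimal sketch of what using σ in the test statistic looks like, the snippet below computes a two-sided one-sample Z-test with the Python standard library. The function name `z_test` and the numbers in the example are illustrative assumptions, not values from a real data set.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample Z-test, assuming sigma is the KNOWN population SD."""
    # Standard error of the mean uses the population sigma, not the sample s.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical example: n = 25 observations, known sigma = 5, H0: mu = 50.
z, p = z_test(sample_mean=52.0, mu0=50.0, sigma=5.0, n=25)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

Note that n = 25 is below 30 here; the Z statistic is still appropriate only because the population is assumed normal with a known σ.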

The requirements for a t-distribution (used when σ is estimated by the sample standard deviation s) are:
1) The sample size is large (n>30)
OR
2) The sample appears normally distributed (this may require a separate normality test)
OR
3) The population is known to be normally distributed, but its standard deviation σ is unknown (so we must use s instead).
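
For comparison, here is a small sketch of the t statistic, where the sample standard deviation s replaces the unknown σ. The helper name `t_statistic` and the sample values are made up for illustration. The standard library has no t-distribution, so the sketch stops at the statistic and the degrees of freedom, which you would then look up in a t-table or in a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """One-sample t statistic and its degrees of freedom (sigma unknown)."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation, n - 1 in the denominator
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1


# Hypothetical small sample, H0: mu = 47.
sample = [48.0, 52.0, 50.0, 54.0, 46.0]
t, df = t_statistic(sample, mu0=47.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

With 4 degrees of freedom the two-sided 5% critical value from a t-table is 2.776, noticeably larger than the 1.96 used with the Z-distribution.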

So, when the sample size is small (n<30) AND the population from which the sample is taken is known to be normally distributed with a known standard deviation σ, we CAN use the Z-distribution: the mean of a sample of any size taken from a normally distributed population is itself normally distributed. This satisfies conditions 1 and 2 for a Z-distribution.
In case of doubt, err on the side of caution and use the t-distribution.
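
The decision rule can be summarized in a few lines of code. This is only a sketch of the rules of thumb in this post, under the assumption that "sample mean is normally distributed" means either a normal population or n>30; the function name `choose_distribution` and its parameters are hypothetical.

```python
def choose_distribution(sigma_known, population_normal, n, large_n=30):
    """Return "z" or "t" following the rules of thumb described in the text."""
    # The sample mean is (approximately) normal if the population is normal,
    # or if n is large enough for the CLT to apply.
    sample_mean_normal = population_normal or n > large_n
    if sigma_known and sample_mean_normal:
        return "z"
    # In every other case, including any doubt, fall back to the t-distribution.
    return "t"


print(choose_distribution(sigma_known=True, population_normal=True, n=10))   # z
print(choose_distribution(sigma_known=False, population_normal=True, n=10))  # t
```

For a small sample from a population that is not known to be normal, the function still returns "t", but the normality of the sample should be checked before trusting the result.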

So what about proportions? Here the key is that the normal distribution approximates the binomial distribution "under certain conditions".
Let n = sample size and p = population probability (with q = 1-p). Each individual trial can take only two values (1/0, TRUE/FALSE, YES/NO): it is a binary outcome and follows the Bernoulli distribution. The number of successes in n independent trials with constant p then follows the binomial distribution.

The requirements for using the normal approximation to the binomial distribution are:
1) np>5 AND nq>5
AND
2) p is not too close to 0 or to 1
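
To see how these conditions play out, the sketch below checks np>5 and nq>5 and compares the exact binomial cumulative probability with its normal approximation. The function names and the example values (n=50, p=0.4, k=18) are assumptions chosen for illustration, and the continuity correction (adding 0.5) is a common refinement that is not discussed above.

```python
from math import comb, sqrt
from statistics import NormalDist


def normal_approx_ok(n, p, threshold=5):
    """Rule-of-thumb check: both np and nq must exceed the threshold."""
    q = 1 - p
    return n * p > threshold and n * q > threshold


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)


n, p, k = 50, 0.4, 18
print("approximation allowed:", normal_approx_ok(n, p))
print(f"exact  P(X <= {k}) = {binom_cdf(k, n, p):.4f}")
print(f"normal P(X <= {k}) = {normal_approx_cdf(k, n, p):.4f}")
```

Trying a case such as n=20 and p=0.1 (np=2) makes the check fail, which is the situation where the exact distribution should be used instead.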

When these conditions are not satisfied, we can revert to the binomial distribution (when p is constant, i.e. sampling with replacement) or to the hypergeometric distribution (when p varies on each trial, i.e. sampling without replacement) to calculate the probability of k events happening out of n trials given an assumed population probability p (the hypergeometric distribution needs additional arguments describing the population: its size and the number of successes it contains).
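
As a rough illustration of the two exact alternatives, the following sketch computes the probability of a given number of successes with and without replacement. The function and parameter names (`successes`, `draws`, `pop_successes`, `pop_size`) are hypothetical, and the example population of 20 items with 8 successes is invented.

```python
from math import comb


def binomial_pmf(successes, draws, p):
    """P(exactly `successes` in `draws` trials), constant p (with replacement)."""
    return comb(draws, successes) * p**successes * (1 - p) ** (draws - successes)


def hypergeom_pmf(successes, draws, pop_successes, pop_size):
    """P(exactly `successes` in `draws` draws) without replacement."""
    ways_success = comb(pop_successes, successes)
    ways_failure = comb(pop_size - pop_successes, draws - successes)
    return ways_success * ways_failure / comb(pop_size, draws)


# Hypothetical population: 20 items, 8 of them successes (p = 0.4 on the first draw).
print(f"with replacement:    {binomial_pmf(2, 5, 0.4):.4f}")
print(f"without replacement: {hypergeom_pmf(2, 5, 8, 20):.4f}")
```

The two results differ because, without replacement, each draw changes the proportion of successes left in the population.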

Yves S.
