Yves S. answered 10/30/19
Statistics made easy for undergrad, grad and MBA students
A confidence interval is a range below and above a sample estimate (here, the sample mean) that we are confident contains the true population value. In this case, we are not sure about the population mean and we want to establish its lower and upper limits based on a single sample, with the sample mean = 1,144.54mg at a 99% confidence level (alpha = 0.01).
The Confidence Interval is given by the formula: CI = sample mean +/- MoE (Margin of Error).
The Confidence Interval is closely related to hypothesis testing, which would come into play if we were asked whether the label is accurate or not (see the end of this answer).
The Margin of Error can be broken into two parts, 1) the Critical Value and 2) the Standard Error, with the formula: MoE = CV * SE.
- The critical value (CV) is a function of 1) the confidence level (or alpha) required and 2) to a certain extent, the sample size. The more confident you want to be (a more inclusive %), the larger the CV will be (you need a wider range to be more sure of capturing the true mean). The CV is calculated from either a Z-distribution or a t-distribution (see below), so it will be either Z-crit(alpha) or t-crit(alpha, df). In our case we choose a t-distribution (see below), with t-crit = T.INV.2T(0.01, 59) = 2.6618.
- The Standard Error (SE) measures how much the sample mean would vary from one sample to the next. It is a function of how many data points we collect (sample size): the larger the sample size, the smaller the error (which makes sense, since measuring more data points gives us more chances of being accurate). But the Standard Error is also a function of how much the data naturally varies (regardless of sample size); this is the Standard Deviation. The formula for the Standard Error is SE = S/SQRT(n), with S being the sample standard deviation and n the sample size. In our case, SE = 272.13/SQRT(60) = 35.1318.
You can now compute your Margin of Error, MoE = t-crit * SE = 2.6618 * 35.1318 = 93.51mg
You can now state that the actual mean for the spicy sauce lies between:
Lower Limit (LL) = sample mean - MoE = 1144.54 - 93.51 = 1,051.03mg
Upper Limit (UL) = sample mean + MoE = 1144.54 + 93.51 = 1,238.05mg
(with a 99% confidence level).
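If you would rather check the arithmetic outside a spreadsheet, here is a minimal Python sketch of the same steps. It reuses the numbers quoted above (including the t-crit value of 2.6618 rather than recomputing it); the function name `confidence_interval` is just an illustrative choice, not part of any library.

```python
import math

# Values taken from the worked example above.
sample_mean = 1144.54   # mg
sample_sd = 272.13      # mg, sample standard deviation S
n = 60                  # sample size
t_crit = 2.6618         # T.INV.2T(0.01, 59), as quoted in the answer


def confidence_interval(mean, sd, size, crit):
    """Hypothetical helper: return (SE, MoE, lower limit, upper limit)."""
    se = sd / math.sqrt(size)   # SE = S / SQRT(n)
    moe = crit * se             # MoE = CV * SE
    return se, moe, mean - moe, mean + moe


se, moe, lower, upper = confidence_interval(sample_mean, sample_sd, n, t_crit)
print(f"SE  = {se:.4f}")
print(f"MoE = {moe:.2f} mg")
print(f"99% CI = [{lower:.2f} mg, {upper:.2f} mg]")
```

The limits are computed from the unrounded margin of error, so the last decimal can differ slightly from a hand calculation that rounds at each step.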
Given that your label states 1,150mg, which falls inside this interval, the label is consistent with the sample, and you would not reject the null hypothesis H(0) that the mean equals 1,150mg, if you were ever asked the question.
- Choosing a Z or a t-test (from a normal or a t-distribution):
- When testing for the mean and standard deviation: a Z-test is appropriate when the sample size is large (n>30) AND when the population standard deviation (sigma) is known; a t-test is appropriate whenever n<30, OR when a) only the sample standard deviation is known AND b) the normality of the sample or the population has been established. In this case, we cannot determine whether the sample data is normally distributed (even if your professor wants to assume it, it would have to be explicitly written as an assumption); fortunately the sample size is large enough. In case of doubt, always perform a t-test (you can't go wrong); for large samples, the t-test closely approximates the Z-test.
- When testing for proportions (p): always use the Z-distribution. This is because a proportion (percentage of Yes/No, or defects, or other such parameter) comes from binary outcomes and follows the binomial distribution; the binomial distribution can be approximated by the normal distribution under certain circumstances (n large, but more specifically np>5 and nq>5). It does not follow the t-distribution.
- In our case, you want to err on the safe side and choose a t-test. Your t-crit value can be calculated in Excel as: t-crit = T.INV.2T(alpha, df). Df, or degrees of freedom, in a one-sample t-test is calculated as df = n-1 = 60-1 = 59. Note we use a two-tail t-value because we want limits both above and below the mean (so really we are talking about 0.5% in the upper tail and 0.5% in the lower tail). In other hypothesis tests, you may be asked for an upper or lower limit only, so a one-tail (left or right) value may be appropriate.
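If Excel is not at hand, the two-tail critical values can be approximated with nothing but Python's standard library. The sketch below is only one way to do it, under stated assumptions: it integrates the t density numerically (Simpson's rule) and searches for the cut-off by bisection. The helper names `t_pdf`, `t_cdf` and `t_crit_two_tailed` are my own. With SciPy installed, `scipy.stats.t.ppf(1 - alpha/2, df)` would be the more usual route. The Z value is shown alongside to illustrate how close the two are at n = 60.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_crit_two_tailed(alpha, df):
    """Approximate counterpart of Excel's T.INV.2T(alpha, df), by bisection."""
    target = 1 - alpha / 2          # e.g. 0.995 for alpha = 0.01
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha, df = 0.01, 59
t_crit = t_crit_two_tailed(alpha, df)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tail Z critical value
print(f"t-crit = {t_crit:.4f}, Z-crit = {z_crit:.4f}")
```

The t value comes out slightly larger than the Z value, which is why choosing the t-distribution is the conservative option: it gives a slightly wider interval.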
- Hypothesis testing: when asked whether the sample falls within a certain known interval, or whether a difference is significant or not, we hypothesize that the sample tested is no different from the stated mean (in other words, nothing has changed and you can go on with your life).
- This is noted as:
- null hypothesis = H(0) = nothing changed = the sample statistic is within expectations (or the estimated population mean is)
- alternative hypothesis = H(a) = something changed = the sample statistic is NOT within expectations.
- In our exercise, we could state H(0): actual mean = 1,150mg, and H(a): actual mean <> 1,150mg
- Based on the sample, we cannot conclude that the mean is any different from 1,150mg, since the confidence interval includes that value (at a 99% confidence level).
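The same decision can be reached by computing a t-statistic and comparing it with the critical value, which is equivalent to checking whether 1,150mg falls inside the confidence interval. The answer above does not spell this step out, so treat the following as a small illustrative sketch. It reuses the numbers quoted in the answer, and the variable names are my own.

```python
import math

# Values quoted in the answer above.
sample_mean = 1144.54   # mg
label_mean = 1150.0     # mg, the value stated on the label, used in H(0)
sample_sd = 272.13      # mg
n = 60
t_crit = 2.6618         # two-tail critical value for alpha = 0.01, df = 59

se = sample_sd / math.sqrt(n)
t_stat = (sample_mean - label_mean) / se   # distance from H(0) in SE units

# Two-tail rule: reject H(0) only if |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit

print(f"t-statistic = {t_stat:.4f}")
print("Reject H(0)" if reject_h0 else "Do not reject H(0)")
```

Here the sample mean sits only a fraction of one standard error away from 1,150mg, far short of the 2.6618 standard errors needed to reject H(0).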