
Alex K. answered 10/28/19
Expert in high-level math, statistics, finance, economics
a) To estimate the proportion of defective items from the production line, we first compute the total number of defective items, ∑xi*fi, where i = 0 to 7, xi = # of defects per sample, and fi = # of samples with xi defects.
Through simple arithmetic we find ∑xi*fi = 200.
The total number of observations is n * S, where n = # of items per sample and S = total # of samples. We know from the problem that n = 20 and S = 100; therefore the total number of observations is 2000.
Therefore, the proportion of defective items is ∑xi*fi / (n * S) = 200/2000 = 10%.
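The arithmetic in part (a) can be sketched in Python. This is only a minimal sketch: the function name and the dictionary layout (defects per sample mapped to number of samples) are assumptions made for illustration, and the frequency table itself has to be taken from the problem statement.

```python
def defective_proportion(freq, items_per_sample):
    """Estimate p from a table {defects per sample (xi): number of samples (fi)}."""
    total_defects = sum(x * f for x, f in freq.items())  # sum of xi * fi
    total_samples = sum(freq.values())                   # S
    total_items = items_per_sample * total_samples       # n * S
    return total_defects / total_items
```

With the problem's table (100 samples of 20 items, 200 defects in total) this works out to 200/2000 = 0.1.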
b) In order to solve for r and s, we need to understand the Binomial distribution.
The Binomial distribution is appropriate for modeling the number of successes in a fixed number of trials, where each trial has a binary outcome (e.g., success or failure), the trials are independent of each other, and the probability of success is the same on every trial.
Each sample of 20 items seems to fit these criteria: the outcomes are binary (i.e., either a product is defective or it is not), there are 20 trials per sample, and there is nothing in the problem to indicate the trials are not independent. The 100 samples then give us 100 observations of the number of defects per sample.
The equation for the Binomial probability function is P(X = x) = nCx * p^x * (1 - p)^(n - x), where nCx is the number of ways to choose x items out of n.
To solve for r or s, we need to know n, x, and p.
We know from above that n = 20, as we have 20 observations in each of the 100 samples, and that p = 10% [where p is the probability that any given item is defective].
Given the nature of the problem, r represents the total number of samples in which we’d expect to see 2 defects and s represents the total number of samples in which we’d expect to see 4 defects. Therefore, if you set x = 2, n = 20, and p = 0.1, the formula gives the probability that a sample contains exactly 2 defects; multiply that probability by the 100 samples to get r, the number of samples [out of the 100 total samples] in which we’d expect to see 2 defects, assuming the defect process indeed follows the Binomial distribution. The same is true for estimating s, except you’d change x to 4 while holding all else equal.
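The calculation of r and s described above can be sketched in Python. This is a minimal sketch under the answer's assumptions (n = 20, p = 0.1, 100 samples); the function names are made up for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def expected_count(x, n=20, p=0.1, num_samples=100):
    """Expected number of samples (out of num_samples) with exactly x defects."""
    return num_samples * binomial_pmf(x, n, p)

r = expected_count(2)  # expected number of samples with 2 defects
s = expected_count(4)  # expected number of samples with 4 defects
print(round(r, 1), round(s, 1))
```

As a sanity check, expected_count(0) and expected_count(1) come out at roughly 12.2 and 27, which matches the expected values quoted from the problem.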
c) Given the wording of the problem, we must express our null hypothesis, which is that Binomial (n,p) fits the empirical data well [where n = 20 and p = 0.1].
Once you’ve estimated r and s [which I leave you to do, given it is just arithmetic], you’ll need to create a column of observed values and expected values in order to perform the hypothesis testing requested in this portion of the problem.
The observed values are those given in part (a) of the problem (e.g., 17 samples with 0 defects; 31 samples with 1 defect; etc…) while the expected binomial distribution values are given in part (b) of the problem (e.g., 12.2 samples with 0 defects; 27 samples with 1 defect; etc …). We have 8 distinct intervals, which are 0 defects per sample, 1 defect per sample, . . ., 7 defects per sample. Therefore, for each interval, we have an associated observed value (e.g., 31 samples observed with 1 defect) and an associated expected value (e.g., 27 samples with 1 defect).
Estimate the chi-square value as ∑ (Observed_i - Expected_i)^2 / Expected_i, where i = 0 to 7.
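The chi-square sum above can be sketched in Python. This is a minimal sketch: the function name is an assumption for illustration, and the observed and expected lists must be filled in from parts (a) and (b) of the problem.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Observed_i - Expected_i)^2 / Expected_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For the first two categories quoted from the problem (observed 17 and 31, expected 12.2 and 27), the contributions are (17 - 12.2)^2 / 12.2 ≈ 1.89 and (31 - 27)^2 / 27 ≈ 0.59; the remaining six categories are added in the same way.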
Keep in mind, the chi-square goodness-of-fit test is useful because it allows us to assess whether the data is well approximated by a hypothesized distribution such as the normal, Binomial, or Poisson. In this case, our null hypothesis is that the observed data is approximated by the Binomial distribution [for all the reasons already mentioned in the answer to part (b)]. If unclear, please go back and re-read part (b), which explains the premise behind why we model the number of defects using the Binomial.
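To that point, find a chi-square table [available via any google search that includes “chi-square table” in the query] and look in the column headed 0.05 and the row that corresponds to the degrees of freedom of our test. As an aside, 0.05 represents the 5% level of significance used to test the manager’s claim [as stated in the problem]. The degrees of freedom for a chi-square goodness-of-fit test equal the number of categories minus 1, minus the number of parameters estimated from the data. Here we have 8 categories and we estimated p in part (a), so the degrees of freedom are 8 - 1 - 1 = 6. [If you pool categories whose expected counts are very small, as many textbooks recommend for expected counts below 5, the number of categories, and hence the degrees of freedom, drops accordingly.]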
If the chi-square value we’ve computed is greater than the corresponding critical value in the chi-square table, we reject the null hypothesis. I’ll leave it to you to compute the chi-square value and compare it against the table value at the appropriate degrees of freedom [number of categories minus 1, minus 1 more for the estimated p] and a level of significance equal to 0.05.
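The decision rule can be sketched in Python. This is a minimal sketch that assumes the usual goodness-of-fit convention that degrees of freedom equal the number of categories minus 1, minus the number of parameters estimated from the data. The function names and the lookup table are made up for illustration; the critical values are the upper-tail 5% entries from a standard chi-square table.

```python
# Upper-tail 5% critical values from a standard chi-square table,
# keyed by degrees of freedom.
CHI2_CRITICAL_05 = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488,
    5: 11.070, 6: 12.592, 7: 14.067,
}

def degrees_of_freedom(num_categories, num_estimated_params=1):
    """Categories minus 1, minus the parameters estimated from the data (here p)."""
    return num_categories - 1 - num_estimated_params

def reject_null(chi_square_value, num_categories, num_estimated_params=1):
    """True if the statistic exceeds the 5% critical value for the test's df."""
    df = degrees_of_freedom(num_categories, num_estimated_params)
    return chi_square_value > CHI2_CRITICAL_05[df]
```

With all 8 categories kept separate and p estimated from the data, degrees_of_freedom(8) gives 6, so the chi-square value is compared against 12.592.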
d) If we’ve rejected the null hypothesis [again, this is just arithmetic, so I leave it to you to figure out whether the chi-square value is greater or less than the corresponding chi-square table value at the 5% level of significance], then it means the Binomial distribution doesn’t fit the observed data well. Recall, the Binomial distribution assumes the trials are independent of each other, each outcome is either a success or failure, and the trials are repeatable.
The trials are repeatable and any outcome is either a success [no defect] or failure [defect]; therefore, if the Binomial distribution doesn’t fit the data well [assuming we’ve rejected the null hypothesis], it likely means the trials are not independent. Any lack of independence between trials suggests some systematic problem in the production process, which needs to be fixed because the systematic problem is causing more errors in production than should be the case.
In conclusion, if I’m the manager and I reject the null hypothesis using chi-square goodness-of-fit, it means I have every reason to believe there’s a systematic problem in the production process that needs to be resolved.