Matthew B. answered 04/29/14
Ph.D. | Statistics | Data Science | Python | R
I can't help with the second question, as there is no data given. As for the first, I will briefly go over the null and research hypotheses.
The research (or alternative) hypothesis is that males and females will differ in their likelihoods of indicating the food is acceptable. I will leave it to you to phrase the null hypothesis, but just remember it has to do with there being no relationship or effect.
There are a couple of ways to analyze these data; the most common would probably be a chi-square test of independence. This is appropriate when the relationship you are testing is between two categorical variables, here gender (male vs. female) and acceptability of food (yes vs. no).
To find your test statistic, you can enter the data into appropriate statistical software, and you will get χ² = 1.66, p = .198. (That symbol is meant to be a chi-squared, not just an "x" squared.)
To do this by hand, you would use the following equation:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where $f_o$ is the frequency observed in each subcategory (males who find the food acceptable, males who don't find the food acceptable, etc.) and $f_e$ is the expected cell count based on the combined overall categories (sometimes called the marginal counts or frequencies).
The question is how to come up with the expected counts... well in this case it helps to start with the data matrix
           Acceptable   Not acceptable   Total
Females        34             56           90
Males          56             64          120
Total          90            120          210
(The fact that the row totals and the column totals are the same numbers here is just an artifact of these particular data. That won't always happen, but the row totals and the column totals will always add up to the same grand total, which is the total number of cases or observations in the data.)
To get the expected counts you multiply the marginal row total by the marginal column total and divide by the overall total. It would look something like this:
           Acceptable          Not acceptable      Total
Females    34                  56                    90
           (90 x 90)/210       (90 x 120)/210
           = 38.57             = 51.43
Males      56                  64                   120
           (120 x 90)/210      (120 x 120)/210
           = 51.43             = 68.57
Total      90                  120                  210
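If you would rather check the expected counts with a few lines of code, here is a minimal sketch in plain Python. The names `observed` and `expected_counts` are my own, chosen for illustration, and the table holds only the four interior cells (no totals).

```python
# Observed counts from the table above (totals are left out on purpose).
observed = [
    [34, 56],  # Females: acceptable, not acceptable
    [56, 64],  # Males:   acceptable, not acceptable
]


def expected_counts(table):
    """Expected count per cell = (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(value, 2) for value in row])
# prints [38.57, 51.43] then [51.43, 68.57]
```

The same function works for a larger table, since it only relies on the row and column totals.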
Now it is possible to apply the formula from above: for each cell in the matrix, take the observed frequency minus the expected frequency, square it, divide by the expected frequency, and then sum the results over all the cells. Going from the top left across:
$$\frac{(34 - 38.57)^2}{38.57} + \frac{(56 - 51.43)^2}{51.43} + \frac{(56 - 51.43)^2}{51.43} + \frac{(64 - 68.57)^2}{68.57} = \; ?$$
Well I can let you do the actual work there, but the answer you will get is the same as I got using my statistics software... 1.66. So that is the good news. I guess my statistics software works, which is always nice to verify from time to time.
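To double-check the hand calculation without a statistics package, the sum can be sketched in plain Python. The function names here are hypothetical, and the p-value shortcut is an assumption that only holds for 1 degree of freedom (it uses the fact that a chi-square variable with 1 df is a squared standard normal). No continuity correction is applied, which is what matches the 1.66 above.

```python
import math

# Interior cells of the data matrix (no totals).
observed = [
    [34, 56],  # Females: acceptable, not acceptable
    [56, 64],  # Males:   acceptable, not acceptable
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / grand_total
            statistic += (f_o - f_e) ** 2 / f_e
    return statistic


def p_value_one_df(statistic):
    """Upper-tail p-value; valid ONLY when df = 1."""
    return math.erfc(math.sqrt(statistic / 2))


stat = chi_square_statistic(observed)
p_value = p_value_one_df(stat)
print(round(stat, 2), round(p_value, 3))
```

For tables with more than 1 degree of freedom you would need a chi-square table or statistical software for the p-value, since the shortcut above no longer applies.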
From a terminology standpoint, I am not one-hundred percent sure, but I think the p-hat question has to do with expected probabilities (which were used above as part of finding the chi-square statistic). The "hat" marks a value estimated from the sample, and with categorical data p-hat usually denotes a sample proportion. Here the overall proportion rating the food acceptable, 90/210 ≈ .43, is what turns the row totals into "expected" frequencies.
Now for the cutoff value, you just need to know the degrees of freedom for the chi-square test. This is found using the following formula based on the table from above: (rows - 1) x (columns - 1), counting only the category rows and columns and not the totals. In this case that gets you to df = 1 x 1 = 1.
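As a tiny sketch of that degrees-of-freedom rule (the function name is my own, and the table is assumed to contain only the category cells, not the totals):

```python
# Interior cells only: 2 gender rows x 2 rating columns.
observed = [
    [34, 56],  # Females
    [56, 64],  # Males
]


def degrees_of_freedom(table):
    """(rows - 1) x (columns - 1) for a table without its totals."""
    rows = len(table)
    columns = len(table[0])
    return (rows - 1) * (columns - 1)


df = degrees_of_freedom(observed)
print(df)  # prints 1
```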
As a fun side note, the critical value at the alpha = .05 level is 3.84, which is the same value you would get if you squared the two-tailed t-test cutoff at alpha = .05 for a sufficiently large sample (tcrit = 1.96). There is a technical mathematical reason that is the case, but I won't digress into that at this stage. I just wanted to point out the connection.
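If you want to see that connection numerically, here is a short sketch using only Python's standard library. It assumes df = 1 and uses the large-sample (standard normal) cutoff in place of the t cutoff, so it will not reproduce critical values for other degrees of freedom.

```python
from statistics import NormalDist

alpha = 0.05

# Two-tailed cutoff from the standard normal distribution,
# used here as the large-sample stand-in for the t cutoff.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Squaring it gives the chi-square critical value for df = 1 only.
chi_square_crit = z_crit ** 2

print(round(z_crit, 2), round(chi_square_crit, 2))  # prints 1.96 3.84
```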
I will leave the conclusion in your hands. You now know how to get the test statistic, you have the cutoff value (which can be found using an appropriate table based on the chi-square distribution), and you even have the p-value from much earlier when I ran the stats using my software.