Phil C. answered 08/11/21
MPH, Biostatistics + 20 years' teaching
We assume a binary/binomial type response variable, e.g., "Yes / No." Luckily, the formula for the confidence interval for a population proportion is simpler than that for a population mean!
The formula for calculating a specific margin of error (which I will call "MOE") can be solved for the sample size "n," or your resources may already give the formula in "solved for n" form. Instead of MOE = ... , we want n = ... .
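For reference, here is the usual large-sample (normal approximation) version of both forms. I am assuming this is the formula your resources use; the symbol z* for the z-score that goes with the confidence level is my notation, not something from the question.

```latex
\mathrm{MOE} = z^{*}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
\qquad\Longrightarrow\qquad
n = \frac{(z^{*})^{2}\,\hat{p}\,(1-\hat{p})}{\mathrm{MOE}^{2}}
```

Since n has to be a whole number of subjects, round the result UP to the next integer.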
The four formula variables are: the magnitude of the MOE (e.g., 3% or 2%, etc.), the sample size "n," a z-score associated with the confidence level, and the sample proportion, "p-hat." If your crew has a good idea what the true population proportion is (say, 0.40), then all's well: you use that value for p-hat. But what if you have no idea what the true population proportion might be?
The Big Question: if n has to be calculated BEFORE the sample is taken, how are we going to estimate a sample proportion that doesn't exist yet?!
A common approach to this is to "guess" the proportion that would give you the LARGEST standard error for a desired sample size "n." But we don't have to guess, because the expected population proportion that would produce the largest variation in sample proportions is independent of sample size!
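If you would like to check that claim numerically, here is a small Python sketch. It assumes the usual standard error formula for a sample proportion, sqrt(p(1 - p)/n); the function names and the grid of candidate proportions are my own choices for illustration.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def worst_case_proportion(n, steps=100):
    """Return the p on the grid 0.00, 0.01, ..., 1.00 with the largest standard error."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: standard_error(p, n))

# Try a few very different sample sizes (hypothetical values).
for n in (25, 400, 10_000):
    print(n, worst_case_proportion(n))  # prints 0.5 for every n
```

Changing n only rescales every standard error by the same factor, so the proportion that comes out on top never moves.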
Consider: For which of the following "games" would you be most hesitant to bet on the outcome? Another way of saying this: for which of these games are you most uncertain about the outcome?
- Game 1: Tossing a coin and getting "heads"
- Game 2: Drawing a card from a full deck and getting the Ace of Spades
- Game 3: Drawing a Red marble from a jar containing 45 Red marbles and 2 White marbles
The outcomes of playing Games 2 and 3 are more confidently predicted: we bet we will NOT draw the Ace of Spades, or that we WILL draw a Red marble. But Game 1? It's "50-50," as they say. We are the most uncertain when the game could go "either way!"
Applying that idea to the formula works: the true population proportion for which we would be most uncertain to "bet" on the outcome of the next trial is one in which 50% of the subjects support our politician. Hence, for any given n and confidence level, the MAXIMUM MOE (which is an expression of uncertainty) is found by setting the expected value of p-hat = 0.50, because p-hat × (1 − p-hat) is largest there (0.25). Solving for n with p-hat = 0.50 produces a sample size that is "conservative": it will work for even the most variable populations, which implies it will also work for every population that is less uncertain.
Our sample size "n" will be larger than necessary for most populations from which we sample, but it will never be too small. :'/
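To put numbers on it, here is a short Python sketch of the "solved for n" calculation. It assumes the standard normal-approximation formula n = z² × p-hat × (1 − p-hat) / MOE², rounded up; the function name and the default values (95% confidence, p-hat = 0.50) are my own choices, not part of the original question.

```python
import math
from statistics import NormalDist

def sample_size(moe, confidence=0.95, p_hat=0.5):
    """Smallest n whose margin of error is at most `moe` (normal approximation).

    p_hat defaults to 0.5, the conservative "most uncertain" planning value.
    """
    # Two-sided z-score for the confidence level, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up: a fractional subject is not possible, and rounding down
    # would leave the margin of error slightly too wide.
    return math.ceil(z**2 * p_hat * (1 - p_hat) / moe**2)

print(sample_size(0.03))             # 1068 (no idea about p, so use 0.50)
print(sample_size(0.03, p_hat=0.4))  # 1025 (a good prior guess of 0.40)
print(sample_size(0.02))             # 2401 (tighter MOE, much larger n)
```

Notice that the p-hat = 0.50 answer is the largest of the lot for a given MOE, which is exactly the "never too small" guarantee.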
Let me know if this helps,
Phil C.