Examples are often what make ideas understandable to students, and current events can be a good source of them. Case in point: today in Wisconsin, the issue of the day is the outcome of the recall elections and the problems with the exit polling. As a tutor, I am less interested in the outcome itself than in the exit polling, because surveys of this kind are key to the usefulness of statistics! In fact, the episode offers a great opportunity to illustrate some of the basic (and non-mathematical) ideas of statistics, usually the ones presented at the beginning of most introduction-to-statistics courses.
Statistical inferences are grounded in some basic definitions and assumptions. A population is a defined collection of individuals about which we want to know something, and a sample is a group taken from the population from which we actually collect data (Sullivan, 2010, p. 5; Triola, 2010, p. 4). If we wanted to know the actual value for a population, which is called a parameter, we would need to undertake a census, “the collection of data from every member of the population” (Triola, 2010, p. 4). If the population is large, this would be a daunting and expensive task. Instead, statistics can be used (1) to select a sample that is representative of the population, (2) to collect data about the sample, which is summarized in a statistic, and (3) to determine how to extrapolate the population’s parameter from the sample’s statistic (Sullivan, 2010; Triola, 2010).
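To make the three steps concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than real election data: the population size, the 53% "true" share, the sample size, and the function name `simulate_poll` are all made up, and the margin of error uses the usual normal approximation for a proportion, which is only one of several ways to carry out step 3.

```python
import random


def simulate_poll(population_size=100_000, true_share=0.53,
                  sample_size=1_000, seed=0):
    """Compare a sample statistic with the population parameter.

    All numbers are hypothetical; 1 means "voted for candidate A".
    """
    rng = random.Random(seed)

    # The population: every voter, coded 1 (candidate A) or 0 (anyone else).
    n_a = round(population_size * true_share)
    population = [1] * n_a + [0] * (population_size - n_a)

    # A census gives the parameter exactly, but requires asking everyone.
    parameter = sum(population) / population_size

    # Step 1: select a simple random sample (without replacement).
    sample = rng.sample(population, sample_size)

    # Step 2: summarize the sample in a statistic.
    statistic = sum(sample) / sample_size

    # Step 3: attach an approximate 95% margin of error to the statistic
    # (normal approximation for a proportion; an assumption of this sketch).
    margin = 1.96 * (statistic * (1 - statistic) / sample_size) ** 0.5

    return parameter, statistic, margin


parameter, statistic, margin = simulate_poll()
print(f"parameter (census): {parameter:.3f}")
print(f"statistic (sample): {statistic:.3f} +/- {margin:.3f}")
```

With a sample of 1,000 the margin of error comes out at roughly three percentage points, and that figure is only trustworthy if step 1 really produced a random sample.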
But people (including individuals who should know better) often make a critical error by focusing only on the second step, because it seems to be what we want to find out, usually as quickly as possible. In the example of the day, we want to know how people voted so that we know the result right away. It seems to make sense that the easiest way to find the result would be to use an exit poll, i.e., to ask a subset of the voters (a sample) how they voted, calculate a summary of their answers (a statistic), and use that statistic to estimate how all the voters (the population) voted (the parameter). But over-focusing on the second step creates real problems, because sample selection is crucial both to ensure that the sample is representative of the population (step 1) and to ensure the mathematical reliability of extrapolating the parameter from the statistic (step 3).
For exit polls in particular, steps 1 and 3 are difficult. First, representativeness can involve a catch-22: we need to know what the population is like to pick a good sample, but we want to pick a good sample to find out what the population is like. This issue often appears in the press as individuals complaining about the make-up of the survey, e.g., complaining that the sample included too many individuals reporting a particular political affiliation. It is possible, however, to substitute a random sample for a truly representative one and to use mathematical techniques to show that a statistic from a random sample would be sufficiently reliable (step 3). Second, and unfortunately, human beings are really bad at true “randomness,” and thus our samples are usually much less random than we think. In something like an exit poll, people volunteer to answer, so even if a surveyor were to “randomly” select voters as they leave the polls, the only data collected would be about people who “conveniently” choose to answer the questions. Other factors can also interfere with randomness, such as when people vote or whether they vote in person or absentee. The further our sample deviates from randomness or representativeness, the less reliable the statistic is as an estimate of the parameter.
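The effect of voluntary response can be sketched with a small simulation. The response rates below are invented purely for illustration (the function name `exit_poll_with_nonresponse` is hypothetical too): the race is assumed to be an exact 50/50 tie, voters are approached at random, but supporters of candidate A are assumed to be more willing to stop and answer.

```python
import random


def exit_poll_with_nonresponse(share_a=0.50, respond_a=0.60, respond_b=0.40,
                               approached=20_000, seed=1):
    """Estimate candidate A's share when willingness to answer differs.

    share_a   : true share of voters for A (the parameter; hypothetical)
    respond_a : chance an A voter agrees to answer (hypothetical)
    respond_b : chance any other voter agrees to answer (hypothetical)
    """
    rng = random.Random(seed)
    answers = []
    for _ in range(approached):
        # The surveyor approaches a voter at random...
        votes_a = rng.random() < share_a
        # ...but the voter decides whether to take part.
        p_respond = respond_a if votes_a else respond_b
        if rng.random() < p_respond:
            answers.append(1 if votes_a else 0)
    estimate = sum(answers) / len(answers)
    return estimate, len(answers)


biased, n_biased = exit_poll_with_nonresponse()
fair, n_fair = exit_poll_with_nonresponse(respond_a=0.50, respond_b=0.50)
print(f"unequal response rates: estimate {biased:.3f} from {n_biased} answers")
print(f"equal response rates:   estimate {fair:.3f} from {n_fair} answers")
```

Under these assumed rates the respondents are about 60% A voters even though the electorate is split evenly, and collecting more answers does not help: a larger sample shrinks the margin of error around the wrong number.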
To be fair, most statisticians understand these problems, but when the statistics are presented for wider circulation, the problems are ignored, de-emphasized, or misunderstood. Clarifying these issues is often what the first chapter of a good statistics class is all about, so that future users of statistics can be aware of the potential problems. Exit polls are especially likely to have their problems exposed because they are always followed by an actual census: the vote count itself. After all, the vote count is what we are trying to predict with the exit poll, so it isn’t surprising that an estimate of that count is less accurate than the count, all the more so given the difficulties of randomness and representativeness.
Sullivan, Michael (2010). Statistics: Informed Decisions Using Data (3rd ed.). Prentice Hall: Upper Saddle River, NJ.
Triola, Mario (2010). Elementary Statistics (11th ed.). Addison-Wesley: Boston, MA.