When conducting a group of statistical comparisons, it is important to understand the concept of familywise error, or the 'familywise error rate'. The familywise error rate is the probability of incorrectly rejecting at least one true null hypothesis across the whole group (or 'family') of comparisons. More informally, familywise error represents the chance of reporting at least one statistical difference when no such difference exists.

To understand why familywise error is important, we must return to concepts learned in an introductory statistics class. First, we generally compare t-values (or other test statistics) to a known distribution of scores (t-scores in this case). As with most such distributions, values near the center are more likely to occur than values in the tails. By understanding the probability that a given t-score will occur, we are able to designate "critical values" that mark off the regions of the distribution in which scores are unlikely to fall. That is, if the null hypothesis is true, values beyond these critical values occur only with a certain limited probability.

The exact probability we choose to define the critical values is arbitrary and often driven by tradition. As mentioned previously, we want to select values that are unlikely to occur in order to reliably report statistical differences. On the other hand, selecting values that are too stringent might cause us to report that no difference between our samples exists when a difference does in fact exist. In psychology, an alpha of 0.05 is generally designated, which means that falsely rejecting a true null hypothesis should occur only 5% of the time. This value is conventionally treated as a reasonable balance between guarding against false positives and retaining the ability to detect real differences.

Having said this, it is important to realize that, when the null hypothesis is true, the probability of finding a test statistic outside the critical values for a single comparison is 1 in 20 (5%). However, we also know from probability that the more chances one has, the more likely an event becomes. For example, if a player's chances of winning the lottery with one ticket are 1 in 14 million, that player can increase his chances to roughly 1 in 14,000 by buying 1,000 tickets. As you might imagine, this presents a problem when running multiple statistical comparisons. The more statistical tests we run, the more likely it is that we will incorrectly identify a statistical difference (i.e., incorrectly reject the null hypothesis) at least once.
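To make the growth of familywise error concrete, here is a minimal Python sketch. It assumes the comparisons are independent and that every null hypothesis is true, in which case the chance of at least one false rejection across n tests is 1 − (1 − α)^n. The function name familywise_error_rate is a hypothetical helper chosen for illustration, and the independence assumption is ours rather than something stated above.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false rejection across n_tests.

    Assumes the tests are independent and every null hypothesis is true
    (an illustrative assumption, not a general result).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    # (1 - alpha) ** n_tests is the chance that no test falsely rejects.
    return 1.0 - (1.0 - alpha) ** n_tests


# Show how the familywise error rate grows with the number of tests.
for n in (1, 5, 10, 20):
    print(f"{n:>2} tests: familywise error rate = "
          f"{familywise_error_rate(0.05, n):.3f}")
```

With an uncorrected alpha of 0.05, the rate climbs from 5% for a single test to well over 50% for twenty tests under these assumptions.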

There are many ways to correct for this problem. Each correction has its pros and cons, and each may be more or less stringent than the next. Thus, one must consider how conservative a correction a given situation calls for, whether a desired correction will be overly liberal or conservative given the number of comparisons being made, and perhaps whether the number of comparisons being made even warrants a correction at all.

One common correction that you might use is a Sidak correction. This is a simple correction to calculate using the following formula:

1−(1−α)^(1/n)

where α represents your chosen alpha level and 'n' represents the number of comparisons being made. This formula returns a corrected alpha level that should be used in lieu of the original alpha for each of your multiple comparisons when deciding whether or not to reject the null hypothesis. For example, if one were to make five separate statistical comparisons with α = 0.05, the Sidak correction would dictate that only p-values below approximately 0.0102 should be considered significant, instead of the typical p < 0.05.

Try using the formula yourself to calculate a corrected alpha level and see how increasing the number of comparisons necessitates a greater correction.
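One way to try this is with a short Python sketch of the formula above. The function name sidak_threshold is a hypothetical helper introduced here for illustration; it simply evaluates 1 − (1 − α)^(1/n) and is not taken from any particular statistics library.

```python
def sidak_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold from the Sidak formula 1 - (1 - alpha)^(1/n)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)


# The threshold shrinks as the number of comparisons grows.
for n in (1, 2, 5, 10, 20):
    print(f"{n:>2} comparisons: per-comparison threshold = "
          f"{sidak_threshold(0.05, n):.5f}")
```

Each individual p-value would then be compared against the returned threshold rather than against the original alpha.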
