Picking the Right Statistical Test

In all my years of tutoring statistics, probably the most common problem I'm asked about is picking the right statistical test. There's a wide array to choose from, and in many cases different tests are equivalent, so here are a few pointers on how to narrow down your options.

First, you have to decide what your research hypothesis is -- that is, what kind of question you're trying to answer. There are two broad categories: hypotheses of difference and hypotheses of association. As you might guess, the former looks for a significant difference between two or more groups or variables, or a significant divergence from some kind of model, while the latter looks for a significant relationship between two or more groups or variables, or a significant degree of adherence to some kind of model. Oftentimes, the same research question can be phrased either way: for instance, "Is there a difference in IQ between groups X and Y?" would generate a hypothesis of difference, but "Can I use any significant differences in IQ to predict membership in groups X and Y?" would generate a hypothesis of association. The former would therefore use something like an ANOVA and the latter something like a regression, but the two would typically lead to the same conclusion, and in the simplest cases the significance test results are numerically identical.

Hypotheses of difference are tested using z-tests, t-tests, analyses of variance and covariance (ANOVAs and ANCOVAs), and certain nonparametric tests, like the chi-squared test of independence. Hypotheses of association are tested using measures like correlation, regression, and structural equation modeling (where you are testing to see if a specified causal model fits the data reasonably well).

Next, ask yourself what kind of data you're dealing with. When your data is exclusively ordinal or nominal (or some mix thereof), you're generally confined to nonparametric tests like chi-squared tests, the Mann-Whitney U, or the binomial test. When at least one of your variables is interval or ratio, you may be able to use a parametric test like the t-test or an ANOVA. If you're looking at two interval or ratio variables, you're probably looking at the association between them, so regression or correlation would be the way to go.
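As a rough illustration of how the data type steers the choice, the sketch below runs one test for each situation described above. It assumes SciPy is installed, and all of the numbers are invented for the example.

```python
from scipy import stats

# Nominal x nominal: made-up counts of a yes/no outcome in two groups.
table = [[18, 12],
         [9, 21]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Ordinal outcome, two independent groups: made-up 1-5 ratings.
ratings_a = [2, 3, 3, 4, 5, 4, 3]
ratings_b = [1, 2, 2, 3, 2, 1, 3]
u_stat, p_u = stats.mannwhitneyu(ratings_a, ratings_b, alternative="two-sided")

# Two interval/ratio variables: made-up hours studied vs. exam score.
hours = [1, 2, 3, 4, 5, 6, 7]
score = [52, 55, 61, 60, 68, 70, 75]
r, p_r = stats.pearsonr(hours, score)

print(f"chi-squared p = {p_chi2:.3f}, Mann-Whitney p = {p_u:.3f}, Pearson r = {r:.3f}")
```

Only the shape of the data changes between the three calls: counts in a table, ranks in two groups, or paired measurements.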

Then, consider how many variables (and levels of variables) you have. Some tests, like the t-test, can compare at most two groups. Others, like the ANOVA family, handle three or more groups (or levels of the independent variable). Likewise, ordinary bivariate correlation handles only two variables; with three or more variables, multiple correlation or regression is needed.
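Here is a short sketch of the two-group versus three-group distinction, again assuming SciPy and using made-up scores. With exactly two groups, the one-way ANOVA and the equal-variance t-test agree (F equals t squared); with three groups, a single ANOVA replaces a series of pairwise t-tests.

```python
from scipy import stats

# Made-up scores for three levels of one independent variable.
low = [12, 15, 11, 14, 13]
medium = [16, 18, 15, 17, 19]
high = [21, 19, 22, 20, 23]

# Two groups: independent-samples t-test (equal variances assumed by default).
t_stat, p_t = stats.ttest_ind(low, medium)

# The same two groups through a one-way ANOVA: F equals t squared.
f_two, p_f_two = stats.f_oneway(low, medium)

# Three groups: one omnibus ANOVA rather than several pairwise t-tests.
f_three, p_f_three = stats.f_oneway(low, medium, high)

print(t_stat ** 2, f_two, p_t, p_f_two)
print(f_three, p_f_three)
```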

Finally, consider the nature of what you're trying to do. If you know there's a third variable whose influence you'd like to control for, you may need more firepower, such as an ANCOVA or a hierarchical regression, than you would if you were only looking at the straightforward association of, or difference between, two variables. Alternatively, if you're trying to establish the validity of a psychometric measure, something like factor analysis or structural equation modeling (both of which combine elements of hypotheses of difference and association) is needed.

There are plenty of statistical test "decision trees" available online, where, given that you know what your variables are like, you can land on a particular choice for a test to perform. But if you have doubts, you can always consult a knowledgeable statistics tutor.
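For a feel of how such a decision tree works, here is a deliberately tiny sketch that encodes only the rules of thumb from this post. The function name suggest_test and its parameters are hypothetical, and a real decision tree would ask many more questions (paired versus independent samples, distribution assumptions, sample size, and so on).

```python
def suggest_test(goal, data_level, n_groups=2, covariate=False):
    """Suggest a candidate test from a few coarse answers.

    goal:       "difference" or "association"
    data_level: "nominal", "ordinal", "interval", or "ratio"
    n_groups:   number of groups (levels of the independent variable)
    covariate:  True if a third variable needs to be controlled for
    """
    if goal not in ("difference", "association"):
        raise ValueError("goal must be 'difference' or 'association'")
    if data_level not in ("nominal", "ordinal", "interval", "ratio"):
        raise ValueError("unknown data level: " + repr(data_level))

    # Nominal or ordinal data only: stay with nonparametric tests.
    if data_level in ("nominal", "ordinal"):
        if goal == "difference" and data_level == "ordinal" and n_groups == 2:
            return "Mann-Whitney U"
        return "chi-squared test"

    # Interval or ratio data: parametric tests become available.
    if goal == "association":
        return "hierarchical regression" if covariate else "correlation or regression"
    if covariate:
        return "ANCOVA"
    return "t-test" if n_groups <= 2 else "ANOVA"


print(suggest_test("difference", "interval", n_groups=3))
print(suggest_test("association", "ratio", covariate=True))
```

Treat the output as a starting point for reading up on a test, not as a verdict.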


Sean W.

Talented Mathematics Tutor

100+ hours