
James C. answered 09/25/19
Research Methodologist and Statistician (from UC Merced)
Hi Kateryna,
For ANOVA, you can leave the variables coded however you like. For instance, let's call your first variable (the binary, or dichotomous, one) "x1". The values for x1 can be coded as "0" and "1", or as "1" and "2"; it really does not matter. Let's call the second variable "x2". The values for "x2" can likewise be whatever you like: "0", "1" and "2"; or "1", "2" and "3".
For multiple regression, you would have to set up the data slightly differently. Values for "x1" should be coded as "0" and "1". For "x2", because the design matrix includes an intercept (i.e., a column filled with "1"s), you will need to set up two variables rather than three: one for each category except the reference category. (A third variable would be perfectly collinear with the intercept.) These variables are called dummy codes. Let's call them "x2dum1" and "x2dum2". You will need to code the values for each of the dummy variables as 0 and 1. Let me show you by example how the setup would differ in the data file:
ANOVA:
id | x1 | x2 |
1 | 1 | 1 |
2 | 2 | 2 |
3 | 2 | 2 |
4 | 1 | 3 |
5 | 1 | 3 |
6 | 1 | 3 |
... | ... | ... |
N | x1n | x2n |
Regression:
id | x1 | x2dum1 | x2dum2 |
1 | 0 | 0 | 0 |
2 | 1 | 1 | 0 |
3 | 1 | 1 | 0 |
4 | 0 | 0 | 1 |
5 | 0 | 0 | 1 |
6 | 0 | 0 | 1 |
... | ... | ... | ... |
N | x1n | x2dum1n | x2dum2n |
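To make the recoding between the two layouts concrete, here is a minimal Python sketch. It assumes the ANOVA file codes x1 as 1/2 and x2 as 1/2/3, as in the example rows above; the function name `to_regression_coding` is made up for illustration and is not part of any statistics package.

```python
def to_regression_coding(x1, x2):
    """Recode one ANOVA-style row (x1 in {1, 2}, x2 in {1, 2, 3})
    into regression-style 0/1 variables."""
    x1_01 = x1 - 1                   # 1 -> 0, 2 -> 1
    x2dum1 = 1 if x2 == 2 else 0     # flags category 2 of x2
    x2dum2 = 1 if x2 == 3 else 0     # flags category 3 of x2
    # Category 1 of x2 gets no dummy: it is the reference group (0, 0).
    return x1_01, x2dum1, x2dum2


# The six example rows from the ANOVA layout, as (x1, x2) pairs.
anova_rows = [(1, 1), (2, 2), (2, 2), (1, 3), (1, 3), (1, 3)]

regression_rows = [to_regression_coding(x1, x2) for x1, x2 in anova_rows]

for row_id, row in enumerate(regression_rows, start=1):
    print(row_id, *row)
```

Running this should reproduce the six rows of the regression layout. Most statistics software can generate the dummy variables for you, so treat this only as an illustration of what the coding means.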
The regression equation would look like this. For i = 1,...,N respondents:
yi = α + x1i * β1 + x2dum1i * β2 + x2dum2i * β3 + εi,
where y denotes your outcome variable, α denotes your intercept (a single value shared by all respondents), β1 is the coefficient for x1, β2 is the coefficient for x2dum1, β3 is the coefficient for x2dum2, and ε represents the error term.
So what happened to the remaining category of your second categorical variable, the one without a dummy variable of its own in the regression model? Simply put, it is absorbed into the intercept.
So your interpretation would be that when both x2dum1 and x2dum2 are zero, the respondent belongs to your first category ("1"), as depicted in the ANOVA data setup. Under dummy coding, the category coded zero on every dummy variable is the reference group, that is, the group against which the other categories are compared. β2 and β3 are then the differences between categories "2" and "3", respectively, and that reference group, holding x1 constant.
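If it helps to see the reference-group idea in numbers, here is a small Python sketch. The coefficient values are invented purely for illustration (they are not estimates from any data), and `expected_y` is a hypothetical helper that evaluates the regression equation above without the error term.

```python
def expected_y(alpha, b1, b2, b3, x1, x2dum1, x2dum2):
    """Expected outcome from the dummy-coded regression equation
    (error term omitted)."""
    return alpha + x1 * b1 + x2dum1 * b2 + x2dum2 * b3


# Made-up coefficients, for illustration only.
alpha, b1, b2, b3 = 10.0, 2.0, 1.5, -0.5

# Hold x1 at 0 and move through the three categories of x2.
ref_group = expected_y(alpha, b1, b2, b3, x1=0, x2dum1=0, x2dum2=0)
category2 = expected_y(alpha, b1, b2, b3, x1=0, x2dum1=1, x2dum2=0)
category3 = expected_y(alpha, b1, b2, b3, x1=0, x2dum1=0, x2dum2=1)

print("reference group (category 1):", ref_group)
print("category 2 minus reference:", category2 - ref_group)
print("category 3 minus reference:", category3 - ref_group)
```

With x1 held at 0, the reference group's expected value is just the intercept, and each dummy coefficient is that category's difference from the reference group.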
(We could discuss why ANOVA does not require dummy coding but regression does. However, that is an entirely different conversation for another time.)
James