
Lenny D. answered 05/03/19
Former professor of economics at Tufts University
Dummy variables can be either dependent variables or independent variables. When the dependent variable is a dummy, you are usually trying to estimate a logistic function. For instance, you may want to determine the probability that an event will happen. This is very common at insurance companies, which look at the probability that a prospective client will get into an accident based on a variety of factors such as age, past driving record, sex and a few other things. The dependent variable would be 1 if the client got into an accident and 0 if they did not.
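As a rough sketch of the insurance idea, here is a tiny logit fitted by Newton's method in plain Python. The counts are made up for illustration, and the single explanatory variable (a hypothetical "prior violation" dummy) is an assumption; a real model would include age, sex and so on.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up grouped data: (prior_violation, accident, number of drivers).
# 10 of 100 clean-record drivers and 30 of 100 drivers with a prior
# violation had an accident.
cells = [(0, 1, 10), (0, 0, 90), (1, 1, 30), (1, 0, 70)]

# Logit: P(accident = 1) = sigmoid(b0 + b1 * prior_violation).
b0, b1 = 0.0, 0.0
for _ in range(25):  # Newton-Raphson steps on the log-likelihood
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y, n in cells:
        p = sigmoid(b0 + b1 * x)
        w = n * p * (1.0 - p)
        g0 += n * (y - p)        # gradient with respect to b0
        g1 += n * (y - p) * x    # gradient with respect to b1
        h00 += w                 # entries of X'WX (the negative Hessian)
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

p_clean = sigmoid(b0)       # fitted accident probability, clean record
p_prior = sigmoid(b0 + b1)  # fitted accident probability, prior violation
print(round(p_clean, 3), round(p_prior, 3))
```

Because the only regressor here is itself a dummy, the fitted probabilities simply reproduce the two sample proportions. With continuous variables like age in the model, the logistic curve does real work.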
Dummy variables are used frequently as independent variables. Suppose you are building a model to explain income in a cross-section of the population. You have information on age, education, years of experience and other variables. You have a hypothesis that men have a higher starting salary and a faster trajectory (a glass ceiling for women). Let Z be all of the other explanatory variables, and estimate
y = a + a1*dummy + b1*dummy*experience + Zc + u, where dummy = 1 if male and 0 otherwise.
The coefficient a1 would measure the shift in the intercept, that is, the difference in expected starting income from being born with a Y chromosome. The coefficient b1 would measure how much faster or slower you could expect to advance if you are male (experience itself is one of the variables in Z). The coefficients will be good estimates of the impact of gender on earnings IF the rest of your equation is properly specified. Dummy variables are used when we don't have a perfect or complete model. If the rest of the model is not properly specified, your estimates of a1 and b1 will be polluted.
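To make the roles of a1 and b1 concrete, here is a small sketch in plain Python. The data are made up and noise-free (income in thousands), Z contains only experience, and the helper `ols` is written just for this illustration, so treat it as a toy, not a real earnings model.

```python
def ols(X, y):
    """Least squares via the normal equations (X'X) beta = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Made-up sample: men and women with 0..9 years of experience.
people = [(male, exp) for male in (0, 1) for exp in range(10)]

# Assumed "true" model: women start at 30 and gain 2 per year;
# men start 5 higher (a1) and gain 1 more per year (b1).
income = [30 + 5 * m + 1 * m * e + 2 * e for m, e in people]

# Columns: intercept, dummy, dummy*experience, experience (the Z part).
X = [[1, m, m * e, e] for m, e in people]
a, a1, b1, c = ols(X, income)
print(round(a, 3), round(a1, 3), round(b1, 3), round(c, 3))
```

The dummy shifts the intercept and the dummy-times-experience term shifts the slope. If experience were left out of Z, the coefficient on the interaction would pick up part of the experience effect, which is the kind of pollution described above.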
Using dummy variables is usually an admission of ignorance. Another example might be estimating a regression for predicting noontime temperatures in Central Park. We could run a regression of noontime temperature on 12 dummy variables. The first would equal 1 if it is January and 0 otherwise, the second would equal 1 in February, and so on. With the twelve coefficients constrained to sum to zero (an intercept plus twelve unrestricted dummies would be perfectly collinear), the intercept would be interpreted as the mean annual temperature. We would expect the coefficient on July to be positive and to be interpreted as how much hotter than average it is in July. Now, we may expect the mean temperature on April 15 to be the same as on October 15. If we estimated the equation
TEMP = a0 + b*sin(2*pi*(days past April 15)/365) + u
we would get a more accurate picture.
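Here is a rough comparison of the two temperature models in plain Python. The temperatures are synthetic: the mean of 55, amplitude of 20 and noise standard deviation of 3 are made-up numbers, not Central Park data. The dummy model is fitted with twelve dummies and no intercept, a choice made here so that each coefficient is just that month's mean.

```python
import math
import random
from datetime import date, timedelta

random.seed(0)  # reproducible synthetic noise

# One non-leap year of daily observations.
start, april15 = date(2019, 1, 1), date(2019, 4, 15)
days = [start + timedelta(n) for n in range(365)]

# Seasonal regressor: sine of the fraction of the year since April 15.
x = [math.sin(2 * math.pi * (d - april15).days / 365) for d in days]
temp = [55 + 20 * xi + random.gauss(0, 3) for xi in x]

# Model 1: twelve monthly dummies, no intercept. The least-squares
# coefficient on each dummy is that month's mean temperature.
by_month = {}
for d, t in zip(days, temp):
    by_month.setdefault(d.month, []).append(t)
month_mean = {m: sum(v) / len(v) for m, v in by_month.items()}
sse_dummies = sum((t - month_mean[d.month]) ** 2 for d, t in zip(days, temp))

# Model 2: TEMP = a0 + b*sin(...) + u, fitted by simple OLS.
x_bar = sum(x) / len(x)
t_bar = sum(temp) / len(temp)
b = (sum((xi - x_bar) * (t - t_bar) for xi, t in zip(x, temp))
     / sum((xi - x_bar) ** 2 for xi in x))
a0 = t_bar - b * x_bar
sse_sine = sum((t - a0 - b * xi) ** 2 for xi, t in zip(x, temp))

print("SSE, 12 dummies:", round(sse_dummies, 1))
print("SSE, sine model:", round(sse_sine, 1))
```

The dummies force the fitted temperature to be flat within each month and to jump at month boundaries, while the sine model follows the season with only two parameters. Of course the synthetic data were generated from a sine curve, so this only shows what happens when the functional form is right.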
I am often reminded of the pitiable econometrician who used a proxy for risk and a dummy for sex!!