The Dummy Variable Trap!!!!!!
The moral of the story is do not forget to drop a column when dealing with dummy variables.
Dummy variables are used to help interpret categorical variables when working with regressions in python. The example below is from a project trying determine where to open a new liquor store in Iowa and see which factors influence sales on liquor in Iowa the most.
In this example I created dummy variables for the 90+ counties in Iowa. What I did not do was drop a dummy variables prior to running my regression. With a different data set, I could have issues collinearity. Collinearity is “ is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. To avoid this it is best practice to drop a dummy variables created and to be aware that the coefficients on the remaining features will then be an estimate of the categories’ mean relative to the left out category.