The Dummy Variable Trap!!!!!!

Chukwudi Uraih, MBA
1 min readOct 29, 2017

--

The moral of the story is do not forget to drop a column when dealing with dummy variables.

Dummy variables are used to help interpret categorical variables when working with regressions in python. The example below is from a project trying determine where to open a new liquor store in Iowa and see which factors influence sales on liquor in Iowa the most.

In this example I created dummy variables for the 90+ counties in Iowa. What I did not do was drop a dummy variables prior to running my regression. With a different data set, I could have issues collinearity. Collinearity is “ is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. To avoid this it is best practice to drop a dummy variables created and to be aware that the coefficients on the remaining features will then be an estimate of the categories’ mean relative to the left out category.

--

--

Chukwudi Uraih, MBA
Chukwudi Uraih, MBA

Written by Chukwudi Uraih, MBA

I am a systems thinker who thinks he is a data scientist who wants to help you get financially free.

Responses (1)