Multicollinearity
1. Multicollinearity
Multicollinearity means that some of the explanatory variables are correlated with one another, which can make the regression model behave in unusual ways. Here we will discuss one problem that can occur when variables are correlated.
2. Regressing dollar amount on coins
Consider the following dataset. Each row represents the change in an individual's pocket. The information includes the breakdown of the coins by type as well as the total monetary amount of the coins. The coins column gives the total number of coins in the pocket, and the small column gives the total number of pennies, nickels and dimes. We've shown the first 6 rows of the dataset, corresponding to the amount of change in six different individuals' pockets. These data, although not a random sample, were collected by Jeff Witmer at Oberlin College.
3. Amount vs. coins - plot
The scatterplot of total amount of money versus total number of coins indicates that the amount of money seems to be linearly related to the number of coins.
4. Amount vs. coins - linear model
Indeed, the slope coefficient is quite statistically significant with a p-value of 6 times 10 to the negative 22.
5. Amount vs. small coins - plot
Additionally, the amount of money also seems to be linearly related to the number of small coins (here the x-axis represents the number of pennies, nickels and dimes in each individual's pocket).
6. Amount vs. small coins - linear model
The linear model on the number of small coins reinforces the previous plot with a positive and statistically significant slope coefficient.
7. Amount vs. coins and small coins
However, when both the number of coins and the number of small coins are entered into the model, the coefficient associated with the number of small coins becomes NEGATIVE! That is because with multiple variables in the model, each coefficient is interpreted while holding all of the other variables constant. Let's say we know that an individual has 10 coins in her pocket. The predicted amount of money is much lower if we know that 9 of the coins are small than if we know that only 1 of them is small. That is, the more small coins she has out of 10, the LOWER we will predict her amount to be. The number of coins and the number of small coins are highly correlated, which is why the model presents a surprising sign on the small coin coefficient. When variables are correlated, interpreting the coefficients can be difficult, and we call this a problem of multicollinearity.
8. Let's practice!
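To see the sign flip on the small coin coefficient in action, here is a minimal sketch in Python. It uses simulated pockets, not the original data. The assumptions are loud ones: US coin values (1, 5 and 10 cents for small coins), quarters as the only other coin, and Poisson counts chosen only for illustration. The helper `fit_ols` is a hypothetical name introduced here.

```python
import numpy as np

def fit_ols(y, *predictors):
    """Least-squares fit; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated pockets (assumption: NOT the data described in the transcript).
rng = np.random.default_rng(0)
n = 200
small = rng.poisson(8, size=n)     # pennies, nickels and dimes
quarters = rng.poisson(2, size=n)  # assumed to be the only other coin
coins = small + quarters           # total number of coins

# Value of the small coins in cents, each drawn from {1, 5, 10}.
small_cents = np.array([rng.choice([1, 5, 10], size=k).sum() for k in small])
amount = (small_cents + 25 * quarters) / 100.0  # total in dollars

# One predictor at a time, then both together.
only_coins = fit_ols(amount, coins)
only_small = fit_ols(amount, small)
both = fit_ols(amount, coins, small)

print("corr(coins, small):", np.corrcoef(coins, small)[0, 1])
print("amount ~ coins          slope:", only_coins[1])
print("amount ~ small          slope:", only_small[1])
print("amount ~ coins + small  slopes:", both[1], both[2])
```

On its own, each predictor gets a positive slope. With both in the model, the small coin slope turns negative: for a fixed number of coins, every extra small coin replaces a quarter.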
The next video will talk briefly about additional issues related to multiple variables in a linear model. However, first you will practice interpreting models when the variables are correlated.