Principal components in a regression analysis
1. Principal components in a regression analysis
As mentioned before, PCA can be a preparation step for further analysis. For example, it can solve a multicollinearity problem in a regression analysis. Let's see what happens if we try to explain a new variable, `customerSatis`, which measures customer satisfaction, with the variables we have already been working with.
2. PC in regression analysis I
We use the `lm()` function from the `stats` package to estimate a linear regression. The dot to the right of the tilde includes all other variables in the dataset as predictors. We save the results in the object `mod1`. We then compute the variance inflation factors (VIFs) with the function `vif()` from the `car` package. Many values are above 5 or even 10, which indicates strong multicollinearity and makes the regression estimates unstable. To solve this problem, we use the selected principal components as regressors instead.
3. PC in regression analysis II
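The multicollinearity check described above can be sketched in R as follows. The course data are not part of this transcript, so the sketch simulates a small hypothetical dataset (`simData`, with made-up columns `nOrders`, `nItems`, `salesTotal` and `returnRate`) in which three predictors share one underlying activity signal. Instead of calling `car::vif()`, it computes each VIF from its definition, 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all other predictors; for a model with only numeric predictors, this is the same quantity `vif()` reports.

```r
# Minimal sketch on simulated, hypothetical data (not the course dataset)
set.seed(1)
n <- 200
activity <- rnorm(n)  # shared signal that makes three predictors collinear
simData <- data.frame(
  nOrders    = activity + rnorm(n, sd = 0.2),
  nItems     = activity + rnorm(n, sd = 0.2),
  salesTotal = activity + rnorm(n, sd = 0.2),
  returnRate = rnorm(n)
)
simData$customerSatis <- 0.5 * activity - 0.4 * simData$returnRate + rnorm(n)

# The dot to the right of the tilde uses all other columns as predictors
mod1 <- lm(customerSatis ~ ., data = simData)

# VIF_j = 1 / (1 - R^2_j): regress each predictor on all the others
vifByDefinition <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    aux <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(aux)$r.squared)
  })
}

vif1 <- vifByDefinition(simData[, setdiff(names(simData), "customerSatis")])
round(vif1, 1)
```

The three activity-driven predictors should get VIFs far above 5 (the population value for this setup is about 17.5), while `returnRate` stays close to 1.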
First, we construct a data frame called `dataCustComponents` using the `cbind()` and `data.frame()` functions. It contains customer satisfaction and the scores of the first six components. Then we estimate the linear model with the principal components as explanatory variables and save it in the object `mod2`. Because the components are uncorrelated by construction, all variance inflation factors now equal one. We extract the adjusted R-squared of both models from the `adj.r.squared` element of their `summary()` objects. As expected, the adjusted R-squared of the second model is slightly lower, but the model includes only six components and its estimates are more stable. The regression coefficients are now harder to interpret, because they refer to the components rather than the original variables. Still, let's try to interpret the first three coefficients.
4. PC in regression analysis III: interpretation
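Here is a hedged sketch of the component regression, again on simulated hypothetical data because the course dataset is not included here. It runs `prcomp()` on the standardized predictors, keeps only the first two component scores (two rather than six, because the toy data have just four predictors), and fits `mod2` on them. The VIFs are computed as the diagonal of the inverse correlation matrix of the regressors, which is mathematically equivalent to 1 / (1 - R²_j); for uncorrelated component scores it equals one.

```r
# Minimal sketch on simulated, hypothetical data (not the course dataset)
set.seed(1)
n <- 200
activity <- rnorm(n)
simData <- data.frame(
  nOrders    = activity + rnorm(n, sd = 0.2),
  nItems     = activity + rnorm(n, sd = 0.2),
  salesTotal = activity + rnorm(n, sd = 0.2),
  returnRate = rnorm(n)
)
simData$customerSatis <- 0.5 * activity - 0.4 * simData$returnRate + rnorm(n)
predictors <- simData[, setdiff(names(simData), "customerSatis")]

# PCA on standardized predictors; component scores are stored in pca$x
pca <- prcomp(predictors, scale. = TRUE)

# Keep the first k components (k = 2 is an arbitrary choice for this toy example)
k <- 2
dataCustComponents <- data.frame(cbind(customerSatis = simData$customerSatis,
                                       pca$x[, 1:k]))

mod1 <- lm(customerSatis ~ ., data = simData)
mod2 <- lm(customerSatis ~ ., data = dataCustComponents)

# VIFs = diagonal of the inverse correlation matrix of the regressors
vif2 <- diag(solve(cor(dataCustComponents[, -1])))
round(vif2, 6)

# Adjusted R-squared of both models
c(mod1 = summary(mod1)$adj.r.squared, mod2 = summary(mod2)$adj.r.squared)
```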
We display the results of the model using the `summary()` function. Remember that principal component 1 reflected low customer activity. Its negative regression coefficient means that customers with lower activity are less satisfied. In contrast, customers with few returns (which PC2 reflected) are more satisfied. The coefficient for PC3, which reflects quality or brand awareness, is not significant.
5. Factor Analysis vs. PCA
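Because a component's coefficient can only be read through its loadings, the sketch below prints both. It uses simulated hypothetical data, so its components do not correspond to the course's PC1 to PC3. It also illustrates a point worth keeping in mind: the sign of a principal component is arbitrary, so flipping a component's sign flips its coefficient while leaving its p-value unchanged. Whether a component means "low activity" or "high activity" therefore has to be read off the signs of its loadings.

```r
# Minimal sketch on simulated, hypothetical data (not the course dataset)
set.seed(1)
n <- 200
activity <- rnorm(n)
predictors <- data.frame(
  nOrders    = activity + rnorm(n, sd = 0.2),
  nItems     = activity + rnorm(n, sd = 0.2),
  salesTotal = activity + rnorm(n, sd = 0.2),
  returnRate = rnorm(n)
)
customerSatis <- 0.5 * activity - 0.4 * predictors$returnRate + rnorm(n)

pca <- prcomp(predictors, scale. = TRUE)
scores <- data.frame(customerSatis = customerSatis, pca$x[, 1:2])
mod2 <- lm(customerSatis ~ PC1 + PC2, data = scores)

# Loadings: which variables drive each component, and with which sign
round(pca$rotation[, 1:2], 2)

# Estimates, standard errors, t values and p-values
coefTable <- summary(mod2)$coefficients
round(coefTable, 3)

# Component signs are arbitrary: flipping PC1 flips its coefficient, not its p-value
scores$PC1flipped <- -scores$PC1
mod2flipped <- lm(customerSatis ~ PC1flipped + PC2, data = scores)
coef(mod2)[["PC1"]]
coef(mod2flipped)[["PC1flipped"]]
```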
A method that is often confused with PCA is factor analysis. Both methods are used for dimension reduction, but the ideas behind them differ. Factor analysis is shown on the left-hand side of the figure. It identifies theoretical, latent constructs such as intelligence. These factors cannot be measured directly but manifest themselves in measurable variables; that is, the factors influence the observed values of the variables. Correlations between the variables are attributed to the common factors. Variance that the factors cannot explain is treated as error variance, unrelated to the latent construct. One typical application of factor analysis is questionnaire development: if you have a set of questions meant to measure, for example, a certain personality trait, you can use factor analysis to check whether the items really measure only one dimension. In PCA, it is not the components that influence the variables but the other way around: the components are composed of the variables. Hence, you analyze how the items can be compressed into components while losing as little information as possible. The remaining, uncovered variance is not treated as error variance. It is systematic variance that the selected components simply do not cover.
6. Learnings and relevance
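The difference in how the two methods treat leftover variance can be illustrated with base R's `factanal()` and `prcomp()`. The sketch simulates five hypothetical questionnaire items (`item1` to `item5`) driven by a single latent trait with made-up loadings. `factanal()` reports each item's uniqueness, the variance it attributes to error rather than to the factor; for the one-component PCA solution, the code computes the share of each item's variance that PC1 leaves uncovered.

```r
# Minimal sketch on simulated, hypothetical questionnaire data
set.seed(42)
n <- 300
trait <- rnorm(n)                            # latent construct, never observed directly
loadingsTrue <- c(0.8, 0.7, 0.6, 0.7, 0.5)   # hypothetical item loadings
items <- sapply(loadingsTrue, function(l) l * trait + rnorm(n, sd = sqrt(1 - l^2)))
colnames(items) <- paste0("item", 1:5)

# Factor analysis: the factor drives the items; variance it cannot explain
# is reported as uniqueness (error variance)
fa <- factanal(items, factors = 1)
round(fa$uniquenesses, 2)

# PCA: the component is built from the items; for standardized items,
# loading * sdev is each item's correlation with PC1
pca <- prcomp(items, scale. = TRUE)
corWithPC1 <- pca$rotation[, 1] * pca$sdev[1]
uncovered <- 1 - corWithPC1^2                # variance share PC1 leaves uncovered
round(uncovered, 2)
```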
Well done, you made it through the whole chapter! Check out what you learned.
7. Let's practice!
Let's finish strong with the last set of exercises for the whole course on Marketing Analytics in R: Statistical Modeling. Thank you for your attention.