Mutual information features
The credit_df data frame contains a number of continuous features. When two continuous features are correlated, they share information; this shared information is called mutual information. Highly correlated features are not just redundant; they can also cause problems in modeling. In linear regression, for instance, highly correlated predictors (multicollinearity) inflate the standard errors of the coefficient estimates, which makes the estimates unstable and can even give them implausible signs. To see which features share information, you will create a correlation plot.
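The multicollinearity problem can be seen in a small simulation. This is a minimal sketch on hypothetical simulated data, not on credit_df: x2 is built as a near-copy of x1, and y depends only on x1. Adding x2 to the model sharply inflates the standard error of the x1 coefficient, and the variance inflation factor (VIF), computed here by hand as 1 / (1 - R^2) from regressing x1 on x2, quantifies that inflation.

```r
# Hypothetical simulated data (not credit_df)
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # x2 is almost an exact copy of x1
y  <- 2 * x1 + rnorm(n)          # y depends only on x1

cor(x1, x2)  # very close to 1

fit_single <- lm(y ~ x1)         # model with x1 alone
fit_both   <- lm(y ~ x1 + x2)    # model with both correlated predictors

# Standard error of the x1 coefficient in each model
se_single <- summary(fit_single)$coefficients["x1", "Std. Error"]
se_both   <- summary(fit_both)$coefficients["x1", "Std. Error"]
c(single = se_single, both = se_both)

# Variance inflation factor for x1: 1 / (1 - R^2 of x1 regressed on x2)
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)
vif_x1
```

With x2 in the model, the x1 estimate becomes far less precise even though x2 adds almost no new information, which is why spotting such pairs before modeling is useful.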
The tidyverse and corrr packages have been loaded for you.
This exercise is part of the course Dimensionality Reduction in R.
Exercise instructions
- Use correlate() and rplot() to create a correlation plot of the numeric features of credit_df.
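The sample code below completes this exercise. It assumes credit_df is already available in your session.
# Load the packages (already loaded in the exercise environment)
library(tidyverse)
library(corrr)

# Create a correlation plot of the numeric features of credit_df
credit_df %>%
  select(where(is.numeric)) %>%   # keep only the numeric columns
  correlate() %>%                 # pairwise Pearson correlations
  shave() %>%                     # blank out the redundant upper triangle
  rplot(print_cor = TRUE) +       # plot, printing each correlation value
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  # rotate x-axis labels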
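A correlation plot is easy to scan, but the same correlate() and shave() steps can also produce a table of the most strongly correlated pairs. This is a sketch that uses the built-in mtcars data set in place of credit_df, and the 0.8 cutoff is an arbitrary illustrative threshold, not a rule. corrr's stretch() converts the shaved correlation data frame into long format with one row per pair (columns x, y, r).

```r
library(dplyr)
library(corrr)

# Pairwise Pearson correlations of mtcars (stand-in for credit_df)
cor_tbl <- mtcars %>%
  correlate(quiet = TRUE) %>%   # diagonal is NA by default; suppress the info message
  shave()                       # set the upper triangle to NA so each pair appears once

# Long format: one row per remaining pair, dropping the NA cells
high_pairs <- cor_tbl %>%
  stretch(na.rm = TRUE) %>%
  filter(abs(r) > 0.8) %>%      # illustrative cutoff for "highly correlated"
  arrange(desc(abs(r)))

high_pairs
```

Pairs near the top of this table are the candidates for dropping or combining one of the two features.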