Dimensionality and feature information
Imagine you work for a bank and have collected information about different loans made to different people. Your boss wants you to begin exploring the possibility of using this data to classify customers into different credit score categories. A sample of the available data is loaded into credit_df. You are curious about how many features the data has. You also want to identify features that will not be useful for classifying customers into different credit categories.
The tidyverse package has been loaded for you.
This exercise is part of the course Dimensionality Reduction in R.
Exercise instructions
- Find the number of features in credit_df.
- Compute the variance of each feature in credit_df.
- Identify the feature with zero variance and assign it to column_to_remove.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Find the number of features
___ %>% ___()
# Compute each column variance
credit_df %>%
  ___(___(___(), ~ ___(., na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "feature", values_to = "variance")
# Assign the zero-variance column
column_to_remove <- "___"
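
The same three steps can be tried on a small made-up data frame before working with credit_df. This is only a sketch of one possible approach, not the course's solution: toy_df and its column names (age, annual_income, num_bank_accounts) are hypothetical and do not come from the exercise data. A feature with zero variance takes the same value in every row, so it cannot help separate customers into categories.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for credit_df (not the course data)
toy_df <- tibble(
  age = c(23, 35, 41, 58),
  annual_income = c(32000, 54000, NA, 61000),
  num_bank_accounts = c(2, 2, 2, 2)  # constant column, so its variance is 0
)

# Number of features = number of columns
n_features <- toy_df %>% ncol()

# Variance of every column, reshaped to one row per feature
variances <- toy_df %>%
  summarize(across(everything(), ~ var(., na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "feature", values_to = "variance")

# Names of the features whose variance is exactly zero
zero_var <- variances %>%
  filter(variance == 0) %>%
  pull(feature)
```

Here na.rm = TRUE lets var() skip the missing income value instead of returning NA for that column. This sketch assumes every column is numeric; var() would fail on character columns, so those would need to be dropped or handled separately first.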