Dimensionality and feature information

Imagine you work for a bank and have collected data on loans made to its customers. Your boss wants you to explore whether this data can be used to classify customers into credit score categories. A sample of the available data is loaded into credit_df. You want to know how many features the data has, and you want to identify features that will not help with classification. A feature with zero variance takes the same value for every customer, so it cannot distinguish one credit category from another.

The tidyverse package has been loaded for you.

This exercise is part of the course

Dimensionality Reduction in R

Exercise instructions

Find the number of features in credit_df.
Compute the variance of each feature in credit_df.
Identify the feature with zero variance and assign it to column_to_remove.
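
The three steps above can be sketched on a small made-up data frame. This is only an illustration under stated assumptions: toy_df and its columns (age, income, branch_code) are hypothetical and are not the contents of credit_df, and every column is assumed to be numeric so that var() applies to all of them.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, standing in for credit_df (names are made up)
toy_df <- tibble(
  age         = c(25, 40, 31, 58),
  income      = c(32000, 54000, 41000, 77000),
  branch_code = c(7, 7, 7, 7)  # constant column, so its variance is zero
)

# Step 1: the number of features is the number of columns
n_features <- toy_df %>% ncol()

# Step 2: variance of every column, reshaped to one row per feature
variances <- toy_df %>%
  summarize(across(everything(), ~ var(., na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "feature", values_to = "variance")

# Step 3: inspect the table and look for a variance of exactly zero
print(variances)
```

If a data frame also holds character or factor columns, across(where(is.numeric), ...) is one way to restrict the computation to numeric features.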

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Find the number of features
___ %>% ___()

# Compute each column variance
credit_df %>% 
  ___(___(___(), ~ ___(., na.rm = TRUE))) %>% 
  pivot_longer(everything(), names_to = "feature", values_to = "variance")

# Assign the zero-variance column
column_to_remove <- "___"
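
Reading the column name off the variance table works for a handful of features. With many features, a programmatic lookup may be more practical. The following is a minimal sketch, again using a hypothetical toy_df rather than credit_df. The names zero_var_cols and reduced_df are illustrative only, and dropping the column is a follow-on step this exercise does not ask for.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the real credit_df)
toy_df <- tibble(
  age         = c(25, 40, 31, 58),
  branch_code = c(7, 7, 7, 7)  # constant column
)

# Collect the names of all numeric columns whose variance is zero
zero_var_cols <- toy_df %>%
  summarize(across(where(is.numeric), ~ var(., na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "feature", values_to = "variance") %>%
  filter(variance == 0) %>%
  pull(feature)

# Drop those columns; all_of() treats the character vector as column names
reduced_df <- toy_df %>% select(-all_of(zero_var_cols))
```

A strict comparison with zero is used here because a truly constant column has a variance of exactly zero. Near-constant features would need a small threshold instead.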
