
Create a missing value ratio filter

The house_sales_df data frame contains a target variable, price, and a variety of predictors that describe individual houses and help determine their selling prices. Several of the features have missing values, in varying amounts. If a feature's missing value ratio (the number of missing values divided by the total number of rows) is too high, the feature will not be very informative in predicting the price of the house and can be removed. In this exercise, you will calculate the missing value ratio for each column, which will help you choose an appropriate threshold for removing features.

The tidyverse package has been loaded for you.
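
To make the idea of a missing value ratio concrete, here is a minimal sketch in base R. It uses a small made-up data frame (toy_df, not the course's house_sales_df) and an assumed threshold of 0.5; both are illustrative only. It also takes a different route from the tidyverse pipeline this exercise asks for.

```r
# Hypothetical toy data for illustration; NOT the course's house_sales_df
toy_df <- data.frame(
  price = c(250000, 310000, NA, 180000),
  sqft  = c(1200, NA, NA, 950),
  pool  = c(NA, NA, NA, 1)
)

# is.na() returns a logical matrix; each column mean is the share of
# missing values in that column, i.e. the missing value ratio
missing_ratio <- colMeans(is.na(toy_df))
missing_ratio

# Keep only the columns whose ratio is at or below an assumed threshold
threshold <- 0.5
filtered_df <- toy_df[, missing_ratio <= threshold, drop = FALSE]
names(filtered_df)
```

In this toy data the ratios are 0.25 for price, 0.5 for sqft, and 0.75 for pool, so only pool exceeds the assumed threshold and is dropped. A suitable threshold for real data depends on the data set and the modeling goal.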

This exercise is part of the course

Dimensionality Reduction in R


Instructions

  • Store the total number of rows in house_sales_df into n.
  • Calculate the missing value ratios for each column in house_sales_df and store them in missing_vals_df.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Calculate total rows
___ <-  ___(___)

# Calculate missing value ratios
___ <- ___ %>% 
  ___(___(___(), ~ ___(___(.)))) %>% 
  pivot_longer(everything(), names_to = "feature", values_to = "num_missing_values") %>% 
  mutate(missing_val_ratio = ___ / ___)

# Display missing value ratios
missing_vals_df