Create a missing value ratio filter
The house_sales_df data frame contains a target variable, price, and a variety of predictors that describe individual houses and determine their selling prices. Several of the features have missing values, and the number missing varies from feature to feature. If a feature's missing value ratio is too high, the feature will not be very informative for predicting the price of a house, so it can be removed. In this exercise, you will calculate the missing value ratio for each column. This will help you think about an appropriate threshold for removing columns.
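As a rough illustration of the idea, the missing value ratio of a column is the number of missing entries divided by the number of rows. The sketch below uses base R on a small made-up data frame; toy_df, its column names, and the 0.5 threshold are assumptions chosen for illustration and are not part of the exercise data.

```r
# Hypothetical toy data; the real house_sales_df is supplied by the course
toy_df <- data.frame(
  price    = c(100, 200, 300, 400, 500),
  garage   = c(NA, NA, NA, NA, 1),
  bedrooms = c(2, 3, NA, 4, 3)
)

# is.na() returns a logical matrix; colMeans() gives the share of TRUE values per column
missing_ratio <- colMeans(is.na(toy_df))

# Assumed threshold for illustration only; choose one that suits your data
threshold <- 0.5

# Names of the columns whose missing value ratio exceeds the threshold
features_to_drop <- names(missing_ratio)[missing_ratio > threshold]
```

Here garage is missing in four of five rows, so it is the only column flagged for removal.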
The tidyverse package has been loaded for you.
This exercise is part of the course
Dimensionality Reduction in R
Exercise instructions
- Store the total number of rows in house_sales_df into n.
- Calculate the missing value ratios for each column in house_sales_df and store them in missing_vals_df.
Interactive exercise
Try this exercise by completing the sample code.
# Calculate total rows
___ <- ___(___)
# Calculate missing value ratios
___ <- ___ %>%
  ___(___(___(), ~ ___(___(.)))) %>%
  pivot_longer(everything(), names_to = "feature", values_to = "num_missing_values") %>%
  mutate(missing_val_ratio = ___ / ___)
# Display missing value ratios
missing_vals_df
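One way the blanks might be filled in is sketched below. This is not necessarily the course's reference solution: it assumes summarize() with across(everything(), ...) to count NA values per column. The small house_sales_df defined here is a hypothetical stand-in with made-up columns and values, so that the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for house_sales_df; columns and values are invented
house_sales_df <- tibble(
  price    = c(221900, 538000, 180000, 604000),
  bedrooms = c(3, NA, 2, 4),
  sqft_lot = c(NA, NA, 10000, NA)
)

# Calculate total rows
n <- nrow(house_sales_df)

# Calculate missing value ratios
missing_vals_df <- house_sales_df %>%
  # Count the NA values in every column (one row, one column per feature)
  summarize(across(everything(), ~ sum(is.na(.)))) %>%
  # Reshape to one row per feature
  pivot_longer(everything(), names_to = "feature", values_to = "num_missing_values") %>%
  # Divide each count by the total number of rows
  mutate(missing_val_ratio = num_missing_values / n)

# Display missing value ratios
missing_vals_df
```

The result has one row per feature, which makes it straightforward to sort by missing_val_ratio or compare the ratios against a candidate threshold.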