Imputing with random forests

A machine learning approach to imputation can be both more accurate and easier to implement than traditional statistical models. First, it doesn't require you to specify the relationships between variables. Second, machine learning models such as random forests can discover highly complex, non-linear relations and exploit them to predict missing values.

In this exercise, you will get acquainted with the missForest package, which builds a separate random forest to predict the missing values of each variable, one by one. You will call the imputing function on the biographical movies data, biopics, which you have worked with earlier in the course, and then extract the filled-in data as well as the estimated imputation errors.

Let's plant some random forests!
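
As a rough sketch of the same workflow on a different data set, the snippet below uses the built-in iris data as a stand-in for biopics, which is an assumption made purely for illustration. Missing values are introduced artificially with prodNA() from the missForest package, and the names iris_mis, iris_res and iris_imp are hypothetical.

```r
# Sketch only: iris stands in for biopics, which is not reproduced here
library(missForest)

set.seed(42)

# prodNA() removes a share of the values completely at random (here 10%)
iris_mis <- prodNA(iris, noNA = 0.1)

# Fit one random forest per incomplete variable and iterate
# until the imputed values stop changing much
iris_res <- missForest(iris_mis)

# ximp holds the completed data frame
iris_imp <- iris_res$ximp
print(sum(is.na(iris_imp)))

# OOBerror holds the out-of-bag estimates of the imputation error,
# reported separately for numeric and for categorical variables
print(iris_res$OOBerror)
```

Because the error estimates come from out-of-bag predictions, they are available even when the true values behind the missing entries are unknown, as is the case with real data.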

This exercise is part of the course

Handling Missing Data with Imputations in R

Exercise instructions

  • Load the missForest package.
  • Use missForest() to impute missing values in the biopics data; assign the result to imp_res.
  • Extract the imputed data set from imp_res, assign it to imp_data, and check if the number of missing values is indeed zero.
  • Extract the estimated imputation error from imp_res, assign it to imp_err, and print it to the console.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Load the missForest package
___

# Impute biopics data using missForest
imp_res <- ___(___)

# Extract imputed data and check for missing values
imp_data <- imp_res$___
print(___(___(___)))

# Extract and print imputation errors
imp_err <- imp_res$___
print(___)