
Exercise

Important variables

Your Random Forest object my_forest is still loaded. Remember that you set importance = TRUE? Now you can see which variables are important using

varImpPlot(my_forest)

Type it into the console and see what happens.

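When you run the function, two plots appear. The first, MeanDecreaseAccuracy, shows how much worse the model predicts the out-of-bag observations when a variable's values are randomly shuffled, which breaks the variable's link to the outcome. A large decrease (a high value on the x-axis) therefore marks a highly predictive variable. The second, MeanDecreaseGini, is not a Gini coefficient but the total decrease in Gini impurity from all splits on the variable, averaged over the trees. Here too, the higher a variable scores, the more important it is for the model.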
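The numbers behind the two panels can also be read directly with importance() from the randomForest package. The sketch below is meant to be self-contained, so it uses R's built-in iris data as a stand-in for the data behind my_forest (an assumption; the forest fitted here is not my_forest). With your own forest, importance(my_forest) returns the same two columns.

```r
# Sketch only: iris is a stand-in dataset, not the course data behind my_forest
library(randomForest)

set.seed(111)
# importance = TRUE also computes the permutation-based MeanDecreaseAccuracy;
# without it, only MeanDecreaseGini is available
forest <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 500)

# Matrix of importance scores: one row per predictor
imp <- importance(forest)
print(imp[, c("MeanDecreaseAccuracy", "MeanDecreaseGini")])

# Rank the predictors by each measure, most important first
print(names(sort(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE)))
print(names(sort(imp[, "MeanDecreaseGini"], decreasing = TRUE)))
```

The two rankings usually agree on the top variables but need not match exactly, which is why it helps to check both plots.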

Based on the two plots, which variable has the highest impact on the model?

Instructions


Possible answers