Speed-accuracy trade-off

In the last video, you saw that there are two knobs you can tune to influence the performance of the random forests:

  • Number of decision trees in each forest.
  • Number of variables used for splitting within decision trees.

Increasing either of them might improve the accuracy of the imputation model, but it will also take more time to run. In this exercise, you will explore this trade-off yourself by fitting missForest() to the biopics data twice with different settings. As you follow the instructions, pay attention to the errors you print and to the time the code takes to run.
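To make the trade-off concrete, here is a minimal sketch on a stand-in dataset. It assumes the missForest package is installed; in that package the two knobs correspond to the ntree and mtry arguments, and the out-of-bag error estimates are stored in the OOBerror element of the result. The built-in iris data (with values removed at random by prodNA()) replaces biopics, and the seed and the "large" settings of 100 trees and 4 variables are arbitrary choices for illustration, not the exercise's values.

```r
library(missForest)

set.seed(42)  # arbitrary seed, only for reproducibility of this sketch

# iris is a stand-in for biopics; prodNA() sets 10% of the values to NA at random
iris_mis <- prodNA(iris, noNA = 0.1)

# Small, fast forests: few trees, few candidate variables per split
time_small <- system.time(
  imp_small <- missForest(iris_mis, ntree = 5, mtry = 2)
)

# Larger forests: more trees and more candidate variables per split
time_large <- system.time(
  imp_large <- missForest(iris_mis, ntree = 100, mtry = 4)
)

# Out-of-bag imputation errors: NRMSE for numeric, PFC for categorical variables
print(imp_small$OOBerror)
print(imp_large$OOBerror)

# Elapsed wall-clock time in seconds for each fit
print(time_small["elapsed"])
print(time_large["elapsed"])
```

On a dataset this small the difference in run time may be modest, and the error estimates will vary with the seed; the point is only to show where the two settings go and where the errors and timings can be read off.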

This exercise is part of the course

Handling Missing Data with Imputations in R

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Set number of trees to 5 and number of variables used for splitting to 2
imp_res <- missForest(biopics, ___ = ___, ___ = ___)

# Print the resulting imputation errors
print(___)