
Random forest for classification

1. Classification With Random Forests

You've successfully calculated the average cross validation performance for logistic regression. Now let's try the random forest model to see if it improves the prediction performance.

2. ranger() for Classification

Tuning and building the random forest models works the same as before. The only change you need to think about is the values of mtry to tune. Since there are 30 features in the attrition dataset, mtry can go as high as 30. For now we will try out a few mtry values.
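The tuning step can be sketched like this. This is a minimal, self-contained example: the tiny `toy` data frame stands in for the real attrition data (which has 30 features, so mtry could go up to 30), and the specific mtry values are just illustrative.

```r
library(ranger)

# Toy stand-in for the attrition data; the real set has 30 features
set.seed(42)
toy <- data.frame(
  Attrition = factor(sample(c("Yes", "No"), 200, replace = TRUE)),
  x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200)
)

# A few mtry values to try (with 30 features you could go as high as 30)
mtry_values <- c(1, 2, 3)

# Build one random forest per mtry value
models <- lapply(mtry_values, function(m)
  ranger(Attrition ~ ., data = toy, mtry = m, num.trees = 50, seed = 42)
)
```

In the course's cross-validation framework you would fit one of these models per fold rather than on a single data frame, but the ranger() call itself is unchanged.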

3. 1) Prepare Actual Classes

To evaluate the random forest model, you use the same framework of comparing the actual and predicted classes. Preparing the actual values is the same as before: you simply convert the Yes and No values to TRUE and FALSE, respectively.
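Concretely, that conversion is a single comparison. This sketch assumes a validation data frame `validate` with an Attrition column of "Yes"/"No" values; the tiny data frame here is just for illustration.

```r
# Stand-in validation data with the Attrition outcome column
validate <- data.frame(Attrition = c("Yes", "No", "No", "Yes"))

# Convert Yes/No to a logical vector: Yes -> TRUE, No -> FALSE
validate_actual <- validate$Attrition == "Yes"
validate_actual
# [1]  TRUE FALSE FALSE  TRUE
```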

4. 2) Prepare Predicted Classes

To generate the predicted values for a ranger model, you first need to use the predict() function as shown here. By default, ranger outputs the predicted class, in this case Yes or No. To calculate the performance, you simply convert this to a binary vector like so.
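A minimal sketch of those two steps, using a toy model fit in place of the course's attrition model. The key detail is that predict() on a ranger model returns an object whose `$predictions` element holds the predicted classes.

```r
library(ranger)

# Fit a toy classification forest (stands in for the attrition model)
set.seed(1)
train <- data.frame(
  Attrition = factor(sample(c("Yes", "No"), 100, replace = TRUE)),
  x = rnorm(100)
)
model <- ranger(Attrition ~ ., data = train, num.trees = 50, seed = 1)

# New data to predict on (stands in for the validation split)
validate <- data.frame(x = rnorm(5))

# predict() returns a ranger prediction object; $predictions holds
# the Yes/No classes, which we convert to a logical vector
validate_predicted <- predict(model, validate)$predictions == "Yes"
```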

5. Build the Best Attrition Model

Now you have all of the tools you need to calculate the validation recall of your random forest models. After building and evaluating these models, you can compare their performance to the logistic regression model in order to select the best-performing model. This will allow you to prepare your final model and calculate its test performance metrics.
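With the actual and predicted logical vectors in hand, recall is the fraction of actual positives that were predicted positive, TP / (TP + FN). A base-R sketch on small illustrative vectors:

```r
# Illustrative actual and predicted logical vectors
validate_actual    <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
validate_predicted <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

# Recall = true positives / all actual positives
recall <- sum(validate_actual & validate_predicted) / sum(validate_actual)
recall
# [1] 0.6666667
```

If the Metrics package is available, `Metrics::recall(validate_actual, validate_predicted)` computes the same quantity.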