Evaluating model performance

1. Evaluating model performance

After fitting a model with the parsnip package, the next step is to evaluate its performance on the test dataset with the yardstick package.

2. Input to yardstick functions

All yardstick functions require a tibble with model results as the first argument. The data must include a column with the true outcome variable values and a column with the model predictions. The mpg_test_results tibble from the previous section is an example of the required input: it contains the true outcome values in the hwy column and the model predictions in the .pred column.

3. Root mean squared error (RMSE)

A common performance metric for regression models is the root mean squared error, or RMSE. The RMSE summarizes the typical size of a model's prediction errors, in the same units as the outcome variable, and is calculated with the rmse() function. To calculate the RMSE on our mpg model, we pass mpg_test_results to the rmse() function and specify hwy as the truth and .pred as the estimate. We see that the average prediction error of our model is about 1.93 miles per gallon for the estimated highway fuel efficiency values.
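As a rough sketch of what rmse() computes, the example below uses a small made-up results tibble (toy_results is hypothetical and is not the course's mpg_test_results) and compares a by-hand calculation with the yardstick result.

```r
library(tibble)
library(yardstick)

# Toy results with invented numbers: `hwy` is the truth, `.pred` the prediction
toy_results <- tibble(
  hwy   = c(29, 24, 31, 26),
  .pred = c(27, 25, 30, 28)
)

# RMSE by hand: square root of the mean squared prediction error
manual_rmse <- sqrt(mean((toy_results$hwy - toy_results$.pred)^2))

# The same quantity via yardstick; the value is stored in the .estimate column
rmse_tbl <- rmse(toy_results, truth = hwy, estimate = .pred)

manual_rmse
rmse_tbl
```

Because the errors are squared before averaging, large misses count for more than small ones, so the RMSE is not quite a plain average of the absolute errors.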

4. R squared metric

Another important regression metric is R squared, also known as the coefficient of determination. R squared measures the squared correlation between actual and predicted values and ranges from 0 to 1. A value of 1 indicates a perfect linear relationship between predictions and true outcome values, as is the case when all predictions equal the true values. R squared is calculated with the rsq() function, which takes the same arguments as the rmse() function.
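To make the squared-correlation definition concrete, here is a minimal sketch on a made-up tibble (toy_results is hypothetical, not the course data) that compares cor() squared with the rsq() result.

```r
library(tibble)
library(yardstick)

# Toy results with invented numbers: `hwy` is the truth, `.pred` the prediction
toy_results <- tibble(
  hwy   = c(29, 24, 31, 26),
  .pred = c(27, 25, 30, 28)
)

# Squared Pearson correlation between actual and predicted values
manual_rsq <- cor(toy_results$hwy, toy_results$.pred)^2

# Same arguments as rmse(): data first, then truth and estimate
rsq_tbl <- rsq(toy_results, truth = hwy, estimate = .pred)

manual_rsq
rsq_tbl
```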

5. R squared plots

R squared plots are a way to visualize R squared and consist of a scatter plot with model predictions on the y-axis and true outcome values on the x-axis. The line y = x is also plotted and represents the case where all predictions and outcome values are equal, giving an R squared value of 1. R squared plots are helpful for identifying problems with model performance, such as non-linear relationships between the outcome variable and predictors, or regions where the model may be systematically under- or over-predicting.

6. Plotting R squared plots

R squared plots are made with ggplot2 and require a tibble of model results, such as our mpg_test_results. The geom_point() function is used to create the scatter plot of actual vs predicted values, while the geom_abline() function is used to plot the line y = x. The special coordinate function coord_obs_pred(), from the tune package, sets the x and y axes to the same scale.
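A plot along these lines could be sketched as follows. The toy_results tibble, the styling choices (transparency, color, line type), and the axis labels are illustrative assumptions; with the course data you would pass mpg_test_results instead.

```r
library(tibble)
library(ggplot2)
library(tune)  # provides coord_obs_pred()

# Toy results with invented numbers: `hwy` is the truth, `.pred` the prediction
toy_results <- tibble(
  hwy   = c(29, 24, 31, 26),
  .pred = c(27, 25, 30, 28)
)

r2_plot <- ggplot(toy_results, aes(x = hwy, y = .pred)) +
  geom_point(alpha = 0.5) +                       # actual vs predicted values
  geom_abline(slope = 1, intercept = 0,           # the line y = x
              color = "blue", linetype = 2) +
  coord_obs_pred() +                              # same scale on both axes
  labs(x = "Actual highway MPG", y = "Predicted highway MPG")

r2_plot
```

Points above the dashed line are over-predictions and points below it are under-predictions.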

7. Streamlining model fitting

The last_fit() function is used to streamline the model fitting and evaluation process in tidymodels. It takes a parsnip model object, a model formula, and an rsample data split object, and performs the following steps: it creates the training and test datasets, fits the model to the training data, calculates metrics and predictions on the test data, and returns an object with all the results. To fit our linear regression model on the mpg data, we pass the lm_model parsnip object to last_fit(), specify our model formula, and provide the mpg_split data split object.
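The workflow just described might look like the sketch below. The details of the setup are assumptions, since this section does not restate them: the seed, the 75/25 stratified split, and the formula hwy ~ cty are illustrative and may differ from the earlier sections.

```r
library(tidymodels)  # attaches parsnip, rsample, tune, yardstick, ggplot2 (mpg data)

set.seed(123)  # arbitrary seed so the split is reproducible

# Assumed split: 75% training, stratified by the outcome
mpg_split <- initial_split(mpg, prop = 0.75, strata = hwy)

# Linear regression model specification
lm_model <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# Fit on the training portion, evaluate on the test portion
# (hwy ~ cty is an assumed formula)
lm_last_fit <- lm_model %>%
  last_fit(hwy ~ cty, split = mpg_split)

lm_last_fit
```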

8. Collecting metrics

Once the model is trained with the last_fit() function, we pass the lm_last_fit object to the collect_metrics() function to get a tibble with the metrics calculated on the test data. The default metrics for regression models are RMSE and R squared, and their values are always stored in the column named .estimate. We get the same performance metrics on the mpg test data as before, just with a lot less work!

9. Collecting predictions

To collect the model predictions on the test data, we pass the lm_last_fit object to the collect_predictions() function and obtain a tibble with the test dataset predictions. The predictions column is always named .pred. The outcome variable, hwy in our case, is also included along with other row identifier columns.
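Both collection steps can be sketched together as below. As before, the seed, split proportion, and the formula hwy ~ cty are assumptions made so the example runs on its own. The last step recomputes the RMSE from the collected predictions as a sanity check.

```r
library(tidymodels)

set.seed(123)  # arbitrary seed so the split is reproducible

# Assumed setup: 75/25 stratified split and a linear model of hwy on cty
mpg_split <- initial_split(mpg, prop = 0.75, strata = hwy)

lm_last_fit <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression") %>%
  last_fit(hwy ~ cty, split = mpg_split)

# Test-set metrics; values are in the .estimate column
test_metrics <- collect_metrics(lm_last_fit)

# Test-set predictions; predictions are in .pred, the truth in hwy
test_preds <- collect_predictions(lm_last_fit)

# Cross-check: RMSE recomputed directly from the collected predictions
rmse_check <- rmse(test_preds, truth = hwy, estimate = .pred)

test_metrics
test_preds
rmse_check
```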

10. Let's evaluate some models!

Let's practice evaluating model performance with yardstick!
