Relative error
In this exercise, you will compare relative error to absolute error. For the purposes of modeling, we will define relative error as
$$ rel = \frac{(y - pred)}{y} $$
that is, the error is relative to the true outcome. You will measure the overall relative error of a model using root mean squared relative error:
$$ rmse_{rel} = \sqrt{\overline{rel^2}} $$
where \(\overline{rel^2}\) is the mean of \(rel^2\).
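As a quick check of the two formulas above, here is a minimal base R sketch. The vectors `y` and `pred` hold made-up values chosen for easy arithmetic; they are not taken from fdata.

```r
# Hypothetical toy values (assumed for illustration, not from fdata)
y    <- c(100, 200, 400)   # true outcomes
pred <- c(110, 180, 400)   # model predictions

# Relative error: the residual divided by the true outcome
rel <- (y - pred) / y

# Root mean squared relative error
rmse_rel <- sqrt(mean(rel^2))

# Ordinary RMSE, for comparison (same units as y)
rmse <- sqrt(mean((y - pred)^2))

print(rel)
print(rmse_rel)
print(rmse)
```

Note that the relative error is undefined whenever a true outcome `y` is zero, so this measure only makes sense for outcomes that stay away from zero.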
The example (toy) dataset fdata has been pre-loaded. It includes the columns:
- y: the true output to be predicted by some model; imagine it is the amount of money a customer will spend on a visit to your store.
- pred: the predictions of a model that predicts y.
- label: categorical: whether y comes from a population that makes small purchases, or large ones.
You want to know which model does "better": the one predicting the small purchases, or the one predicting large ones.
This is a part of the course “Supervised Learning in R: Regression”.
Exercise instructions
- Fill in the blanks to examine the data. Notice that large purchases tend to be about 100 times larger than small ones.
- Fill in the blanks to create error columns:
  - Define residual as y - pred.
  - Define relative error as residual / y.
- Fill in the blanks to calculate and compare RMSE and relative RMSE.
- How do the absolute errors compare? The relative errors?
- Examine the plot of predictions versus outcome.
- In your opinion, which model does "better"?
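To see why the absolute and relative comparisons can disagree, here is a small base R sketch on a hypothetical data frame `toy` (invented values, not fdata, and base R rather than the dplyr pipeline used in the exercise). The two groups differ in scale by a factor of 100, mirroring the small/large split.

```r
# Hypothetical toy data (assumed for illustration, not from fdata)
toy <- data.frame(
  label = c("small", "small", "large", "large"),
  y     = c(1, 2, 100, 200),
  pred  = c(1.5, 1.5, 105, 195)
)

# Error columns, defined as in the instructions
toy$residual <- toy$y - toy$pred
toy$relerr   <- toy$residual / toy$y

# Root mean square of a vector
rms <- function(x) sqrt(mean(x^2))

# Per-group RMSE and relative RMSE
rmse_by_label     <- tapply(toy$residual, toy$label, rms)
rmse_rel_by_label <- tapply(toy$relerr, toy$label, rms)

print(rmse_by_label)
print(rmse_rel_by_label)
```

In this toy case the large group has the bigger RMSE (its errors are larger in absolute terms), while the small group has the bigger relative RMSE (its errors are larger as a fraction of the outcome). Which measure matters depends on whether you care about errors in dollars or in percentages.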
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# fdata is available
summary(fdata)
# Examine the data: generate the summaries for the groups large and small:
fdata %>%
  group_by(label) %>%     # group by small/large purchases
  summarize(min  = ___,   # min of y
            mean = ___,   # mean of y
            max  = ___)   # max of y
# Fill in the blanks to add error columns
fdata2 <- fdata %>%
  group_by(label) %>%       # group by label
  mutate(residual = ___,    # Residual
         relerr   = ___)    # Relative error
# Compare the rmse and rmse.rel of the large and small groups:
fdata2 %>%
  group_by(label) %>%
  summarize(rmse     = ___,   # RMSE
            rmse.rel = ___)   # Root mean squared relative error
# Plot the predictions for both groups of purchases
ggplot(fdata2, aes(x = pred, y = y, color = label)) +
  geom_point() +
  geom_abline() +
  facet_wrap(~ label, ncol = 1, scales = "free") +
  ggtitle("Outcome vs prediction")