Relative error
In this exercise, you will compare relative error to absolute error. For the purposes of modeling, we will define relative error as
$$ rel = \frac{(y - pred)}{y} $$
that is, the error is relative to the true outcome. You will measure the overall relative error of a model using root mean squared relative error:
$$ rmse_{rel} = \sqrt{\overline{rel^2}} $$
where \(\overline{rel^2}\) is the mean of \(rel^2\).
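The two definitions above can be worked through on a few numbers. The following is a minimal sketch in base R; the vectors `y` and `pred` are made-up values for illustration and are not taken from fdata.

```r
# Hypothetical toy values (not from fdata)
y    <- c(10, 20, 40)   # true outcomes
pred <- c(8, 25, 40)    # model predictions

residual <- y - pred      # absolute error per observation
rel      <- residual / y  # error relative to the true outcome

rmse     <- sqrt(mean(residual^2))  # root mean squared error
rmse_rel <- sqrt(mean(rel^2))       # root mean squared relative error

print(c(rmse = rmse, rmse_rel = rmse_rel))
```

Because rel divides by y, it is only meaningful when the true outcome is nonzero.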
The example (toy) dataset fdata has been pre-loaded. It includes the columns:
- y: the true output to be predicted by some model; imagine it is the amount of money a customer will spend on a visit to your store.
- pred: the predictions of a model that predicts y.
- label: categorical: whether y comes from a population that makes small purchases, or large ones.
You want to know which model does "better": the one predicting the small purchases, or the one predicting large ones.
This exercise is part of the course
Supervised Learning in R: Regression
Exercise instructions
- Fill in the blanks to examine the data. Notice that large purchases tend to be about 100 times larger than small ones.
- Fill in the blanks to create error columns:
  - Define residual as y - pred.
  - Define relative error as residual / y.
- Fill in the blanks to calculate and compare RMSE and relative RMSE.
  - How do the absolute errors compare? The relative errors?
- Examine the plot of predictions versus outcome.
  - In your opinion, which model does "better"?
 
Hands-on interactive exercise
Try this exercise by completing this sample code.
# fdata is available
summary(fdata)
# Examine the data: generate the summaries for the groups large and small:
fdata %>% 
    group_by(label) %>%     # group by small/large purchases
    summarize(min  = ___,   # min of y
              mean = ___,   # mean of y
              max  = ___)   # max of y
# Fill in the blanks to add error columns
fdata2 <- fdata %>% 
         group_by(label) %>%       # group by label
           mutate(residual = ___,  # Residual
                  relerr   = ___)  # Relative error
# Compare the rmse and rmse.rel of the large and small groups:
fdata2 %>% 
  group_by(label) %>% 
  summarize(rmse     = ___,   # RMSE
            rmse.rel = ___)   # Root mean squared relative error
            
# Plot the predictions for both groups of purchases
ggplot(fdata2, aes(x = pred, y = y, color = label)) + 
  geom_point() + 
  geom_abline() + 
  facet_wrap(~ label, ncol = 1, scales = "free") + 
  ggtitle("Outcome vs prediction")
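
The same kind of group-wise comparison can be sketched on a small made-up data frame. Everything below is an assumption for illustration: `toy`, `toy_err`, and `toy_summary` are hypothetical names, the values are invented, and the sketch does not use fdata.

```r
library(dplyr)

# Hypothetical toy data (not fdata): two groups on very different scales
toy <- data.frame(
  y     = c(1, 2, 4, 100, 200, 400),
  pred  = c(1.5, 1.5, 5, 110, 190, 380),
  label = c("small", "small", "small", "large", "large", "large")
)

# Add the error columns
toy_err <- toy %>%
  mutate(residual = y - pred,      # absolute error
         relerr   = residual / y)  # error relative to the true outcome

# Summarize each group separately
toy_summary <- toy_err %>%
  group_by(label) %>%
  summarize(rmse     = sqrt(mean(residual^2)),  # RMSE
            rmse.rel = sqrt(mean(relerr^2)))    # root mean squared relative error

print(toy_summary)
```

In this made-up data the large group has the larger RMSE but the smaller relative RMSE, so which group looks "better" depends on the measure used.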