Relative error

In this exercise, you will compare relative error to absolute error. For the purposes of modeling, we will define relative error as

$$ rel = \frac{(y - pred)}{y} $$

that is, the error is relative to the true outcome. You will measure the overall relative error of a model using root mean squared relative error:

$$ rmse_{rel} = \sqrt{\overline{rel^2}} $$

where \(\overline{rel^2}\) is the mean of \(rel^2\).
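
As a minimal sketch of these definitions, the snippet below computes the relative error and its root mean square for three made-up values. The vectors `y_toy` and `pred_toy` and the helper `rmse_rel()` are illustrative assumptions, not part of the exercise data. Note that the definition only makes sense when y is never zero.

```r
# Hypothetical toy values, not taken from fdata
y_toy    <- c(100, 200, 400)   # true outcomes
pred_toy <- c(110, 180, 400)   # model predictions

# Relative error: residual divided by the true outcome
rel <- (y_toy - pred_toy) / y_toy

# Root mean squared relative error (illustrative helper, assumes no y is zero)
rmse_rel <- function(y, pred) {
  sqrt(mean(((y - pred) / y)^2))
}

rel                         # per-observation relative errors
rmse_rel(y_toy, pred_toy)   # overall relative error
```

Here the relative errors are -0.1, 0.1 and 0, so the root mean squared relative error is about 0.082.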

The example (toy) dataset fdata has been pre-loaded. It includes the columns:

  • y: the true output to be predicted by some model; imagine it is the amount of money a customer will spend on a visit to your store.
  • pred: the predictions of a model that predicts y.
  • label: a categorical variable indicating whether y comes from a population that makes small purchases or large ones.

You want to know which model does "better": the one predicting the small purchases, or the one predicting large ones.

This exercise is part of the course

Supervised Learning in R: Regression

Exercise instructions

  • Fill in the blanks to examine the data. Notice that large purchases tend to be about 100 times larger than small ones.
  • Fill in the blanks to create error columns:
    • Define residual as y - pred.
    • Define relative error as residual / y.
  • Fill in the blanks to calculate and compare RMSE and relative RMSE.
    • How do the absolute errors compare? The relative errors?
  • Examine the plot of predictions versus outcome.
    • In your opinion, which model does "better"?
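
To see why the two measures can disagree, here is a rough base-R sketch on a made-up data frame (`toy`, not the real `fdata`) in which the large group sits at 100 times the scale of the small one. The values and the helper `rms()` are assumptions chosen for illustration only.

```r
# Hypothetical data: NOT the real fdata, values invented for illustration
toy <- data.frame(
  y     = c(10, 20, 1000, 2000),
  pred  = c(12, 16, 1050, 1900),
  label = c("small", "small", "large", "large")
)

toy$residual <- toy$y - toy$pred       # absolute error
toy$relerr   <- toy$residual / toy$y   # error relative to the true outcome

# Root mean square of a numeric vector (illustrative helper)
rms <- function(x) sqrt(mean(x^2))

# One RMSE and one relative RMSE per group
rmse     <- tapply(toy$residual, toy$label, rms)
rmse_rel <- tapply(toy$relerr,   toy$label, rms)

rmse
rmse_rel
```

In this invented example the large group has the bigger RMSE (about 79 versus about 3.2) but the smaller relative RMSE (0.05 versus 0.2), so which group looks "better" depends on the measure.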

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# fdata is available; dplyr and ggplot2 are loaded here in case they are not already
library(dplyr)
library(ggplot2)
summary(fdata)

# Examine the data: generate the summaries for the groups large and small:
fdata %>% 
    group_by(label) %>%     # group by small/large purchases
    summarize(min  = ___,   # min of y
              mean = ___,   # mean of y
              max  = ___)   # max of y

# Fill in the blanks to add error columns
fdata2 <- fdata %>% 
         group_by(label) %>%       # group by label
           mutate(residual = ___,  # Residual
                  relerr   = ___)  # Relative error

# Compare the rmse and rmse.rel of the large and small groups:
fdata2 %>% 
  group_by(label) %>% 
  summarize(rmse     = ___,   # RMSE
            rmse.rel = ___)   # Root mean squared relative error
            
# Plot the predictions for both groups of purchases
ggplot(fdata2, aes(x = pred, y = y, color = label)) + 
  geom_point() + 
  geom_abline() + 
  facet_wrap(~ label, ncol = 1, scales = "free") + 
  ggtitle("Outcome vs prediction")