Relative error

In this exercise, you will compare relative error to absolute error. For the purposes of modeling, we will define relative error as

$$ rel = \frac{(y - pred)}{y} $$

that is, the error is relative to the true outcome. You will measure the overall relative error of a model using root mean squared relative error:

$$ rmse_{rel} = \sqrt{\overline{rel^2}} $$

where \(\overline{rel^2}\) is the mean of \(rel^2\).
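
As a minimal sketch of the two definitions above, the following base R snippet computes the relative error and its root mean squared value next to ordinary RMSE. The vectors y and pred here are made-up illustrative values, not rows of fdata.

```r
# Hypothetical outcomes and predictions (illustration only, not from fdata)
y    <- c(100, 200, 400)
pred <- c(110, 180, 400)

# Relative error: the residual divided by the true outcome
rel <- (y - pred) / y

# Root mean squared relative error, following the definition above
rmse_rel <- sqrt(mean(rel^2))

# Ordinary RMSE on the same values, for comparison
rmse <- sqrt(mean((y - pred)^2))

print(c(rmse = rmse, rmse_rel = rmse_rel))
```

RMSE is expressed in the units of y, while the relative version is unitless, so the two numbers are not directly comparable to each other.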

The example (toy) dataset fdata has been pre-loaded. It includes the columns:

  • y: the true output to be predicted by some model; imagine it is the amount of money a customer will spend on a visit to your store.
  • pred: the predictions of a model that predicts y.
  • label: a categorical variable indicating whether y comes from a population that makes small purchases or one that makes large purchases.

You want to know which model does "better": the one predicting the small purchases, or the one predicting large ones.

This is a part of the course

“Supervised Learning in R: Regression”


Exercise instructions

  • Fill in the blanks to examine the data. Notice that large purchases tend to be about 100 times larger than small ones.
  • Fill in the blanks to create error columns:
    • Define residual as y - pred.
    • Define relative error as residual / y.
  • Fill in the blanks to calculate and compare RMSE and relative RMSE.
    • How do the absolute errors compare? The relative errors?
  • Examine the plot of predictions versus outcome.
    • In your opinion, which model does "better"?
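
To see why absolute and relative error can rank two models differently, consider two hypothetical predictions (illustrative numbers, not taken from fdata): a purchase of 10 predicted as 5, and a purchase of 1000 predicted as 950.

$$ rel_{small} = \frac{10 - 5}{10} = 0.5, \qquad rel_{large} = \frac{1000 - 950}{1000} = 0.05 $$

The large purchase has ten times the absolute error (50 versus 5) but one tenth of the relative error.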

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

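# The pipelines and plot below need dplyr and ggplot2
# (attach them yourself if your session has not already done so)
library(dplyr)
library(ggplot2)

# fdata is available
summary(fdata)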

# Examine the data: generate the summaries for the groups large and small:
fdata %>% 
    group_by(label) %>%     # group by small/large purchases
    summarize(min  = ___,   # min of y
              mean = ___,   # mean of y
              max  = ___)   # max of y

# Fill in the blanks to add error columns
fdata2 <- fdata %>% 
         group_by(label) %>%       # group by label
           mutate(residual = ___,  # Residual
                  relerr   = ___)  # Relative error

# Compare the rmse and rmse.rel of the large and small groups:
fdata2 %>% 
  group_by(label) %>% 
  summarize(rmse     = ___,   # RMSE
            rmse.rel = ___)   # Root mean squared relative error
            
# Plot the predictions for both groups of purchases
ggplot(fdata2, aes(x = pred, y = y, color = label)) + 
  geom_point() + 
  geom_abline() + 
  facet_wrap(~ label, ncol = 1, scales = "free") + 
  ggtitle("Outcome vs prediction")
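
The snippet below is a sketch of one way the error columns and per-group summaries could be computed, run on a made-up data frame rather than on fdata. The name toy, its values, and the labels "small" and "large" are all assumptions chosen for illustration; the actual contents of fdata may differ.

```r
library(dplyr)

# Hypothetical stand-in for fdata (values and labels invented for illustration)
toy <- data.frame(
  y     = c(10, 12, 8, 1000, 1100, 900),
  pred  = c(12, 11, 10, 980, 1150, 930),
  label = rep(c("small", "large"), each = 3)
)

# Add the residual and relative error columns
toy_errors <- toy %>%
  mutate(residual = y - pred,
         relerr   = residual / y)

# Summarize both error measures within each group
toy_summary <- toy_errors %>%
  group_by(label) %>%
  summarize(rmse     = sqrt(mean(residual^2)),
            rmse.rel = sqrt(mean(relerr^2)))

print(toy_summary)
```

On these invented values the large group has the larger RMSE but the smaller relative RMSE, which is the kind of contrast the exercise asks you to look for.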

In this course you will learn how to predict future events using linear regression, generalized additive models, random forests, and xgboost.

Before moving on to more sophisticated regression techniques, we will look at some other modeling issues: modeling with categorical inputs, interactions between variables, and when you might consider transforming inputs and outputs before modeling. While more sophisticated regression techniques manage some of these issues automatically, it's important to be aware of them, in order to understand which methods best handle various issues -- and which issues you must still manage yourself.

  • Exercise 1: Categorical inputs
  • Exercise 2: Examining the structure of categorical inputs
  • Exercise 3: Modeling with categorical inputs
  • Exercise 4: Interactions
  • Exercise 5: Modeling an interaction
  • Exercise 6: Modeling an interaction (2)
  • Exercise 7: Transforming the response before modeling
  • Exercise 8: Relative error
  • Exercise 9: Modeling log-transformed monetary output
  • Exercise 10: Comparing RMSE and root-mean-squared Relative Error
  • Exercise 11: Transforming inputs before modeling
  • Exercise 12: Input transforms: the "hockey stick"
  • Exercise 13: Input transforms: the "hockey stick" (2)
