Exercise

# Relative error

In this exercise, you will compare relative error to absolute error. For the purposes of modeling, we will define relative error as

$$ rel = \frac{(y - pred)}{y} $$

that is, the error is relative to the true outcome. You will measure the overall relative error of a model using root mean squared relative error:

$$ rmse_{rel} = \sqrt{\overline{rel^2}} $$

where \(\overline{rel^2}\) is the mean of \(rel^2\).
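
As a quick worked illustration of these definitions, assume two made-up observations (not taken from `fdata`): \(y = (10, 200)\) and \(pred = (8, 220)\). The relative errors and their root mean square would then be

$$ rel = \left(\frac{10 - 8}{10},\ \frac{200 - 220}{200}\right) = (0.2,\ -0.1) $$

$$ rmse_{rel} = \sqrt{\frac{0.2^2 + (-0.1)^2}{2}} = \sqrt{0.025} \approx 0.158 $$

For comparison, the ordinary RMSE of the same two residuals \((2, -20)\) is \(\sqrt{(4 + 400)/2} = \sqrt{202} \approx 14.2\), which is dominated by the larger observation.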

The example (toy) dataset `fdata` is loaded in your workspace. It includes the columns:

- `y`: the true output to be predicted by some model; imagine it is the amount of money a customer will spend on a visit to your store.
- `pred`: the predictions of a model that predicts `y`.
- `label`: categorical: whether `y` comes from a population that makes `small` purchases, or `large` ones.

You want to know which model does "better": the one predicting the `small` purchases, or the one predicting `large` ones.

Instructions

The data frame `fdata` is in the workspace.

- Fill in the blanks to examine the data. Notice that large purchases tend to be about 100 times larger than small ones.
- Fill in the blanks to create error columns:
  - Define residual as `y - pred`.
  - Define relative error as `residual / y`.
- Fill in the blanks to calculate and compare RMSE and relative RMSE.
  - How do the absolute errors compare? The relative errors?
- Examine the plot of predictions versus outcome.
  - In your opinion, which model does "better"?
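
The error-column and comparison steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the exercise's own fill-in-the-blank code: the `rows` list is a hypothetical stand-in for `fdata` with made-up values, and the helper `rms` and the `summary` dictionary are names introduced here for illustration only.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for fdata: (y, pred, label) rows with made-up values.
rows = [
    (10.0, 8.0, "small"),
    (12.0, 15.0, "small"),
    (8.0, 9.0, "small"),
    (1000.0, 950.0, "large"),
    (1200.0, 1260.0, "large"),
    (900.0, 870.0, "large"),
]


def rms(values):
    """Root mean square of a non-empty sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Build the two error columns, grouped by label.
residuals = defaultdict(list)
relerrs = defaultdict(list)
for y, pred, label in rows:
    residual = y - pred                   # residual = y - pred
    residuals[label].append(residual)
    relerrs[label].append(residual / y)   # relative error = residual / y

# RMSE and relative RMSE for each group.
summary = {
    label: {"rmse": rms(residuals[label]), "rmse_rel": rms(relerrs[label])}
    for label in residuals
}

for label, stats in summary.items():
    print(f"{label}: rmse={stats['rmse']:.3f}, rmse_rel={stats['rmse_rel']:.3f}")
```

With these made-up rows, the `large` group has the bigger RMSE but the smaller relative RMSE, so the two measures can rank the groups differently. Whether the same holds for `fdata` is what the exercise asks you to check.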