Modeling log-transformed monetary output
In this exercise, you will practice modeling log-transformed monetary output, and then transforming the "log-money" predictions back into monetary units. The loaded data record subjects' incomes in 2005 (Income2005), as well as the results of several aptitude tests taken by the subjects in 1981:
- Arith
- Word
- Parag
- Math
- AFQT (Percentile on the Armed Forces Qualifying Test)
The data have already been split into training and test sets (income_train and income_test, respectively) and pre-loaded. You will build a model of log(income) from the inputs, and then convert log(income) back into income.
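The round trip from money to log-money and back can be sketched on synthetic data. This is only an illustration under stated assumptions: the data frame toy, its columns score and income, and the coefficients used to generate it are all made up and are not part of the exercise data.

```r
# Synthetic data only: names and coefficients are hypothetical
set.seed(42)
n <- 500
toy <- data.frame(score = runif(n, 0, 100))
# Income is generated multiplicatively, so log(income) is linear in score
toy$income <- exp(9 + 0.015 * toy$score + rnorm(n, sd = 0.5))

# Simple 75/25 train/test split
train_idx <- sample(n, size = 0.75 * n)
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Fit on the log scale; lm() evaluates log(income) from the formula
toy_model <- lm(log(income) ~ score, data = toy_train)

# Predictions come back in log units
toy_test$logpred <- predict(toy_model, newdata = toy_test)

# exp() undoes log(), returning the predictions to monetary units
toy_test$pred_income <- exp(toy_test$logpred)

summary(toy_test$logpred)
summary(toy_test$pred_income)
```

The two summaries should differ by orders of magnitude, which is the same check the exercise asks for. One caveat worth keeping in mind: the model predicts the mean of log(income), so the exponentiated predictions tend to sit closer to a typical (median) income than to the mean income.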
This exercise is part of the course Supervised Learning in R: Regression
Exercise instructions
- Call summary() on income_train$Income2005 to see the summary statistics of income in the training set.
- Write a formula to express log(Income2005) as a function of the five tests, and assign it to the variable fmla.log. Print it.
- Fit a linear model of log(Income2005) to the income_train data: model.log.
- Use model.log to predict log income on the income_test dataset. Put the predictions in the column logpred.
  - Check summary() of logpred to see that the magnitudes are much different from those of Income2005.
- Reverse the log transformation to put the predictions into "monetary units": exp(income_test$logpred). Put the result in the column pred.income.
  - Check summary() of pred.income and see that the magnitudes are now similar to Income2005 magnitudes.
- Fill in the blanks to plot a scatter plot of predicted income vs income on the test set.
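For the formula step, one option is to type the formula out directly. As an alternative, here is a minimal sketch that assembles it from a vector of column names. The names tests, fmla_text, and fmla_sketch are hypothetical helpers introduced here for illustration; only the test names and Income2005 come from the exercise.

```r
# Test names taken from the exercise text
tests <- c("Arith", "Word", "Parag", "Math", "AFQT")

# Paste the pieces into one string: "log(Income2005) ~ Arith + Word + ..."
fmla_text <- paste("log(Income2005) ~", paste(tests, collapse = " + "))

# Convert the string to a formula object;
# wrapping the assignment in parentheses also prints the result
(fmla_sketch <- as.formula(fmla_text))
```

Putting log() on the left-hand side means the transformation is applied when the model is fit, so the data frame itself keeps income in its original units.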
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Examine Income2005 in the training set
___
# Write the formula for log income as a function of the tests and print it
(fmla.log <- ___)
# Fit the linear model
model.log <- ___
# Make predictions on income_test
income_test$logpred <- ___
summary(income_test$logpred)
# Convert the predictions to monetary units
income_test$pred.income <- ___
summary(income_test$pred.income)
# Plot predicted income (x axis) vs income
ggplot(___, aes(x = ___, y = ___)) +
  geom_point() +
  geom_abline(color = "blue")