
Modeling log-transformed monetary output

In this exercise, you will practice modeling log-transformed monetary output and then transforming the "log-money" predictions back into monetary units. The loaded data record subjects' incomes in 2005 (Income2005), as well as the results of several aptitude tests that the subjects took in 1981:

  • Arith
  • Word
  • Parag
  • Math
  • AFQT (Percentile on the Armed Forces Qualifying Test)
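Monetary amounts such as income are often right-skewed, which is a common reason to model them on the log scale. The sketch below illustrates this with simulated, log-normally distributed incomes. The variable name income and the distribution parameters are hypothetical and are not taken from the exercise data.

```r
# Hypothetical simulated incomes: log-normally distributed, so they are
# right-skewed in monetary units but roughly symmetric on the log scale.
set.seed(1)
income <- exp(rnorm(1000, mean = 10.5, sd = 0.8))

summary(income)       # mean sits well above the median: long right tail
summary(log(income))  # mean and median are close together
```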

The data have already been split into training and test sets (income_train and income_test, respectively) and pre-loaded. You will build a model of log(income) from the test scores, and then convert the predicted log(income) back into income.
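One point worth keeping in mind when converting back with exp(): the exponential of an average log value is a geometric mean, which is never larger than the arithmetic mean. Exponentiated log-model predictions therefore tend to sit somewhat below the average income of similar subjects, closer to a typical, median-like value when the errors are roughly symmetric on the log scale. The vector y below is made up purely for illustration.

```r
# A few hypothetical incomes (not from the exercise data)
y <- c(20000, 35000, 50000, 120000)

mean(y)            # arithmetic mean: 56250
exp(mean(log(y)))  # geometric mean: smaller than the arithmetic mean
```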

This exercise is part of the course Supervised Learning in R: Regression.

Exercise instructions

  • Call summary() on income_train$Income2005 to see the summary statistics of income in the training set.
  • Write a formula that expresses log(Income2005) as a function of the five tests, assign it to the variable fmla.log, and print it.
  • Fit a linear model of log(Income2005) to the income_train data and assign it to model.log.
  • Use model.log to predict log income on the income_test dataset, and store the predictions in the column logpred.
    • Check summary() of logpred and note that its magnitudes are much smaller than those of Income2005.
  • Reverse the log transformation to put the predictions into "monetary units" with exp(income_test$logpred), and store the result in the column pred.income.
    • Check summary() of pred.income and note that its magnitudes are now similar to those of Income2005.
  • Fill in the blanks to make a scatter plot of predicted income versus actual income on the test set.
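The steps above can also be tried without the course data. The sketch below simulates a hypothetical stand-in data set (sim, split into sim_train and sim_test) with the same column names, builds fmla.log with base R's reformulate() instead of typing the formula by hand, and checks that predict() returns log-scale values that exp() maps back to monetary units. The simulated coefficients and the 75/25 split are assumptions for illustration only.

```r
# Hypothetical simulated stand-in for income_train / income_test:
# the column names match the exercise, but the values are made up.
set.seed(42)
n <- 200
sim <- data.frame(
  Arith = rnorm(n, 20, 5),
  Word  = rnorm(n, 25, 6),
  Parag = rnorm(n, 11, 3),
  Math  = rnorm(n, 14, 5),
  AFQT  = runif(n, 1, 99)
)
# Simulate incomes whose logarithm is linear in the test scores
sim$Income2005 <- exp(9.5 + 0.02 * sim$Math + 0.01 * sim$AFQT + rnorm(n, 0, 0.6))

# Hypothetical 75/25 train/test split
train_rows <- sample(n, size = 0.75 * n)
sim_train <- sim[train_rows, ]
sim_test  <- sim[-train_rows, ]

# Build the formula from a character vector of input names
tests <- c("Arith", "Word", "Parag", "Math", "AFQT")
fmla.log <- reformulate(tests, response = quote(log(Income2005)))
print(fmla.log)

model.log <- lm(fmla.log, data = sim_train)

# predict() returns log-scale values because the response is log(Income2005)
sim_test$logpred <- predict(model.log, newdata = sim_test)
# exp() undoes the log, giving predictions in monetary units
sim_test$pred.income <- exp(sim_test$logpred)

summary(sim_test$logpred)      # log-scale values
summary(sim_test$pred.income)  # monetary-scale values, far larger
```

Building the formula with reformulate() is convenient when the input variables are stored in a character vector; writing the formula out by hand, as the exercise asks, gives the same formula here.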

Hands-on interactive exercise

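The sample code below fills in each blank. It assumes income_train and income_test are pre-loaded, as described above.

library(ggplot2)

# Examine Income2005 in the training set
summary(income_train$Income2005)

# Write the formula for log income as a function of the tests and print it
(fmla.log <- log(Income2005) ~ Arith + Word + Parag + Math + AFQT)

# Fit the linear model
model.log <- lm(fmla.log, data = income_train)

# Make predictions on income_test (these are on the log scale)
income_test$logpred <- predict(model.log, newdata = income_test)
summary(income_test$logpred)

# Convert the predictions to monetary units
income_test$pred.income <- exp(income_test$logpred)
summary(income_test$pred.income)

# Plot predicted income (x axis) vs actual income (y axis);
# points near the blue line y = x are accurate predictions
ggplot(income_test, aes(x = pred.income, y = Income2005)) +
  geom_point() +
  geom_abline(color = "blue")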