Modeling log-transformed monetary output
In this exercise, you will practice modeling log-transformed monetary output, and then transforming the "log-money" predictions back into monetary units. The loaded data record subjects' incomes in 2005 (Income2005), as well as the results of several aptitude tests taken by the subjects in 1981:
- Arith
- Word
- Parag
- Math
- AFQT (percentile on the Armed Forces Qualifying Test)
The data have already been split into training and test sets (income_train and income_test, respectively) and pre-loaded. You will build a model of log(income) from the inputs, and then convert log(income) back into income.
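To see why the back-transformation is needed, here is a minimal sketch on made-up numbers. The `income` vector is hypothetical and is not the exercise data; it only illustrates that `log()` compresses monetary values into a much smaller range and that `exp()` undoes the transformation.

```r
# Hypothetical incomes, for illustration only (not the exercise data)
income <- c(12000, 25000, 40000, 65000, 180000)

# Log-transform: the values are now in "log-money" units
log_income <- log(income)
summary(log_income)

# exp() reverses log(), recovering monetary units
recovered <- exp(log_income)
all.equal(recovered, income)
```

A model fit to log(income) produces predictions on the scale of `log_income`, so they have to be passed through `exp()` before they can be compared with actual incomes.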
This exercise is part of the course
Supervised Learning in R: Regression
Exercise instructions
- Call summary() on income_train$Income2005 to see the summary statistics of income in the training set.
- Write a formula to express log(Income2005) as a function of the five tests as the variable fmla.log. Print it.
- Fit a linear model of log(Income2005) to the income_train data: model.log.
- Use model.log to predict log income on the income_test dataset. Put it in the column logpred.
  - Check summary() of logpred to see that the magnitudes are very different from those of Income2005.
- Reverse the log transformation to put the predictions into "monetary units": exp(income_test$logpred). Put the result in the column pred.income.
  - Check summary() of pred.income and see that the magnitudes are now similar to Income2005 magnitudes.
- Fill in the blanks to plot a scatter plot of predicted income vs income on the test set.
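The steps above can be sketched on synthetic data. Everything in this block is an assumption made for illustration: the `toy` data frame, its two predictors, the coefficients used to simulate incomes, and the 150/50 split are all hypothetical and do not reproduce income_train or income_test.

```r
set.seed(1)

# Synthetic stand-in for the exercise data (hypothetical values)
n <- 200
toy <- data.frame(
  Arith = rnorm(n, 15, 5),
  Word  = rnorm(n, 25, 6)
)
toy$Income2005 <- exp(9.5 + 0.03 * toy$Arith + 0.02 * toy$Word +
                        rnorm(n, sd = 0.5))

# Simple split into training and test rows
toy_train <- toy[1:150, ]
toy_test  <- toy[151:200, ]

# Formula with a log-transformed outcome
fmla <- log(Income2005) ~ Arith + Word

# Fit the linear model on the training rows
model <- lm(fmla, data = toy_train)

# Predictions come back in log units
toy_test$logpred <- predict(model, newdata = toy_test)

# Reverse the transformation to get monetary units
toy_test$pred.income <- exp(toy_test$logpred)

summary(toy_test$logpred)
summary(toy_test$pred.income)
```

Note that exponentiating a predicted log income generally gives something closer to a typical (median-like) income than to the mean, so these predictions may sit below the average of the actual incomes.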
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Examine Income2005 in the training set
___
# Write the formula for log income as a function of the tests and print it
(fmla.log <- ___)
# Fit the linear model
model.log <-  ___
# Make predictions on income_test
income_test$logpred <- ___
summary(income_test$logpred)
# Convert the predictions to monetary units
income_test$pred.income <- ___
summary(income_test$pred.income)
# Plot predicted income (x axis) vs actual income (y axis)
ggplot(___, aes(x = ___, y = ___)) + 
  geom_point() + 
  geom_abline(color = "blue")