Modeling log-transformed monetary output
In this exercise, you will practice modeling log-transformed monetary output, and then transforming the "log-money" predictions back into monetary units. The loaded data record subjects' incomes in 2005 (Income2005), as well as the results of several aptitude tests the subjects took in 1981:
- Arith
- Word
- Parag
- Math
- AFQT (Percentile on the Armed Forces Qualifying Test)
The data have already been split into training and test sets (income_train and income_test, respectively) and pre-loaded. You will build a model of log(income) from the inputs, and then convert log(income) back into income.
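The sketch below illustrates the round trip between log-money and money. It uses a few made-up incomes, which are hypothetical values and not the course data. In R, log() is the natural logarithm by default and exp() is its inverse. Exponentiating therefore recovers the original dollar amounts, up to floating-point error.

```r
# Hypothetical incomes in dollars (illustrative values, not from income_train)
income <- c(15000, 40000, 120000)

# Natural log: roughly 9.6, 10.6, 11.7, a much smaller scale than dollars
log_income <- log(income)
print(log_income)

# exp() undoes log(), putting the values back into monetary units
back <- exp(log_income)
print(all.equal(back, income))
```

The same idea drives the exercise. A model fit to log(Income2005) predicts values of about 10, and exp() maps them back to incomes in the tens of thousands.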
This exercise is part of the course
Supervised Learning in R: Regression
Exercise instructions
- Call summary() on income_train$Income2005 to see the summary statistics of income in the training set.
- Write a formula that expresses log(Income2005) as a function of the five tests, and assign it to the variable fmla.log. Print it.
- Fit a linear model of log(Income2005) to the income_train data: model.log.
- Use model.log to predict income on the income_test dataset. Put the predictions in the column logpred.
  - Check summary() of logpred to see that its magnitudes are much different from those of Income2005.
- Reverse the log transformation to put the predictions into "monetary units": exp(income_test$logpred). Put the result in the column pred.income.
  - Check summary() of pred.income and see that its magnitudes are now similar to those of Income2005.
- Fill in the blanks to make a scatter plot of predicted income vs. income on the test set.
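The steps above can be sketched end to end on synthetic data. This is a minimal sketch that relies on a hypothetical make_data() helper. The helper simulates the five test scores and a right-skewed Income2005 generated on the log scale. Its coefficients, noise level, and sample sizes are arbitrary, so this is neither the course's data nor its official solution. Only the object and column names follow the exercise.

```r
set.seed(42)

# Hypothetical helper: simulates test scores and a log-normal-ish income.
# The score distributions and coefficients are made up for illustration.
make_data <- function(n) {
  d <- data.frame(
    Arith = rnorm(n, 20, 5),
    Word  = rnorm(n, 25, 6),
    Parag = rnorm(n, 11, 3),
    Math  = rnorm(n, 14, 5),
    AFQT  = runif(n, 0, 100)
  )
  # Income is generated on the log scale, so it is right-skewed in dollars
  d$Income2005 <- exp(9.5 + 0.02 * d$Arith + 0.01 * d$Math +
                        0.005 * d$AFQT + rnorm(n, 0, 0.6))
  d
}
income_train <- make_data(500)
income_test  <- make_data(200)

# Summary statistics of income (dollars) in the training set
summary(income_train$Income2005)

# Formula: log income as a function of the five tests
(fmla.log <- log(Income2005) ~ Arith + Word + Parag + Math + AFQT)

# Fit the linear model on the log scale
model.log <- lm(fmla.log, data = income_train)

# Predictions come back in log-dollars, on the order of 10
income_test$logpred <- predict(model.log, newdata = income_test)
summary(income_test$logpred)

# exp() converts the log-dollar predictions back to dollars
income_test$pred.income <- exp(income_test$logpred)
summary(income_test$pred.income)
```

Because the response in the formula is log(Income2005), predict() returns values on the log scale. That is why the two summary() calls differ so much in magnitude until exp() is applied.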
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Examine Income2005 in the training set
___
# Write the formula for log income as a function of the tests and print it
(fmla.log <- ___)
# Fit the linear model
model.log <- ___
# Make predictions on income_test
income_test$logpred <- ___
summary(income_test$logpred)
# Convert the predictions to monetary units
income_test$pred.income <- ___
summary(income_test$pred.income)
# Plot predicted income (x axis) vs income
ggplot(___, aes(x = ___, y = ___)) +
  geom_point() +
  geom_abline(color = "blue")
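The plotting step can be filled in as sketched here, using a small synthetic data frame and assuming the ggplot2 package is installed. plot_df is a hypothetical stand-in for income_test after the prediction steps, and its values are simulated. Called without intercept or slope, geom_abline() draws the line y = x. Points on the blue line are test cases whose predicted income equals their actual income.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for income_test: simulated actual incomes, plus
# predictions that scatter multiplicatively around them
plot_df <- data.frame(Income2005 = exp(rnorm(100, 10.3, 0.6)))
plot_df$pred.income <- plot_df$Income2005 * exp(rnorm(100, 0, 0.3))

# Predicted income on the x axis, actual income on the y axis;
# geom_abline() with no slope/intercept defaults to the y = x reference line
p <- ggplot(plot_df, aes(x = pred.income, y = Income2005)) +
  geom_point() +
  geom_abline(color = "blue")

print(p)  # renders the plot (to Rplots.pdf when run non-interactively)
```

Points above the line are subjects who earned more than predicted, and points below it earned less.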