Exercise

Input transforms: the "hockey stick"

In this exercise, we will build a model to predict price from a measure of the house's size (surface area). The houseprice dataset, loaded for you, has the columns:

  • price: house price in units of $1000
  • size: surface area

A scatterplot of the data shows that the relationship is quite non-linear: a sort of "hockey stick" where price is fairly flat for smaller houses, but rises steeply as the house gets larger. Quadratics and cubics are often good functional forms to express hockey-stick-like relationships. Note that there may not be a "physical" reason that price is related to the square of the size; a quadratic is simply a closed-form approximation of the observed relationship.

[Figure: scatterplot of price versus size]

You will fit a model to predict price as a function of the squared size, and look at its fit on the training data.

Because ^ is also a symbol to express interactions, use the function I() to treat the expression x^2 “as is”: that is, as the square of x rather than the interaction of x with itself.

exampleFormula = y ~ I(x^2)
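
To see what I() changes, here is a minimal sketch on made-up data (the data frame d and its columns x and y are assumptions for illustration, not part of the exercise). Without I(), the formula y ~ x^2 is read as a crossing of x with itself, which collapses to plain x; with I(), the square is computed arithmetically.

```r
# Synthetic data, assumed purely for illustration
set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 3 + 0.5 * d$x^2 + rnorm(20)

# Without I(): ^ is formula crossing, so x^2 reduces to the single term x
m_plain <- lm(y ~ x^2, data = d)

# With I(): x^2 is evaluated as arithmetic before the model is fit
m_sq <- lm(y ~ I(x^2), data = d)

names(coef(m_plain))  # "(Intercept)" "x"
names(coef(m_sq))     # "(Intercept)" "I(x^2)"
```

The first model is therefore just an ordinary linear fit in x, which is easy to miss because the call runs without any warning.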

Instructions

  • Write a formula, fmla_sqr, to express price as a function of squared size. Print it.
  • Fit a model, model_sqr, to the data using fmla_sqr.
  • For comparison, fit a linear model model_lin to the data using the formula price ~ size.
  • Fill in the blanks to
    • make predictions on the training data with each of the two models
    • gather the predictions into a single column pred
    • graphically compare the predictions of the two models to the data. Which fits better?
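
The steps above could be sketched roughly as follows. This is not the exercise's own solution: it uses base R only, and since the real houseprice data is not reproduced here, it runs on a synthetic stand-in. The names houseprice_demo, pred_lin, pred_sqr, modeltype, long, and rmse_by_model are all assumptions made for this illustration.

```r
# Hypothetical stand-in for houseprice: price grows with the square of size
set.seed(42)
houseprice_demo <- data.frame(size = seq(50, 250, by = 5))
houseprice_demo$price <- 100 + 0.005 * houseprice_demo$size^2 +
  rnorm(nrow(houseprice_demo), sd = 10)

# Formula for price as a function of squared size
fmla_sqr <- price ~ I(size^2)
print(fmla_sqr)

# Fit the squared-size model and the plain linear model
model_sqr <- lm(fmla_sqr, data = houseprice_demo)
model_lin <- lm(price ~ size, data = houseprice_demo)

# Predictions on the training data from both models
houseprice_demo$pred_lin <- predict(model_lin, newdata = houseprice_demo)
houseprice_demo$pred_sqr <- predict(model_sqr, newdata = houseprice_demo)

# Long format: all predictions in one column pred, labelled by model
long <- rbind(
  data.frame(size = houseprice_demo$size, price = houseprice_demo$price,
             modeltype = "pred_lin", pred = houseprice_demo$pred_lin),
  data.frame(size = houseprice_demo$size, price = houseprice_demo$price,
             modeltype = "pred_sqr", pred = houseprice_demo$pred_sqr)
)

# Graphical comparison: data as points, each model's predictions as a line
plot(price ~ size, data = houseprice_demo, pch = 16, col = "grey40")
lines(houseprice_demo$size, houseprice_demo$pred_lin, col = "blue", lty = 2)
lines(houseprice_demo$size, houseprice_demo$pred_sqr, col = "red")
legend("topleft", legend = c("linear", "squared"),
       col = c("blue", "red"), lty = c(2, 1))

# Numeric companion to the plot: training RMSE per model
rmse_by_model <- sapply(split(long, long$modeltype),
                        function(g) sqrt(mean((g$pred - g$price)^2)))
rmse_by_model
```

On data shaped like this, the linear model's line cuts through the curve and misses at both ends, while the squared-size model follows the bend. The RMSE is computed on the training data, so it describes fit rather than out-of-sample performance.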