Linear regression - the fundamental method

1. Linear regression - the fundamental method

Let's start with the most basic regression algorithm: linear regression.

2. Linear Regression

Linear regression assumes that the expected outcome is a weighted sum of the inputs, plus a constant intercept. It also assumes that a change in y is proportional to the change in any single x when the other inputs are held fixed. This is the simplest of the regression methods.
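
Written out, the assumption above takes the following form. The symbols are notation chosen here for illustration and do not come from the course slides: x_1 through x_n are the inputs, beta_0 is the intercept, and beta_1 through beta_n are the weights.

```latex
\mathrm{E}[\, y \mid x_1, \dots, x_n \,] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n

% Changing one input x_i by \Delta x_i, with the others held fixed,
% changes the expected outcome by:
\Delta \mathrm{E}[y] = \beta_i \, \Delta x_i
```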

3. Linear Regression in R: lm()

In our cricket example, linear regression assumes that temperature is a linear function of cricket chirp rate. In R, you fit a linear regression model with the lm() function. It takes two main arguments: a formula that describes the model you want to fit, and the data. Here, the data is in the data frame cricket, with an outcome column temperature and an input column chirps_per_sec.
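
A minimal sketch of the call described above. The course's cricket data is not reproduced in this transcript, so the values below are made-up stand-ins that only mimic the column names; the variable names fmla and cricket_model are also illustrative.

```r
# Hypothetical stand-in for the course's cricket data frame:
# these values are invented for illustration, not real measurements.
cricket <- data.frame(
  chirps_per_sec = c(14, 15, 16, 17, 18, 19, 20),
  temperature    = c(70, 74, 77, 80, 84, 88, 90)
)

# Formula: outcome on the left of the tilde, input on the right.
fmla <- temperature ~ chirps_per_sec

# Fit the linear regression model.
cricket_model <- lm(fmla, data = cricket)

# The result is an object of class "lm".
class(cricket_model)
```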

4. Formulas

A formula in R is designated by a twiddle, or tilde (~). The left-hand side of the formula is the outcome you want to predict, such as temperature or blood_pressure. The right-hand side holds the input variables. You can combine multiple input variables with a plus sign. To convert a string into a formula, use the as.formula() function.
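
The formula rules above might look like this in practice. This is a sketch only: age and weight are hypothetical input columns invented for the blood_pressure example, and no data is needed just to build a formula.

```r
# Single input: predict temperature from chirp rate.
fmla_1 <- temperature ~ chirps_per_sec

# Several inputs joined with "+"; age and weight are hypothetical columns.
fmla_2 <- blood_pressure ~ age + weight

# Build a formula from a string with as.formula().
fmla_3 <- as.formula("temperature ~ chirps_per_sec")

# Or assemble the string from variable names first.
outcome <- "blood_pressure"
inputs  <- c("age", "weight")
fmla_4  <- as.formula(paste(outcome, "~", paste(inputs, collapse = " + ")))

class(fmla_3)     # a formula object
all.vars(fmla_4)  # the variable names used in the formula
```

Building the formula from strings is handy when the column names are only known at run time.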

5. Looking at the Model

Print the model to look at its structure. You will see a report on the coefficients (or betas) of the model. The intercept is beta-zero: the value the model predicts when all the inputs are zero. The other coefficients are the weights in the weighted sum of the variables. In this example, the coefficient for chirps_per_sec is just over 3. That means two things. First, the sign of the coefficient is positive, so temperature should increase as chirp rate increases. Second, for every unit increase in chirp rate, the temperature should increase by a little over 3 degrees, if everything else is held constant.
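
To make the weighted-sum reading of the coefficients concrete, here is a small sketch that rebuilds one prediction by hand. The data values are invented stand-ins for the course's cricket data, and the chirp rate of 16.5 is an arbitrary example input.

```r
# Hypothetical stand-in data; values are invented for illustration.
cricket <- data.frame(
  chirps_per_sec = c(14, 15, 16, 17, 18, 19, 20),
  temperature    = c(70, 74, 77, 80, 84, 88, 90)
)
cricket_model <- lm(temperature ~ chirps_per_sec, data = cricket)

# Printing shows the call and the fitted coefficients.
print(cricket_model)

# Extract the coefficients as a named numeric vector.
betas <- coef(cricket_model)

# A prediction is the intercept plus the weighted input.
chirps     <- 16.5
by_hand    <- betas[["(Intercept)"]] + betas[["chirps_per_sec"]] * chirps
by_predict <- predict(cricket_model,
                      newdata = data.frame(chirps_per_sec = chirps))

# Both routes should agree up to floating-point tolerance.
all.equal(by_hand, unname(by_predict))
```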

6. More Information about the Model

You can get the model diagnostics by calling summary() on the model. The summary includes not only the values of the coefficients, but also the standard errors of those estimates, along with other diagnostics. We will cover some of these diagnostics in a later chapter, but for now, just know that they are available.
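
A short sketch of where those diagnostics live in the summary object, again using invented stand-in data rather than the course's cricket data.

```r
# Hypothetical stand-in data; values are invented for illustration.
cricket <- data.frame(
  chirps_per_sec = c(14, 15, 16, 17, 18, 19, 20),
  temperature    = c(70, 74, 77, 80, 84, 88, 90)
)
cricket_model <- lm(temperature ~ chirps_per_sec, data = cricket)

# Full diagnostic report.
model_summary <- summary(cricket_model)
print(model_summary)

# Coefficient table: estimate, standard error, t value, and p-value.
coef(model_summary)

# Individual diagnostics can be pulled out by name.
model_summary$r.squared  # R-squared of the fit
model_summary$sigma      # residual standard error
```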

7. More Information about the Model

To get these diagnostics conveniently packaged in a data frame, use the glance() function from the broom package. For the R-squared diagnostic, you can also use the wrapFTest() function from the sigr package.
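
The two helpers could be called as sketched below. The data is an invented stand-in for the course's cricket data, and the requireNamespace() guards are an addition here so that the sketch still runs where broom or sigr is not installed.

```r
# Hypothetical stand-in data; values are invented for illustration.
cricket <- data.frame(
  chirps_per_sec = c(14, 15, 16, 17, 18, 19, 20),
  temperature    = c(70, 74, 77, 80, 84, 88, 90)
)
cricket_model <- lm(temperature ~ chirps_per_sec, data = cricket)

# glance() returns a one-row data frame of model-level diagnostics.
if (requireNamespace("broom", quietly = TRUE)) {
  print(broom::glance(cricket_model))
}

# wrapFTest() reports the F-test on the fit, including R-squared.
if (requireNamespace("sigr", quietly = TRUE)) {
  print(sigr::wrapFTest(cricket_model))
}

# Base R value of R-squared, for comparison with the output above.
r_squared <- summary(cricket_model)$r.squared
r_squared
```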

8. Let's practice!

Now let's do some exercises to review what you've learned.
