Fitting a linear regression

1. Fitting a linear regression

You may have noticed that the linear regression trend lines in the scatter plots were straight lines. That's a defining feature of a linear regression.

2. Straight lines are defined by two things

Straight lines are completely defined by two properties. The intercept is the y value when x is zero. The slope is the steepness of the line, equal to the amount y increases if you increase x by one. The equation for a straight line is that the y value is the intercept plus the slope times the x value.
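As a quick sketch of that equation in R: the helper name `line_y` and the example values below are made up for illustration and do not come from any dataset.

```r
# Hypothetical helper: the y value of a straight line at a given x.
line_y <- function(x, intercept, slope) {
  intercept + slope * x
}

# Illustrative values only.
intercept <- 2
slope <- 0.5

# At x = 0 the line sits at the intercept.
y_at_zero <- line_y(0, intercept, slope)

# Moving x up by one changes y by exactly the slope.
step_change <- line_y(5, intercept, slope) - line_y(4, intercept, slope)

y_at_zero
step_change
```

Whatever values you pick, the first result should match the intercept and the second should match the slope.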

3. Estimating the intercept

Here's the trend line from the Swedish insurance dataset. Let's try to estimate the intercept.

4. Estimating the intercept

To find the intercept, look at where the trend line intersects the y axis.

5. Estimating the intercept

The line crosses a little below halfway to the fifty mark, so I'd guess the intercept is about twenty.

6. Estimating the slope

To estimate the slope, we need two points. To make the guessing easier, I've chosen points where the line is close to the gridlines.

7. Estimating the slope

First, we calculate the change in y values between the points. One y value is about four hundred and the other is about one hundred and fifty, so the difference is two hundred and fifty.

8. Estimating the slope

Now we do the same for the x axis. One point is at one hundred and ten, the other at forty. So the difference is seventy.

9. Estimating the slope

To estimate the slope, we divide the change in y by the change in x. Two hundred and fifty divided by seventy is a little over three point five, so that is our estimate for the slope. Let's run a linear regression to check our guess.
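The same arithmetic can be written out in R. The coordinates are the approximate values read off the plot in the steps above, and pairing the larger x with the larger y is an assumption based on the line sloping upwards.

```r
# Approximate points read off the trend line (eyeballed, not exact data).
x1 <- 40;  y1 <- 150
x2 <- 110; y2 <- 400

# Slope is the change in y divided by the change in x.
change_in_y <- y2 - y1
change_in_x <- x2 - x1
slope_estimate <- change_in_y / change_in_x

change_in_y
change_in_x
slope_estimate
```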

10. Running a model

To run a linear regression model, you call the lm function with two arguments. The first argument is a formula. This is a type of object used by many modeling functions. The response variable is written to the left of the tilde, and the explanatory variable is written to the right. The second argument, data, takes the data frame containing the variables. When you print the resulting model, it shows the call you used to create it, and two coefficients. These coefficients are the intercept and slope of the straight line. It seems our guesses were pretty close. The intercept is very close to our estimate of twenty. The slope, labeled here with the name of the explanatory variable, n_claims, is three point four, slightly lower than what we guessed.
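Here is a minimal sketch of such a call. It does not use the real Swedish insurance data: `toy_insurance` is a made-up data frame whose payments lie exactly on the line twenty plus three point four times the number of claims, and the column name `total_payment` is an assumption. Only `n_claims` is named in the model output described above.

```r
# Made-up illustration data, NOT the real Swedish insurance dataset.
# Payments are placed exactly on the line 20 + 3.4 * n_claims.
n_claims <- c(10, 25, 40, 60, 85, 110)
toy_insurance <- data.frame(
  n_claims = n_claims,
  total_payment = 20 + 3.4 * n_claims  # hypothetical column name
)

# Response on the left of the tilde, explanatory variable on the right.
model <- lm(total_payment ~ n_claims, data = toy_insurance)

# Printing the model shows the call and the two coefficients.
print(model)

# coef() returns the intercept and slope as a named numeric vector.
model_coefs <- coef(model)
model_coefs
```

Because the toy payments sit exactly on a line, the fitted coefficients should recover that line's intercept and slope. Real data is noisy, so the fit there will only approximate any eyeballed guess.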

11. Interpreting the model coefficients

That means that we expect the total payment to be twenty plus three point four times the number of claims. So for every additional claim, we expect the total payment to increase by three point four.
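To make that interpretation concrete, here is a small sketch that plugs the rounded coefficients into the line equation. The claim counts are arbitrary example values, and the variable names are chosen for illustration.

```r
# Rounded coefficients quoted above.
intercept <- 20
slope <- 3.4

# Arbitrary example claim counts.
example_claims <- c(0, 50, 51)

# Expected total payment: intercept plus slope times the number of claims.
expected_payment <- intercept + slope * example_claims
expected_payment

# Going from 50 to 51 claims raises the expected payment by the slope.
one_more_claim <- expected_payment[3] - expected_payment[2]
one_more_claim
```

With zero claims the expected payment is just the intercept, and each extra claim adds the slope.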

12. Let's practice!

Time to fit some models.