
Final models evaluation

1. Final models evaluation

Congratulations on making it to the final content of the course!

2. Final exercises

To wrap up the course and showcase your skills, you will develop a few final regression models exploring different predictors, both for the full dataset and by group. You will save your results and extract model fit statistics to determine which models fit best. These final skills will help you evaluate and compare models and choose the ones that best capture the associations among the variables in your data.

3. Comparing models

Following the same approach as in the last video, you can run regression models predicting the difference between measured and self-reported heights in the davis dataset. The first model uses bmi as the predictor and the second uses weight. You can save the lm model output and the summary output for both models.
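A minimal sketch of these two models, assuming the variable names used in the course (diffht for the height difference, bmi, and weight) in a data frame called davis:

```r
# Fit one model with bmi and one with weight as the predictor
model_bmi    <- lm(diffht ~ bmi,    data = davis)
model_weight <- lm(diffht ~ weight, data = davis)

# Save the summary output for each model
summary_bmi    <- summary(model_bmi)
summary_weight <- summary(model_weight)
```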

4. Comparing models

From the saved model summary output, you can compare the r.squared statistic for both models; higher R-squared values indicate a better fit. As seen here, the r.squared for the weight model is slightly higher than for the bmi model. Another way to compare models is the AIC, or Akaike Information Criterion, which you can compute using the AIC function on one or more model output objects. Lower AIC values indicate better fit, so the weight model, listed second with the smaller AIC, is the better model.
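The two comparisons might look like this, assuming the saved objects from the previous slide:

```r
# Compare R-squared: higher values indicate a better fit
summary_bmi$r.squared
summary_weight$r.squared

# Compare AIC for both models in one call: lower values indicate a better fit
AIC(model_bmi, model_weight)
```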

5. Models by group - men vs women

Besides comparing models for different predictors, you may also want to compare models for different subsets of the data. For example, you might want to compare how well weight predicts the difference between measured and self-reported heights for women versus men. To visualize the association between diffht and weight for females versus males, you can use ggplot to make a scatterplot and add a fitted regression line using the geom_smooth function with the lm method. To see the plot for each sex, you add a facet_wrap layer for sex. The two fitted lines are both fairly flat, indicating little to no association between weight and diffht. However, the slope of the line in the second plot, for males, is slightly negative, indicating that heavier males may under-report their heights.
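A sketch of this faceted plot, assuming the course's daviskeep data frame with diffht, weight, and sex columns:

```r
library(ggplot2)

# Scatterplot of diffht vs weight with a fitted regression line,
# split into one panel per sex
ggplot(daviskeep, aes(x = weight, y = diffht)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ sex)
```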

6. Regression on subset

Like the SAS WHERE statement, the subset option of the lm function lets you run a regression on a subset of the data. Let's use this option to compare models for males and females.

7. Regression on subset

For example, to run the regression on only the females, you set the subset option to the logical expression sex == "F". This selects only the female rows of the daviskeep dataset as input to the model, similar to a WHERE statement in SAS PROC REG.

8. Fit models for subsets

To test whether the regression model is stronger for males or females, you run the lm function again, this time defining the subset option with the == logical operator to keep only the rows where sex equals "F" for females or "M" for males. The lm model output is saved for the females-only model and the males-only model.
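Fitting and saving the two subset models might look like this, again assuming the daviskeep data frame and variable names from the course:

```r
# Females-only model: subset keeps rows where sex == "F"
model_f <- lm(diffht ~ weight, data = daviskeep, subset = sex == "F")

# Males-only model: subset keeps rows where sex == "M"
model_m <- lm(diffht ~ weight, data = daviskeep, subset = sex == "M")
```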

9. Fit models for subsets

From the saved summaries of the females-only and males-only model objects, you can display the r.squared statistic for each model. The males-only model has a larger r.squared value, indicating that for males, weight is a better predictor of the difference between measured and self-reported heights.
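Extracting the two statistics for comparison, assuming the fitted model objects model_f and model_m from the previous slide:

```r
# Pull r.squared out of each model's summary to compare fit by group
summary(model_f)$r.squared
summary(model_m)$r.squared
```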

10. Let's wrap up by developing a few models for predicting abalone ages!

Let's wrap up by developing a few models for predicting abalone ages!