
Studying residuals

Before relying on a linear model, you must study its residuals: the differences between the observed data and the outcomes the model predicts (observed minus predicted).
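As a small illustration of the sign convention, the sketch below computes residuals as observed minus predicted. The helper name `compute_residuals` and the numbers are hypothetical and are not taken from the exercise.

```python
import numpy as np

def compute_residuals(observed, predicted):
    """Return residuals as observed minus predicted (signed, not absolute)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return observed - predicted

# Hypothetical observed test scores and model predictions, for illustration only.
observed = np.array([62.0, 70.0, 75.0, 81.0])
predicted = np.array([60.0, 71.5, 75.0, 79.0])
residuals = compute_residuals(observed, predicted)
# Positive residual: the model under-predicted; negative: it over-predicted.
print(residuals)
```

Keeping the sign, rather than taking absolute distances, is what allows the mean-of-zero condition to be checked at all.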

The residuals must meet three conditions:

  1. Their mean should be 0.
  2. Their variance should be constant (homoscedasticity).
  3. They should be normally distributed.
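The three conditions can also be checked numerically alongside the plots. The sketch below is a rough heuristic, not a formal statistical test. It assumes synthetic data, uses `np.polyfit` as a stand-in for `LinearRegression`, and relies on a hypothetical helper, `summarize_residuals`. The helper compares residual variance between the lower and upper halves of the x values and reports skewness and excess kurtosis, both of which are 0 for a normal distribution.

```python
import numpy as np

def summarize_residuals(x, residuals):
    """Rough numeric checks for the three residual conditions (heuristics, not formal tests)."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(residuals, dtype=float)

    # 1. Mean: should be close to 0.
    mean = r.mean()

    # 2. Constant variance: compare the residual variance for the lower and upper
    #    halves of x; a ratio far from 1 hints at non-constant variance.
    order = np.argsort(x)
    lower, upper = np.array_split(r[order], 2)
    variance_ratio = upper.var() / lower.var()

    # 3. Normality: a normal distribution has skewness 0 and excess kurtosis 0.
    z = (r - mean) / r.std()
    skewness = np.mean(z ** 3)
    excess_kurtosis = np.mean(z ** 4) - 3.0

    return {
        "mean": mean,
        "variance_ratio": variance_ratio,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }

# Synthetic data (not the exercise's data): scores rise linearly with hours plus normal noise.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200)
scores = 50 + 4 * hours + rng.normal(0, 5, size=200)

# Least-squares line via np.polyfit (degree 1 returns [slope, intercept]).
slope, intercept = np.polyfit(hours, scores, 1)
residuals = scores - (slope * hours + intercept)

for name, value in summarize_residuals(hours, residuals).items():
    print(f"{name}: {value:.3f}")
```

Because a least-squares fit with an intercept forces the residuals to sum to zero, the mean check mainly catches fitting mistakes. The variance and shape checks carry more information.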

We will work with test scores from two schools, A and B, in the same subject. model_A was fitted with hours_of_study_A and test_scores_A, and model_B with hours_of_study_B and test_scores_B.
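The exercise does not show its data, so the sketch below uses randomly generated stand-ins for hours_of_study_A and test_scores_A to show how a model like model_A could be fitted. model_B would be fitted the same way. The only real API used is scikit-learn's `LinearRegression` with `fit`, `coef_` and `intercept_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for the exercise's arrays; the real data is not shown here.
rng = np.random.default_rng(42)
hours_of_study_A = rng.uniform(1, 10, size=50)
test_scores_A = 40 + 5 * hours_of_study_A + rng.normal(0, 4, size=50)

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features),
# so the single feature is reshaped into one column.
model_A = LinearRegression()
model_A.fit(hours_of_study_A.reshape(-1, 1), test_scores_A)

print(f"slope: {model_A.coef_[0]:.2f}, intercept: {model_A.intercept_:.2f}")
```

The `reshape(-1, 1)` call is the usual pitfall here. Passing a 1-D array to `fit` raises an error asking for 2-D input.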

matplotlib.pyplot has been imported as plt, numpy as np, and LinearRegression from sklearn.linear_model.

Instructions
  1. Make a scatter plot of hours_of_study_A and test_scores_A, and plot hours_of_study_values_A against the outcomes predicted by model_A.
  2. Subtract the predicted values from test_scores_A to obtain residuals_A, then make a scatter plot of hours_of_study_A and residuals_A.
  3. Make a scatter plot of hours_of_study_B and test_scores_B, and plot hours_of_study_values_B against the outcomes predicted by model_B.
  4. Subtract the predicted values from test_scores_B to obtain residuals_B, then make a scatter plot of hours_of_study_B and residuals_B.
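The four steps above can be sketched as one reusable function. This is a minimal sketch with several stated assumptions. All data is synthetic. School B's noise is made to grow with hours only to show what non-constant variance looks like, which says nothing about the exercise's real data. hours_of_study_values_A and hours_of_study_values_B are assumed to be evenly spaced grids used for drawing the fitted lines. `plot_fit_and_residuals` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

def plot_fit_and_residuals(model, hours, scores, hours_values, label):
    """Plot the data with the fitted line, then the residuals against hours.

    Returns the residuals (observed minus predicted) and the figure.
    """
    predicted = model.predict(hours.reshape(-1, 1))
    residuals = scores - predicted

    fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(10, 4))

    # Steps 1 and 3: data as points, model outcomes as a line over hours_values.
    ax_fit.scatter(hours, scores)
    ax_fit.plot(hours_values, model.predict(hours_values.reshape(-1, 1)), color="red")
    ax_fit.set(xlabel="Hours of study", ylabel="Test score", title=f"School {label}")

    # Steps 2 and 4: residuals should scatter evenly around 0 with no pattern.
    ax_res.scatter(hours, residuals)
    ax_res.axhline(0, color="red")
    ax_res.set(xlabel="Hours of study", ylabel="Residual", title=f"Residuals, school {label}")

    return residuals, fig

# Synthetic stand-ins for the exercise's data (assumption: not the real scores).
rng = np.random.default_rng(7)
hours_of_study_A = rng.uniform(1, 10, size=60)
test_scores_A = 40 + 5 * hours_of_study_A + rng.normal(0, 4, size=60)
hours_of_study_B = rng.uniform(1, 10, size=60)
# School B: noise scaled by hours, deliberately violating constant variance.
test_scores_B = 40 + 5 * hours_of_study_B + rng.normal(0, 1, size=60) * hours_of_study_B

model_A = LinearRegression().fit(hours_of_study_A.reshape(-1, 1), test_scores_A)
model_B = LinearRegression().fit(hours_of_study_B.reshape(-1, 1), test_scores_B)

# Evenly spaced hours for drawing each fitted line (assumed meaning of *_values_*).
hours_of_study_values_A = np.linspace(hours_of_study_A.min(), hours_of_study_A.max(), 100)
hours_of_study_values_B = np.linspace(hours_of_study_B.min(), hours_of_study_B.max(), 100)

residuals_A, fig_A = plot_fit_and_residuals(
    model_A, hours_of_study_A, test_scores_A, hours_of_study_values_A, "A")
residuals_B, fig_B = plot_fit_and_residuals(
    model_B, hours_of_study_B, test_scores_B, hours_of_study_values_B, "B")
# With an interactive backend, plt.show() would display both figures.
```

In a residual plot like school B's, the points fan out as hours increase. That funnel shape is the visual sign that condition 2 (constant variance) does not hold, even though the fitted line itself can look reasonable.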