Least-Squares with `statsmodels`
Several Python libraries provide convenient, abstracted interfaces, so you need not always handle the machinery of model optimization explicitly.
As an example, in this exercise you will use the `statsmodels` library in a higher-level, more generalized workflow for building a model using least-squares optimization, that is, minimization of the residual sum of squares (RSS).
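For reference, the quantity being minimized can be written out for a straight-line model. The notation here is an assumption for illustration: `a0` and `a1` are the intercept and slope named later in this exercise, while N, x_i, y_i and the barred means are not defined on this page.

```latex
\begin{align}
\mathrm{RSS}(a_0, a_1) &= \sum_{i=1}^{N} \bigl( y_i - (a_0 + a_1 x_i) \bigr)^2 \\
a_1 &= \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2},
\qquad
a_0 = \bar{y} - a_1 \bar{x}
\end{align}
```

The second line gives the closed-form values of `a0` and `a1` that minimize the RSS for this one-variable model.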
To help get you started, we've pre-loaded the data with `x_data, y_data = load_data()` and stored it in a pandas DataFrame with column names `x_column` and `y_column` using `df = pd.DataFrame(dict(x_column=x_data, y_column=y_data))`.
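Outside the course environment, `load_data()` is not available. A minimal sketch of a stand-in, assuming synthetic noisy data around a straight line (the true intercept 2.0, slope 3.0, noise level, and seed are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical replacement for the course's load_data():
# 50 points scattered around the line y = 2 + 3x.
rng = np.random.default_rng(0)
x_data = np.linspace(0.0, 10.0, 50)
y_data = 2.0 + 3.0 * x_data + rng.normal(scale=1.0, size=x_data.size)

# Same DataFrame construction as in the exercise text.
df = pd.DataFrame(dict(x_column=x_data, y_column=y_data))
print(df.head())
```

The column names matter because the model formula refers to them by name.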
This exercise is part of the course
Introduction to Linear Modeling in Python
Exercise instructions
- Construct a model with `ols()`, passing formula `formula="y_column ~ x_column"` and data `data=df`, and then `.fit()` it to the data.
- Use `model_fit.predict()` to get `y_model` values.
- Using the provided function `plot_data_with_model()`, over-plot the `y_data` with `y_model`.
- Extract the model parameter values `a0` and `a1` from `model_fit.params`.
- Use `compute_rss_and_plot_fit()` to confirm these results are consistent with the analytic formulae implemented with `numpy`.
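The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the course's solution: the data is synthetic (a stand-in for `load_data()`), the plotting helpers are omitted because they exist only in the course environment, and `a0_np` and `a1_np` are hypothetical names for the analytic values computed with `numpy`.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in for the pre-loaded data (assumption, not course data).
rng = np.random.default_rng(0)
x_data = np.linspace(0.0, 10.0, 50)
y_data = 2.0 + 3.0 * x_data + rng.normal(scale=1.0, size=x_data.size)
df = pd.DataFrame(dict(x_column=x_data, y_column=y_data))

# Build the model from a formula string and fit it by ordinary least squares.
model_fit = ols(formula="y_column ~ x_column", data=df).fit()

# Model predictions at the observed x values.
y_model = model_fit.predict(df)

# Fitted parameters are indexed by term name.
a0 = model_fit.params["Intercept"]
a1 = model_fit.params["x_column"]

# Analytic least-squares solution for a straight line, using numpy only.
x_dev = x_data - x_data.mean()
y_dev = y_data - y_data.mean()
a1_np = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
a0_np = y_data.mean() - a1_np * x_data.mean()

print(f"statsmodels: a0={a0:.4f}, a1={a1:.4f}")
print(f"numpy:       a0={a0_np:.4f}, a1={a1_np:.4f}")
```

The two printed lines should agree to within floating-point precision, since both solve the same least-squares problem.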
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
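# ols() with a formula string lives in statsmodels.formula.api
from statsmodels.formula.api import ols
# Pass `formula` and `data` into ols(), then `.fit()` the model to the data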
model_fit = ols(____="y_column ~ x_column", ____=df).____()
# Use .predict(df) to get y_model values, then over-plot y_data with y_model
y_model = model_fit.____(df)
fig = plot_data_with_model(x_data, ____, ____)
# Extract the a0, a1 values from model_fit.params
a0 = model_fit.____['Intercept']
a1 = model_fit.____['x_column']
# Visually verify that these parameters a0, a1 give the minimum RSS
fig, rss = compute_rss_and_plot_fit(a0, a1)
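The final check can also be done numerically rather than visually. The sketch below is a stand-in, not the course's `compute_rss_and_plot_fit()`: the helper `rss_for()` is hypothetical, the data is synthetic, and the best-fit values come from `np.polyfit` instead of `statsmodels`. It shows that nudging either parameter away from the least-squares values increases the RSS.

```python
import numpy as np

def rss_for(a0, a1, x, y):
    """Residual sum of squares for the line y = a0 + a1 * x (hypothetical helper)."""
    residuals = y - (a0 + a1 * x)
    return np.sum(residuals ** 2)

# Synthetic stand-in for the pre-loaded data (assumption, not course data).
rng = np.random.default_rng(0)
x_data = np.linspace(0.0, 10.0, 50)
y_data = 2.0 + 3.0 * x_data + rng.normal(scale=1.0, size=x_data.size)

# np.polyfit returns coefficients from highest degree down: [slope, intercept].
a1_best, a0_best = np.polyfit(x_data, y_data, deg=1)
rss_best = rss_for(a0_best, a1_best, x_data, y_data)

# Perturb each parameter in both directions and recompute the RSS.
perturbations = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
rss_perturbed = [
    rss_for(a0_best + da0, a1_best + da1, x_data, y_data)
    for da0, da1 in perturbations
]

print("best RSS:", rss_best)
print("perturbed RSS values:", rss_perturbed)
```

Every perturbed value should be larger than the best RSS, because the RSS of a linear model is a convex quadratic in `a0` and `a1` with a single minimum.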