Least-Squares with `statsmodels`

Several Python libraries provide convenient, high-level interfaces, so you do not always need to handle the machinery of model optimization explicitly.

In this exercise, you will use the statsmodels library in a higher-level, more general workflow for building a model by least-squares optimization, that is, by minimizing the residual sum of squares (RSS).
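For comparison with the library workflow, here is a minimal sketch of what "minimizing RSS" means for a straight-line model y = a0 + a1*x, using the standard closed-form least-squares estimates in numpy. The synthetic data and the helper name rss are assumptions made for illustration; they are not the course's load_data() output or its helper functions.

```python
import numpy as np

# Hypothetical noisy linear data (NOT the course's load_data() output)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=x.size)


def rss(a0, a1, x, y):
    """Residual sum of squares for the line y = a0 + a1 * x."""
    residuals = y - (a0 + a1 * x)
    return np.sum(residuals ** 2)


# Closed-form least-squares estimates for a straight line:
# slope = covariance(x, y) / variance(x), intercept from the means
x_mean, y_mean = x.mean(), y.mean()
a1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a0 = y_mean - a1 * x_mean

print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, RSS = {rss(a0, a1, x, y):.3f}")
```

Nudging either a0 or a1 away from these values increases the RSS, which is the sense in which the fit is optimal.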

To help you get started, we've pre-loaded the data with x_data, y_data = load_data() and stored it in a pandas DataFrame, with column names x_column and y_column, using df = pd.DataFrame(dict(x_column=x_data, y_column=y_data)).

This exercise is part of the course

Introduction to Linear Modeling in Python

Instructions

  • Construct a model with ols(), passing formula="y_column ~ x_column" and data=df, and then .fit() it to the data.
  • Use model_fit.predict() to get y_model values.
  • Using the provided function plot_data_with_model(), over-plot the y_data with y_model.
  • Extract the model parameter values a0 and a1 from model_fit.params.
  • Use compute_rss_and_plot_fit() to confirm these results are consistent with the analytic formulae implemented with numpy.
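The steps above can be sketched end to end as follows. This is a self-contained illustration under stated assumptions: the data are synthetic stand-ins for load_data(), and the course's plotting helpers plot_data_with_model() and compute_rss_and_plot_fit() are not reproduced, so the consistency check is done numerically with numpy's polyfit instead.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical stand-in for load_data(): noisy points around y = 2 + 3x
rng = np.random.default_rng(42)
x_data = np.linspace(0.0, 10.0, 50)
y_data = 2.0 + 3.0 * x_data + rng.normal(scale=1.0, size=x_data.size)
df = pd.DataFrame(dict(x_column=x_data, y_column=y_data))

# Build the model from a formula and fit it by ordinary least squares
model_fit = ols(formula="y_column ~ x_column", data=df).fit()

# Predicted y values at the original x values
y_model = model_fit.predict(df)

# Fitted parameters are a pandas Series indexed by term name
a0 = model_fit.params["Intercept"]
a1 = model_fit.params["x_column"]

# Cross-check against numpy's degree-1 polynomial fit
# (polyfit returns coefficients from highest degree to lowest)
a1_np, a0_np = np.polyfit(x_data, y_data, deg=1)

print(f"statsmodels: a0 = {a0:.4f}, a1 = {a1:.4f}")
print(f"numpy:       a0 = {a0_np:.4f}, a1 = {a1_np:.4f}")
print(f"RSS = {model_fit.ssr:.4f}")
```

The two parameter estimates should agree to within floating-point precision, since both solve the same least-squares problem. The ssr attribute of the fitted results is the sum of squared residuals, which is the RSS being minimized.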

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Pass `formula` and data into ols(), then `.fit()` the model to the data
model_fit = ols(____="y_column ~ x_column", ____=df).____()

# Use .predict(df) to get y_model values, then over-plot y_data with y_model
y_model = model_fit.____(df)
fig = plot_data_with_model(x_data, ____, ____)

# Extract the a0, a1 values from model_fit.params
a0 = model_fit.____['Intercept']
a1 = model_fit.____['x_column']

# Visually verify that these parameters a0, a1 give the minimum RSS
fig, rss = compute_rss_and_plot_fit(a0, a1)