How is it optimal?
The function np.polyfit() that you used to get your regression parameters finds the optimal slope and intercept. It does this by minimizing the sum of the squares of the residuals, also known as the RSS (residual sum of squares). In this exercise, you will plot the function that is being optimized, the RSS, versus the slope parameter a. To do this, fix the intercept at the value you found in the optimization. Then, plot the RSS vs. the slope. Where is it minimal?
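As a worked check on where the minimum should be, write the RSS as a function of the slope a alone, with the intercept b held fixed and (x_i, y_i) denoting the n data points. Setting the derivative to zero gives the minimizing slope:

$$
\mathrm{RSS}(a) = \sum_{i=1}^{n} \left(y_i - a x_i - b\right)^2
$$

$$
\frac{d\,\mathrm{RSS}}{da} = -2 \sum_{i=1}^{n} x_i \left(y_i - a x_i - b\right) = 0
\quad\Longrightarrow\quad
a^{*} = \frac{\sum_{i=1}^{n} x_i \left(y_i - b\right)}{\sum_{i=1}^{n} x_i^{2}}
$$

Since the second derivative, 2 times the sum of x_i squared, is positive, RSS(a) is a parabola with a single minimum. When b is the intercept returned by np.polyfit(), the least-squares fit already makes this derivative zero, so a* is exactly the np.polyfit() slope.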
This exercise is part of the course
Statistical Thinking in Python (Part 2)
Exercise instructions
- Specify the values of the slope at which to compute the RSS. Use np.linspace() to get 200 points in the range between 0 and 0.1.
- Initialize an array, rss, to contain the RSS using np.empty_like().
- Write a for loop to compute the RSS for each slope value. Hint: the RSS is given by np.sum((y_data - a * x_data - b)**2). The variable b you computed in the last exercise is already in your namespace.
- Plot the RSS versus the slope. Be sure to label your axes.
- Show your plot.
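The steps above can be sketched end to end as follows. This is a minimal sketch, not the course solution: because the course data are not available here, x_data and y_data are hypothetical synthetic data generated with a true slope of 0.05, and b is the intercept returned by np.polyfit(). Plotting is left to the sample code; instead, the sketch reports the slope on the grid where the RSS is smallest so it can be compared with the np.polyfit() slope.

```python
import numpy as np

# Hypothetical synthetic data standing in for the course's x_data and
# y_data; generated with slope 0.05 and intercept 1.5 plus Gaussian noise.
rng = np.random.default_rng(42)
x_data = rng.uniform(0, 100, size=50)
y_data = 0.05 * x_data + 1.5 + rng.normal(0, 0.3, size=50)

# Least-squares slope and intercept (degree-1 polynomial fit)
a_fit, b = np.polyfit(x_data, y_data, 1)

# 200 candidate slopes between 0 and 0.1
a_vals = np.linspace(0, 0.1, 200)

# Uninitialized array with the same shape and dtype as a_vals
rss = np.empty_like(a_vals)

# RSS for each candidate slope, with the intercept held fixed at b
for i, a in enumerate(a_vals):
    rss[i] = np.sum((y_data - a * x_data - b) ** 2)

# Slope on the grid where the RSS is smallest
a_best = a_vals[np.argmin(rss)]
print(a_fit, a_best)
```

Because the grid spacing is 0.1/199 (about 0.0005), the grid minimum a_best lands within half a grid step of the np.polyfit() slope a_fit, as long as a_fit lies inside the range 0 to 0.1.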
Have a go at this exercise by completing this sample code.
import numpy as np
import matplotlib.pyplot as plt

# x_data, y_data and the intercept b come from the previous exercise

# Specify slopes to consider: a_vals
a_vals = ____

# Initialize sum of square of residuals: rss
rss = ____

# Compute sum of square of residuals for each value of a_vals
for i, a in enumerate(a_vals):
    rss[i] = ____

# Plot the RSS against the slope
plt.plot(____, ____, '-')
plt.xlabel('slope (children per woman / percent illiterate)')
plt.ylabel('sum of square of residuals')
plt.show()
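The for loop mirrors the exercise instructions, but NumPy broadcasting can compute every RSS value at once. In this minimal sketch, which again uses hypothetical synthetic data in place of the course data, a_vals[:, None] turns the 200 slopes into a column, so the residuals form a 200-by-n array and summing along axis 1 gives one RSS value per slope. The sketch then confirms that the result matches the loop.

```python
import numpy as np

# Hypothetical synthetic data (not the course's dataset)
rng = np.random.default_rng(0)
x_data = rng.uniform(0, 100, size=50)
y_data = 0.05 * x_data + 1.5 + rng.normal(0, 0.3, size=50)

# Only the fitted intercept b is needed here
_, b = np.polyfit(x_data, y_data, 1)

a_vals = np.linspace(0, 0.1, 200)

# a_vals[:, None] has shape (200, 1) and x_data has shape (50,),
# so residuals has shape (200, 50): one row of residuals per slope.
residuals = y_data - a_vals[:, None] * x_data - b
rss_vec = np.sum(residuals ** 2, axis=1)

# Loop version for comparison
rss_loop = np.array([np.sum((y_data - a * x_data - b) ** 2) for a in a_vals])
print(np.allclose(rss_vec, rss_loop))  # True
```

The loop is easier to read, which suits the exercise; the broadcast version avoids Python-level iteration but holds the full 200-by-n residual array in memory.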