Linear regression on appropriate Anscombe data
For practice, perform a linear regression on the data set from Anscombe's quartet that is most reasonably interpreted with linear regression.
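A particular data set has to be chosen because all four data sets in Anscombe's quartet give almost the same least-squares line, so the fitted parameters alone cannot tell them apart. The sketch below illustrates this. It assumes the standard published values of the quartet, typed in by hand here rather than loaded from the course, and fits each set with np.polyfit().

```python
import numpy as np

# Assumption: Anscombe's quartet typed in from the standard published values;
# these are not the arrays preloaded in the course.
x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
quartet = {
    "I": (x123, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])),
    "II": (x123, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x123, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV": (np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float),
           np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])),
}

# Fit a straight line (degree 1) to each set; polyfit returns [slope, intercept].
fits = {}
for name, (xs, ys) in quartet.items():
    slope, intercept = np.polyfit(xs, ys, 1)
    fits[name] = (slope, intercept)
    print(f"{name:>3}: slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Every set comes out with a slope of about 0.50 and an intercept of about 3.00. The choice therefore has to come from looking at the data. Set I scatters around a straight line. Set II is curved, set III is linear except for one outlier, and set IV is determined almost entirely by a single high-leverage point.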
This exercise is part of the course Statistical Thinking in Python (Part 2).
Exercise instructions
- Compute the slope and intercept using np.polyfit() with degree 1, which returns the slope first and the intercept second. The Anscombe data are stored in the arrays x and y.
- Print the slope a and intercept b.
- Generate theoretical \(x\) and \(y\) data from the linear regression. Your \(x\) array, x_theor, which you can create with np.array(), should consist of 3 and 15. To generate the \(y\) data, y_theor, multiply the slope by x_theor and add the intercept.
- Plot the Anscombe data as a scatter plot, then plot the theoretical line. When plotting the Anscombe data as a scatter plot, remember to pass the marker='.' and linestyle='none' keyword arguments in addition to x and y. You do not need these arguments when plotting the theoretical line.
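np.polyfit() returns its coefficients highest power first, so unpacking the degree-1 result gives the slope and then the intercept. One way to confirm this order is to compare the result with the closed-form least-squares estimates: slope = sum((x - mean x)(y - mean y)) / sum((x - mean x)^2), and intercept = mean y - slope * mean x. This sketch assumes that x and y hold Anscombe's data set I, typed in by hand. In the course, the arrays are preloaded.

```python
import numpy as np

# Assumption: Anscombe's data set I, typed in by hand for illustration.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Degree-1 fit: coefficients come back highest power first -> (slope, intercept).
a, b = np.polyfit(x, y, 1)

# Closed-form ordinary least squares for comparison.
dx = x - x.mean()
slope_ols = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
intercept_ols = y.mean() - slope_ols * x.mean()

print("polyfit:     ", a, b)
print("closed form: ", slope_ols, intercept_ols)
```

Both lines should agree to floating-point precision. If the unpacking order were reversed, the "slope" would be about 3 instead of about 0.5.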
Have a go at this exercise by completing this sample code.
import numpy as np
import matplotlib.pyplot as plt

# Perform linear regression: a, b
a, b = ____
# Print the slope and intercept
print(____, ____)
# Generate theoretical x and y data: x_theor, y_theor
x_theor = np.array([____, ____])
y_theor = ____ * ____ + ____
# Plot the Anscombe data and theoretical line
_ = ____
_ = ____
# Label the axes
plt.xlabel('x')
plt.ylabel('y')
# Show the plot
plt.show()
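One possible completion of the sample code is sketched below. It assumes that x and y are Anscombe's data set I, typed in by hand so that the script runs on its own. In the course, the arrays are preloaded. Two points are enough to draw a straight line, so x_theor only needs the endpoints 3 and 15.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: Anscombe's data set I, typed in by hand (the course preloads x and y).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Perform linear regression: a, b (degree 1 -> slope, intercept)
a, b = np.polyfit(x, y, 1)

# Print the slope and intercept
print(a, b)

# Generate theoretical x and y data: x_theor, y_theor
x_theor = np.array([3, 15])
y_theor = a * x_theor + b

# Plot the Anscombe data as unconnected dots, then the fitted line on top
_ = plt.plot(x, y, marker='.', linestyle='none')
_ = plt.plot(x_theor, y_theor)

# Label the axes
plt.xlabel('x')
plt.ylabel('y')

# Show the plot
plt.show()
```

Setting linestyle='none' suppresses the segments that would otherwise join consecutive points, so the data appear as a scatter of dots. The theoretical line keeps the default solid style.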