1
Exploring Linear Trends
We start the course with an initial exploration of linear relationships, including some motivating examples of how linear models are used, and demonstrations of data visualization methods from matplotlib. We then use descriptive statistics to quantify the shape of our data and use correlation to quantify the strength of linear relationships between two variables.
2
Building Linear Models
Here we look at the parts that go into building a linear model. Using the concept of a Taylor series, we focus on the slope and intercept parameters, how they define the model, and how to interpret them in several applied contexts. We then find the model that best fits the data by computing the optimal values of slope and intercept with least squares, using numpy, statsmodels, and scikit-learn.
3
Making Model Predictions
Next we apply models to real data and make predictions. We explore some of the most common pitfalls and limitations of predictions, and we evaluate and compare models by quantifying and contrasting several measures of goodness-of-fit, including RMSE and R-squared.
4
Estimating Model Parameters
In our final chapter, we introduce concepts from inferential statistics, and use them to explore how maximum likelihood estimation and bootstrap resampling can be used to estimate linear model parameters. We then apply these methods to make probabilistic statements about our confidence in the model parameters.

Sample Statistics versus Population

In this exercise you will work with a preloaded population. You will construct a sample by drawing points at random from the population. You will then compute the mean and standard deviation of the sample to test whether the sample is representative of the population. Your goal is to see whether the sample statistics are the same as, or very close to, the population statistics.

Compute and print the mean and standard deviation of the population data.
Use the np.random.seed() function to set numpy's pseudorandom sampler seed to 42.
Use np.random.choice() to create a sample of size=31, where size is the number of points drawn from the population.
Compute and print the mean and standard deviation of the sample and inspect the printed values of the sample statistics and population statistics to see whether they differ.
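
The steps above can be sketched as follows. The exercise's preloaded population is not shown here, so this sketch substitutes a hypothetical stand-in: 10,000 normally distributed values with mean 100 and standard deviation 10. The variable names and the stand-in data are assumptions for illustration only, and the printed values will differ from those of the real exercise.

```python
import numpy as np

# ASSUMPTION: the real exercise preloads `population`; this synthetic array
# (normal, mean 100, standard deviation 10) is only a hypothetical stand-in.
population = np.random.default_rng(0).normal(loc=100.0, scale=10.0, size=10_000)

# Step 1: population statistics
population_mean = np.mean(population)
population_std = np.std(population)
print("Population mean: {:.2f}, stdev: {:.2f}".format(population_mean, population_std))

# Step 2: seed numpy's legacy global pseudorandom generator for reproducibility
np.random.seed(42)

# Step 3: draw 31 points at random from the population
# (np.random.choice samples with replacement by default)
sample = np.random.choice(population, size=31)

# Step 4: sample statistics, to compare against the population values
sample_mean = np.mean(sample)
sample_std = np.std(sample)
print("Sample mean: {:.2f}, stdev: {:.2f}".format(sample_mean, sample_std))
```

With only 31 points, the sample statistics should land near the population statistics but will rarely match them exactly. Changing the seed or the sample size is a quick way to see how much they vary.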
