1. Introduction to the Course
Hello! My name is Jason Vestuto, and I'll be your instructor for this course.
I'm a Scientist in the Space and Geophysics Lab of The University of Texas at Austin.
In this course, we'll see how to use Python to build, evaluate, and apply linear models.
To do this, we'll use many tools from the Python data science ecosystem, including matplotlib, numpy, scipy, statsmodels, and scikit-learn.
Before we build models, we'll use exploratory data analysis, including visualization and descriptive statistics, to characterize the data to be modeled.
Then, we'll build models and use them to make predictions, quantifying the confidence we can have in those predictions.
Finally, we'll explore how linear regression relates to inferential statistics, with an introduction to model parameter estimation.
In the end, you'll be well prepared to move on to more advanced forms of regression, statistical modeling, and machine learning.
2. Introduction to Chapter 1
In this first chapter, we start with an introductory exploration of linear relationships.
First, we'll introduce some example applications of linear models, such as interpolation and extrapolation.
Next, we'll start exploring our data with visualization methods. This is a great first step for spotting trends that would be harder to find or interpret if you jumped straight to quantitative methods.
Finally, we'll introduce some "descriptive" statistics, and see how they can help you prepare a more quantitative basis for building a model.
3. Example Trip Data
Let's start by looking at some data from a road trip.
Here we have Distance Traveled, plotted on the vertical y-axis, and Elapsed Time, plotted on the horizontal x-axis.
Consider the total "range" of values of x and y.
The range of the data is the difference between the largest and smallest values of a given variable.
4. Models as Descriptions
Here, the range of y is 300 miles.
And the range of x is 6 hours.
Models can be as simple as describing your data.
If you traveled 300 miles in 6 hours, you may have guessed that you traveled at 50 miles per hour. That's a form of modeling.
In making that guess, you estimated the average speed of your trip by taking the ratio of the range of distances to the range of times.
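To make that estimate concrete, here is a minimal sketch of the calculation. The measurement values are made up for illustration (the slide's actual data points aren't listed here); they are only chosen so that the ranges come out to 300 miles and 6 hours. The helper name `value_range` is also just an illustrative choice.

```python
# Hypothetical trip measurements, invented for illustration only.
# They are chosen so the ranges match the 6 hours and 300 miles quoted above.
times = [0, 1, 2, 3, 4, 5, 6]                  # elapsed time, in hours
distances = [0, 44, 107, 148, 196, 254, 300]   # distance traveled, in miles


def value_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)


time_range = value_range(times)              # 6 hours
distance_range = value_range(distances)      # 300 miles

# The descriptive "model": average speed as the ratio of the two ranges.
average_speed = distance_range / time_range  # 50.0 miles per hour

print(time_range, distance_range, average_speed)
```

Note that this uses only the first and last extremes of the data, so it describes the trip as a whole and ignores how the speed varied along the way.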
5. Visualizing a Model
Here's what that descriptive model would look like if plotted with the data. Here the red line is the model, and the black dots are the data.
6. Model Predictions
Models are often used to make predictions.
On your trip, if you multiplied your speed by a future time, you were making a prediction.
Let's write that in code. Here we express a model for the trip as `distance = 50 * time`
Notice that 50 = 300/6 is the ratio of miles to hours, which is the speed in miles per hour.
This expression is a model that *predicts* that we would travel 50 miles in 1 hour, 300 miles in 6 hours, and 1500 miles in 30 hours.
Models can be expressed as functions. Here we define and call the model function to get a predicted distance at a time of 10 hours.
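As a rough sketch, that model function might look something like the following. The function name `model` and its argument name are assumptions for illustration; only the expression `distance = 50 * time` comes from the trip example.

```python
def model(time):
    """Return the predicted distance in miles after `time` hours.

    Assumes a constant speed of 50 miles per hour, as estimated from the trip.
    """
    return 50 * time


# The three predictions mentioned for the expression distance = 50 * time.
for hours in (1, 6, 30):
    print(hours, "hours ->", model(hours), "miles")

# Calling the model at a time of 10 hours.
predicted_distance = model(10)
print(predicted_distance)  # 500
```

Wrapping the expression in a function means the same model can be reused for any time value, rather than retyping the arithmetic for each prediction.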
7. Interpolation
Interpolation is a model prediction for points "in between" the times we actually measured. Here our model predicts we would travel about 175 miles in 3 and a half hours.
8. Extrapolation
Extrapolation is a model prediction of the distance at a time outside the range of measured times.
Here, we can see the model predicts we would travel about 400 miles in 8 hours.
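The difference between the two kinds of prediction can be sketched in code. This is only an illustration: the names `model`, `predict`, `MEASURED_START`, and `MEASURED_END` are hypothetical, and it assumes the measured times run from 0 to 6 hours (the 6-hour range is from the trip data; the starting point of 0 is an assumption).

```python
# Assumed span of the measured times, in hours.
MEASURED_START = 0
MEASURED_END = 6


def model(time):
    """Return the predicted distance in miles at 50 miles per hour."""
    return 50 * time


def predict(time):
    """Return the predicted distance and whether it is an
    interpolation (inside the measured times) or an
    extrapolation (outside them)."""
    distance = model(time)
    if MEASURED_START <= time <= MEASURED_END:
        kind = "interpolation"
    else:
        kind = "extrapolation"
    return distance, kind


print(predict(3.5))  # (175.0, 'interpolation')
print(predict(8))    # (400, 'extrapolation')
```

The arithmetic is identical in both cases. What changes is how much the data can back up the prediction: an extrapolation relies on the trend continuing beyond anything that was measured.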
9. Let's practice!
Later in the course, we'll start building models, but for now, let's practice using prepared models to make some predictions.