
Regression Plots

1. Regression Plots

In Chapter 1, we briefly covered regression plots. Since regression plots are a very important tool in the data scientist's toolbox, Seaborn has a robust API that supports sophisticated analysis of datasets. In this lesson, we will spend some time looking at regression analysis of bicycle-sharing data from Washington, DC.

2. Bicycle Dataset

This dataset contains a summary of bicycle rental activity by day that has been joined with weather information such as temperature, humidity, and overall weather conditions. We can use Seaborn's regression plotting tools to evaluate the data from multiple perspectives and look for relationships between these numeric variables.

3. Plotting with regplot()

Here is a quick summary of the regplot() function. Like most of the Seaborn functions we have reviewed, it takes the data and the names of the x and y variables. In this example, we also pass the marker parameter to change the symbol drawn for each observation. At first glance, it looks like there is a positive relationship between temperature and total bike rentals in a day. This makes intuitive sense: people like to bike when the weather is warm.
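A minimal sketch of such a regplot() call follows. The bike-share file itself is not reproduced here, so the example builds a small synthetic stand-in; the column names temp and total_rentals and all of the numbers are assumptions made for illustration, not values from the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the bike-share data (hypothetical columns and values).
rng = np.random.default_rng(0)
temp = rng.uniform(0.05, 0.85, size=200)
total_rentals = 1000 + 6000 * temp + rng.normal(0, 800, size=200)
df = pd.DataFrame({"temp": temp, "total_rentals": total_rentals})

# Scatter the observations with a "+" marker and overlay a linear fit.
ax = sns.regplot(data=df, x="temp", y="total_rentals", marker="+")
```

To view the figure interactively, drop the Agg line and call matplotlib.pyplot.show() after the plot is created.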

4. Evaluating regression with residplot()

The residual plot is a useful tool for judging whether a regression model is appropriate. Ideally, the residuals should be scattered randomly around the horizontal line at zero, with no visible pattern. In this specific example, the residuals look like they follow a slight curve, which suggests a nonlinear model might be a better fit.
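As a rough illustration, the sketch below draws a residual plot for a linear fit. The data is synthetic and deliberately curved so that the pattern described above shows up; the column names and coefficients are assumptions, not the lesson's data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic, deliberately curved relationship (hypothetical columns and values).
rng = np.random.default_rng(1)
temp = rng.uniform(0.05, 0.9, size=300)
total_rentals = 500 + 14000 * temp - 9000 * temp**2 + rng.normal(0, 600, size=300)
df = pd.DataFrame({"temp": temp, "total_rentals": total_rentals})

# Fit a straight line and plot what is left over (the residuals) against temp.
ax = sns.residplot(data=df, x="temp", y="total_rentals")
```

With data shaped like this, the residuals from the straight-line fit tend to sit below zero at both ends of the temperature range and above zero in the middle, which is the kind of curve that hints at a nonlinear model.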

5. Polynomial regression

If a value greater than 1 is passed to the order parameter of regplot(), Seaborn will attempt a polynomial fit using underlying NumPy functions. In this example, Seaborn fits a second-order polynomial to the relationship between temperature and rentals. In this view, it looks like rentals might start to trail off when the weather gets too warm.
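The sketch below passes order=2 to regplot() on synthetic data (hypothetical columns and coefficients). regplot() draws the curve but does not return the fitted coefficients, so the example also calls np.polyfit directly to inspect them; that separate call is an addition for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data where rentals level off at high temperatures (hypothetical values).
rng = np.random.default_rng(1)
temp = rng.uniform(0.05, 0.9, size=300)
total_rentals = 500 + 14000 * temp - 9000 * temp**2 + rng.normal(0, 600, size=300)
df = pd.DataFrame({"temp": temp, "total_rentals": total_rentals})

# order=2 asks regplot() for a second-order polynomial fit instead of a line.
ax = sns.regplot(data=df, x="temp", y="total_rentals", marker="+", order=2)

# Fit the same kind of polynomial with NumPy to look at the coefficients.
# polyfit returns the highest power first: [a, b, c] for a*x**2 + b*x + c.
coefs = np.polyfit(df["temp"], df["total_rentals"], deg=2)
print("quadratic, linear, intercept:", coefs)
```

A negative leading coefficient means the fitted curve bends downward, which matches the idea of rentals trailing off in very warm weather.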

6. residplot with polynomial regression

The residplot() function also accepts the order parameter, so it can plot the residuals from the second-order polynomial fit. In this example, the residuals are more randomly distributed around zero, so a second-order equation is likely more appropriate for this problem.
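One way to make this comparison concrete is to place the two residual plots side by side and back the visual impression with a number. The sketch below does that on synthetic data; the column names, the data, and the rss helper are hypothetical and only serve the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic, deliberately curved relationship (hypothetical columns and values).
rng = np.random.default_rng(1)
temp = rng.uniform(0.05, 0.9, size=300)
total_rentals = 500 + 14000 * temp - 9000 * temp**2 + rng.normal(0, 600, size=300)
df = pd.DataFrame({"temp": temp, "total_rentals": total_rentals})

# Residuals from a linear fit (left) and from a second-order fit (right).
fig, (ax_linear, ax_quad) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
sns.residplot(data=df, x="temp", y="total_rentals", ax=ax_linear)
sns.residplot(data=df, x="temp", y="total_rentals", order=2, ax=ax_quad)
ax_linear.set_title("order=1")
ax_quad.set_title("order=2")

def rss(order):
    """Residual sum of squares for a polynomial fit of the given order."""
    x = df["temp"].to_numpy()
    y = df["total_rentals"].to_numpy()
    residuals = y - np.polyval(np.polyfit(x, y, deg=order), x)
    return float((residuals**2).sum())

rss_linear, rss_quad = rss(1), rss(2)
print(f"RSS order=1: {rss_linear:.0f}, RSS order=2: {rss_quad:.0f}")
```

A lower residual sum of squares for order=2 is expected whenever the extra term captures real curvature, but the residual plot remains the better check for leftover structure.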

7. Categorical values

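Seaborn also supports regression plots where the x variable is categorical but encoded as a number, such as the month. It might be interesting to see how rentals change over the various months. Because many observations share the same x value, the points stack on top of each other. In this example, the x_jitter parameter adds a small random horizontal offset to each point, which makes it easier to see the distribution of rental values within each month. The jitter only affects how the points are drawn, not the fitted regression.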
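A small sketch of a jittered plot by month, assuming the month is stored as an integer from 1 to 12 in a column called mnth. That column name, the seasonal pattern, and the choice of order=2 are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic daily rentals with a summer peak (hypothetical columns and values).
rng = np.random.default_rng(2)
mnth = np.repeat(np.arange(1, 13), 30)  # 30 observations for each month 1..12
seasonal = 4500 - 2500 * np.cos(2 * np.pi * (mnth - 1) / 12)
total_rentals = seasonal + rng.normal(0, 900, size=mnth.size)
df = pd.DataFrame({"mnth": mnth, "total_rentals": total_rentals})

# x_jitter spreads the points horizontally so each month's values are visible.
ax = sns.regplot(data=df, x="mnth", y="total_rentals", x_jitter=0.1, order=2)
```

A jitter of 0.1 keeps each month's points well inside its own column; larger values start to blur neighboring months together.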

8. Estimators

In some cases, even with jitter, it can be difficult to see trends across the values of the x variable. Passing an estimator such as the mean through the x_estimator parameter collapses the observations at each x value into a single point with a confidence interval, which gives another helpful view of the data. This simplified view shows a trend consistent with the seasons in Washington, DC.
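The following sketch uses np.mean as the estimator on synthetic monthly data. The column names and values are hypothetical, and the groupby call at the end is an extra cross-check that is not part of the plotting API.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic daily rentals with a summer peak (hypothetical columns and values).
rng = np.random.default_rng(2)
mnth = np.repeat(np.arange(1, 13), 30)  # 30 observations for each month 1..12
seasonal = 4500 - 2500 * np.cos(2 * np.pi * (mnth - 1) / 12)
total_rentals = seasonal + rng.normal(0, 900, size=mnth.size)
df = pd.DataFrame({"mnth": mnth, "total_rentals": total_rentals})

# Draw one point per month (the mean) with a confidence interval bar.
ax = sns.regplot(data=df, x="mnth", y="total_rentals",
                 x_estimator=np.mean, order=2)

# The same per-month means computed with pandas, for comparison.
monthly_mean = df.groupby("mnth")["total_rentals"].mean()
print(monthly_mean.round(0))
```

Other callables that reduce a vector to a single number, such as np.median, can be passed in the same way.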

9. Binning the data

When the x variable is continuous, it can be helpful to group it into bins. In this case, we use the x_bins parameter to divide temperature into four bins; Seaborn calculates the bins and plots one summary point per bin. This is much quicker than creating the bins yourself with pandas or some other mechanism. The binning only changes how the points are drawn: the regression line is still fit to the original, unbinned data. This shortcut gives a quick read on continuous data such as temperature.
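A minimal sketch of binning with x_bins=4, again on synthetic data with hypothetical column names and values:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the bike-share data (hypothetical columns and values).
rng = np.random.default_rng(3)
temp = rng.uniform(0.05, 0.85, size=400)
total_rentals = 1000 + 6000 * temp + rng.normal(0, 800, size=400)
df = pd.DataFrame({"temp": temp, "total_rentals": total_rentals})

# x_bins=4 summarizes the scatter as four binned points with error bars,
# while the regression line is still estimated from every observation.
ax = sns.regplot(data=df, x="temp", y="total_rentals", x_bins=4)
```

Passing a list of values instead of an integer lets you place the bin centers yourself, which can be useful when specific temperature thresholds matter.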

10. Let's practice!

Seaborn's regplot() function supports several parameters for creating highly customized regression plots. Now, let's apply these concepts to some examples.