
Visualizing trends

1. Visualizing trends

Now that we can produce scatter plots of survey data, let's highlight potential relationships by adding trend lines.

2. Scatter plots

Returning to the scatter plot of age versus head circumference that we created in the last video, we see a clear positive trend between these variables. However, where exactly does the line of best fit lie? Let's add it to the plot.

3. Survey-Weighted Line of Best Fit

We can add the line of best fit by adding one more layer to our ggplot(): geom_smooth. We then specify method equals "lm" for a linear model and se equals FALSE so that the standard-error band is not drawn around the line. The last piece is a new aesthetic argument that we didn't need for our geom_jitter: weight equals the survey weights. Recall that the line of best fit, also called the regression line, is the line that minimizes the sum of the squared vertical distances between the points and the line. For survey data, we want to weight each of those squared distances by the point's survey weight. This means that the larger a point's survey weight, the more important it is that the line passes close to that point. Why does that make sense? Because points with a high survey weight represent more people in the population. Okay, now that we can add a trend line, let's bring a categorical predictor into the mix. How do age and head size relate by gender?
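To make the weighting idea concrete, here is a minimal base-R sketch of a survey-weighted line of best fit. It assumes a small made-up dataset: the vectors `age`, `head_circ`, and `w` are hypothetical stand-ins for the survey's age, head circumference, and survey-weight variables, and the numbers are invented rather than taken from the survey. The sketch computes the weighted least-squares intercept and slope in closed form, checks them against `lm()` with its `weights` argument (the kind of fit `geom_smooth(method = "lm")` draws when a `weight` aesthetic is supplied), and shows that the weighted fit has a smaller weighted sum of squares than an unweighted fit.

```r
# Hypothetical toy data standing in for the survey variables:
# age (months), head circumference (cm), and survey weights.
age       <- c(0, 1, 2, 3, 4, 5, 6)
head_circ <- c(35.1, 37.0, 39.2, 40.1, 41.5, 42.3, 43.4)
w         <- c(1200, 800, 1500, 900, 2000, 700, 1100)

# Weighted least squares picks intercept a and slope b to minimize
#   sum(w * (head_circ - (a + b * age))^2)
# The closed form uses weighted means and weighted (co)variances.
x_bar <- weighted.mean(age, w)
y_bar <- weighted.mean(head_circ, w)
b <- sum(w * (age - x_bar) * (head_circ - y_bar)) / sum(w * (age - x_bar)^2)
a <- y_bar - b * x_bar

# lm() with `weights` minimizes the same weighted sum of squares;
# an unweighted fit treats every point as equally important.
wfit <- lm(head_circ ~ age, weights = w)
ofit <- lm(head_circ ~ age)

# Weighted sum of squared residuals for a given (intercept, slope).
weighted_sse <- function(coefs) {
  sum(w * (head_circ - (coefs[1] + coefs[2] * age))^2)
}

print(c(intercept = a, slope = b))
print(coef(wfit))
print(c(weighted   = weighted_sse(coef(wfit)),
        unweighted = weighted_sse(coef(ofit))))
```

Because the weighted fit is, by construction, the minimizer of the weighted sum of squares, its value in the last line can never exceed that of the unweighted fit; heavily weighted points pull the line toward themselves.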

4. Trend Lines

To add gender to our plot, we first need to add it to the babies dataset.

5. Trend Lines

Now, to incorporate gender into our plot, we map it to color. Notice that we now have two trend lines, one for each gender. Interestingly, they are almost parallel, with male head circumference about 1.2 centimeters larger than female head circumference at a given age.
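When gender is mapped to color, geom_smooth fits a separate line within each group. The base-R sketch below mimics that by splitting the data by gender and fitting one weighted `lm()` per group, then measuring the vertical gap between the two lines at a chosen age. The data frame `babies` here, its columns `age`, `gender`, `head_circ`, and `wt`, and all of its values are hypothetical and invented for illustration; they are not the survey data and do not reproduce the 1.2 cm result.

```r
# Hypothetical toy data: two genders measured at the same ages,
# with survey weights. Values are made up for illustration only.
babies <- data.frame(
  age       = rep(0:6, times = 2),
  gender    = rep(c("female", "male"), each = 7),
  head_circ = c(34.5, 36.6, 38.4, 39.7, 40.8, 41.8, 42.7,
                35.6, 37.9, 39.5, 41.0, 42.0, 43.0, 43.8),
  wt        = c(900, 1100, 1000, 1300, 800, 1200, 1000,
                1000, 950, 1250, 850, 1100, 900, 1050)
)

# One weighted regression line per gender, as geom_smooth() draws
# when gender is mapped to color.
fits <- lapply(split(babies, babies$gender), function(d)
  lm(head_circ ~ age, data = d, weights = wt))

# 2 x 2 matrix: rows are (Intercept) and age, columns are the genders.
coefs <- sapply(fits, coef)
print(coefs)

# Vertical gap (male minus female) between the two lines at a given age.
# If the lines were exactly parallel, this would not depend on the age.
gap_at <- function(at_age) {
  sum(c(1, at_age) * (coefs[, "male"] - coefs[, "female"]))
}
print(gap_at(3))
```

Comparing `gap_at()` across several ages is a quick way to judge how close to parallel the two lines are: similar slopes in the `age` row mean a nearly constant gap.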

6. Let's practice!

Now it's your turn to add trend lines to scatter plots!
