Stats with geoms
1. Stats with geoms
Welcome to the second ggplot2 course on data visualization! Here, we're going to build on the skills you learned in the first course to develop a wide variety of plots that are not only appealing, but also meaningful.
2. ggplot2, course 2
We'll examine three layers in detail: statistics, coordinates, and facets. We'll also review some data viz tips so that you can make the most of your new skill set. Let's get started with the stats layer.
3. Statistics layer
There are two broad categories of functions in this family: those that are called from within a geom and those that are called independently. As you may have guessed, all the statistical functions begin with "stat", followed by an underscore. Even those called from within the geom layer can be accessed independently in this way.
4. geom_ <-> stat_
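To see the stat_ naming convention for ourselves, here is a minimal sketch that lists the exported ggplot2 functions whose names start with "stat_". It assumes only that ggplot2 is installed. The exact list depends on the installed version, so no output is shown.

```r
library(ggplot2)

# List every exported ggplot2 function whose name starts with "stat_".
# The result varies between ggplot2 versions.
stat_funs <- grep("^stat_", ls("package:ggplot2"), value = TRUE)
head(stat_funs)
```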
We already saw a stat function when we used geom_histogram. Recall that under the hood, it called stat_bin to count the observations in each bin.
5. geom_ <-> stat_
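To make the geom_histogram and stat_bin pairing concrete, here is a minimal sketch. The built-in mtcars data, the mpg variable and bins = 10 are assumptions chosen for illustration. They are not taken from the slides.

```r
library(ggplot2)

# Assumed example data: the built-in mtcars data set.
p <- ggplot(mtcars, aes(x = mpg))

via_geom <- p + geom_histogram(bins = 10)  # geom_histogram() uses stat = "bin" by default
via_stat <- p + stat_bin(bins = 10)        # stat_bin() uses geom = "bar" by default

# layer_data() returns the values the stat computed for a layer,
# so we can check that both routes give the same bin counts.
all.equal(layer_data(via_geom)$count, layer_data(via_stat)$count)
```

Printing via_geom and via_stat should draw the same histogram, because both build the same layer from opposite ends.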
You may also remember that when we discussed geom_bar, I mentioned that its stat can be set to "bin", so we could have produced the same result using geom_bar with stat = "bin"!
6. geom_ <-> stat_
The same thing happens with geom_bar, which by default just calls stat_count under the hood. If we called stat_count directly, we'd get the same plot since it would call geom_bar.
7. The geom_/stat_ connection
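The geom_bar and stat_count pairing can be sketched the same way. Again, mtcars and the cyl variable are assumptions for illustration.

```r
library(ggplot2)

# Assumed example data: number of cars per cylinder count in mtcars.
p <- ggplot(mtcars, aes(x = factor(cyl)))

via_geom <- p + geom_bar()    # geom_bar() uses stat = "count" by default
via_stat <- p + stat_count()  # stat_count() uses geom = "bar" by default

# Both layers compute the same per-category counts.
layer_data(via_geom)$count
layer_data(via_stat)$count
```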
So we can see that specific geoms and stat functions are related.
8. stat_smooth()
stat_smooth can be accessed with geom_smooth, shown here. The gray ribbon behind our smooth shows the standard error, which by default is drawn as a 95% confidence interval.
9. stat_smooth(se = FALSE)
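As a minimal sketch of the ribbon and of switching it off, assuming the built-in mtcars data with wt and mpg (an illustration, not necessarily the data on the slide):

```r
library(ggplot2)

# Assumed example data: weight vs. fuel consumption from mtcars.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

with_ribbon <- p + geom_smooth()            # se = TRUE by default: ribbon is drawn
no_ribbon   <- p + geom_smooth(se = FALSE)  # same smooth, no ribbon

# With se = TRUE the stat also computes the ribbon limits (ymin, ymax).
names(layer_data(with_ribbon, 2))
```

Both calls should print a message about the method and formula being used, since neither sets method explicitly.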
We can remove this by setting the se argument to FALSE. We know we are calling stat_smooth because of another message, which tells us that geom_smooth is using method "loess" and formula y ~ x. LOESS is a non-parametric smoothing algorithm that is used by default when we have fewer than 1000 observations. It works by passing a sliding window along the x-axis and fitting a weighted local regression within each window, and it is a valuable tool in exploratory data analysis.
10. geom_smooth(span = 0.4)
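Here is a minimal sketch comparing the default span with a smaller one, again assuming the built-in mtcars data. The value 0.4 comes from the slide title, and 0.75 is the default span of stats::loess.

```r
library(ggplot2)

# Assumed example data: weight vs. fuel consumption from mtcars.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

smooth_default <- p + geom_smooth(se = FALSE)              # loess default span is 0.75
smooth_narrow  <- p + geom_smooth(se = FALSE, span = 0.4)  # smaller window, wigglier curve

# The fitted values differ because the window size differs.
head(layer_data(smooth_default, 2)$y)
head(layer_data(smooth_narrow, 2)$y)
```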
The span argument controls the degree of smoothing by setting the size of the sliding window. Smaller spans produce noisier curves, as we can see here.
11. geom_smooth(method = "lm")
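A minimal sketch of a linear-model smooth, assuming mtcars with color mapped to factor(cyl) so that the data fall into three groups (an assumption for illustration):

```r
library(ggplot2)

# Assumed example data: mtcars, with color defining three groups.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) + geom_point()

# One straight-line fit per color group.
lm_by_group <- p + geom_smooth(method = "lm", se = FALSE)

# The smooth layer holds one fitted line for each group.
unique(layer_data(lm_by_group, 2)$group)
```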
The method argument can also define parametric models, such as "lm", as shown here, or "glm", "rlm" and "gam". For groups with 1000 or more observations, the method defaults to "gam". Notice that in both the LOESS and LM examples, the model is calculated on groups defined by color. We'll look at how to override this in the exercises.
12. geom_smooth(fullrange = TRUE)
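A minimal sketch of fullrange, under the same assumption of mtcars grouped by factor(cyl):

```r
library(ggplot2)

# Assumed example data: mtcars, with color defining three groups.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) + geom_point()

own_range  <- p + geom_smooth(method = "lm")                    # each line stops at its group's limits
full_range <- p + geom_smooth(method = "lm", fullrange = TRUE)  # each line spans the whole x-axis

# Smallest x value predicted for each group in the two versions.
d_own  <- layer_data(own_range, 2)
d_full <- layer_data(full_range, 2)
tapply(d_own$x, d_own$group, min)
tapply(d_full$x, d_full$group, min)
```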
By default, each model is bound to the limits of its own group, but for parametric methods, we can use the fullrange argument to make predictions over the entire range of the plot. Just as we'd expect, the error increases the further from our data we try to make an estimate.
13. The geom_/stat_ connection
We can access smoothing using either the geom_smooth function or the stat_smooth function.
14. Other stat_ functions
There are many other stat functions which we will encounter throughout the rest of the data visualization courses, some of which are particularly useful for summarizing data, like boxplots,
15. Other stat_ functions
or dealing with very large data sets, such as bindot, binhex, bin2d and contour. We'll encounter those in the next course when we consider graphics of large datasets.
16. Other stat_ functions
We'll encounter other functions throughout the exercises. In general, you won't have to call these functions directly, but it is worth knowing about the relationship between geoms and their respective statistics. You'll understand warning and error messages better, and the help pages for the stat functions are often more informative if you need to adjust any parameters.
17. Let's practice!
OK, let's see how stat functions work in practice in the exercises.