Stats: sum and quantile
1. Stats: sum and quantile
Let's wrap up our discussion of stats called from within geoms by looking at two more useful functions: geom_count and geom_quantile.
2. Recall from course 1
In the first course, we saw that over-plotting is always a concern whenever we use geom_point. Every data point must be visible. We discussed four ways in which our visualizations may mislead us.
3. Plot counts to overcome over-plotting
We can now add a new geom function to our solutions for low-precision and integer data: geom_count plots the count at each location. In course 3, we'll see even more elegant solutions that can be applied to all four situations. Let's look at an example with geom_count.
4. Low precision (& integer) data
In the iris data set, where we have low-precision data, jittering gives the impression that we have more precision than we actually do.
5. Jittering may give a wrong impression
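As a rough illustration of how jittering hides the low precision of iris, the sketch below first counts how many distinct positions the data really occupy and then builds a jittered scatter plot. The choice of Sepal.Length and Sepal.Width and the jitter amounts are assumptions made for this example, not values taken from the course.

```r
library(ggplot2)

# iris is recorded to one decimal place, so many observations share
# exactly the same (Sepal.Length, Sepal.Width) position
n_distinct <- nrow(unique(iris[, c("Sepal.Length", "Sepal.Width")]))

# geom_jitter() adds random noise to separate coinciding points;
# width and height are illustrative values chosen for this sketch
p_jitter <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_jitter(width = 0.05, height = 0.05, alpha = 0.5) +
  labs(caption = "Points are jittered to reduce over-plotting")
```

Calling print(p_jitter) draws the plot. The caption is one simple way to tell the reader that the positions have been perturbed.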
Because of this, we should always mention when we've jittered our data.
6. geom_count()
To avoid this problem, we can use another variant of geom_point. geom_count counts the number of observations at each location and then maps the count onto size as the point area. The count is mapped onto the area of the circle, as opposed to its radius, since we perceive area more intuitively than radius.
7. The geom/stat connection
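A minimal sketch of geom_count on the iris data might look like the following. The two sepal variables are an assumption made for illustration. layer_data() is used only to inspect the counts that the layer computes, which are stored in a column named n.

```r
library(ggplot2)

# One point per unique (x, y) location, sized by the number of observations there
p_count <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_count(alpha = 0.5)

# Inspect the computed data: `n` holds the count at each location
counts <- layer_data(p_count)
head(counts[, c("x", "y", "n")])
```

Every observation is counted exactly once, so the counts add up to the number of rows in iris.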
Remember that these geoms are associated with stat functions that can be called directly. For geom_count, that function is stat_sum.
8. stat_sum()
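To make the link between the geom and its stat concrete, this sketch builds the same plot twice, once with geom_count() and once with stat_sum(), and compares the computed data. The variables are again an illustrative assumption.

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))

# geom_count() uses stat_sum() under the hood;
# stat_sum() draws with geom_point() by default
p_geom <- base + geom_count()
p_stat <- base + stat_sum()

# Compare the locations and counts computed by each layer
same_counts <- isTRUE(all.equal(
  layer_data(p_geom)[, c("x", "y", "n")],
  layer_data(p_stat)[, c("x", "y", "n")]
))
```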
Calling the stat function gives the exact same plot. We'll see this trick used with integer data in the exercises, which is a very common use.
9. Over-plotting can still be a problem!
But be careful here: you'll still encounter over-plotting if the points are colored according to another variable. This makes the plot particularly difficult to read!
10. geom_quantile()
The last function I want to look at in this section is geom_quantile. It's another great tool for describing our data. Quantile regression models quantiles, which are robust, whereas linear models model the mean, which is not. We can choose any quantile we're interested in, such as the median, which is just the second quartile. A typical case for quantile regression is heteroscedasticity, that is, when the variance of the response is not constant across the predictor variable, in which case linear models may not be valid.
11. Dealing with heteroscedasticity
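To make heteroscedasticity concrete, here is a small sketch on simulated data. The data frame sim and all its parameters are hypothetical and are not the course dataset. The noise in y grows with x, so the residuals of an ordinary linear model spread out more at large x than at small x.

```r
library(ggplot2)

set.seed(42)
n <- 500
x <- runif(n, min = 1, max = 10)
# The noise standard deviation is proportional to x, so the variance is not constant
y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 0.3 * x)
sim <- data.frame(x = x, y = y)  # hypothetical data for illustration only

# Fit an ordinary linear model and compare residual spread at low and high x
fit <- lm(y ~ x, data = sim)
sd_low  <- sd(resid(fit)[sim$x < 5.5])
sd_high <- sd(resid(fit)[sim$x >= 5.5])

# The scatter plot fans out as x increases
p_het <- ggplot(sim, aes(x = x, y = y)) +
  geom_point(alpha = 0.4)
```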
Here's an example of heteroscedasticity from a dataset of economics journals from the AER package. We won't get into the details of the data, but you can see that the variance on the y axis is not consistent as we move along the x axis.
12. Using geom_quantile
Here, we can use geom_quantile to model the 5th and 95th percentiles as well as the median, the 50th percentile.
13. The geom/stat connection
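The following sketch fits the 5th, 50th and 95th percentiles with geom_quantile() and then repeats the layer with stat_quantile(). It uses simulated data as a hypothetical stand-in for the journals dataset, and it assumes the quantreg package is installed, since ggplot2 relies on it for quantile regression.

```r
library(ggplot2)  # geom_quantile() also requires the quantreg package

set.seed(1)
# Hypothetical heteroscedastic data: the spread of y grows with x
sim <- data.frame(x = runif(300, min = 1, max = 10))
sim$y <- 2 + 0.5 * sim$x + rnorm(300, mean = 0, sd = 0.3 * sim$x)

qs <- c(0.05, 0.5, 0.95)
base <- ggplot(sim, aes(x = x, y = y)) +
  geom_point(alpha = 0.3)

# The same quantile lines, requested through the geom and through the stat
p_geom <- base + geom_quantile(quantiles = qs)
p_stat <- base + stat_quantile(quantiles = qs)

# Layer 2 is the quantile layer; its computed data has one line per quantile
fitted_quantiles <- sort(unique(layer_data(p_geom, 2)$quantile))
```

With heteroscedastic data, the 5th and 95th percentile lines diverge as x increases, which a single mean regression line cannot show.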
Just like the previous geoms, this one is also associated with a stat function, stat_quantile, that we can call directly.
14. Ready for exercises!
Let's take these functions for a spin with some exercises!