Histograms
1. Histograms
In this section we'll take a look at the typical uses of bar plots and their associated geoms.
2. Common plot types
A histogram is a special type of bar plot that shows the binned distribution of a continuous variable.
3. Histograms
Here, we only need a single aesthetic: x, a continuous variable. geom_histogram plots a binned version of our data, and a message tells us which binning was used. This geom is associated with a specific statistic, stat_bin. The bins argument took its default value of 30.
4. Default of 30 even bins
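As a rough sketch of the default behavior, we can build the plot and inspect the bins that stat_bin computed. The transcript names neither the data set nor the variable, so the built-in iris data and Sepal.Width are assumptions made only for illustration.

```r
library(ggplot2)

# Assumption: iris and Sepal.Width stand in for the unnamed data set
# and continuous variable.
p <- ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram()  # no binning arguments, so stat_bin falls back to bins = 30

# layer_data() returns the data frame the geom actually draws,
# one row per bin.
binned <- layer_data(p)
nrow(binned)        # number of bins
sum(binned$count)   # every observation lands in exactly one bin
```

Looking at the computed data frame, rather than only the picture, makes it easier to see what the default did before we override it.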
This is a good starting point, but we don't need to settle for defaults! Let's change it and see what happens.
5. Intuitive and meaningful bin widths
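A minimal sketch of choosing the bin width by hand instead of accepting the default number of bins. The iris data and Sepal.Width are assumptions here, since the transcript does not name the data set or the variable.

```r
library(ggplot2)

# Assumption: iris and Sepal.Width are illustrative stand-ins.
p <- ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram(binwidth = 0.1)  # each bin spans 0.1 units of x

binned <- layer_data(p)

# Every bin should have the width we asked for.
bin_widths <- binned$xmax - binned$xmin
range(bin_widths)
```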
Changing the binwidth argument to 0.1 gives us a more intuitive impression of our data. Note that there is no space between the bars. That emphasizes that this is a representation of an underlying continuous distribution.
6. Re-position tick marks
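One way to sketch the tick alignment: with a bin width of 0.1, placing a bin center at 0.05 puts the bin edges on multiples of 0.1, so round-number axis labels fall between bars. The iris data and Sepal.Width are assumptions for illustration only.

```r
library(ggplot2)

# Assumption: iris and Sepal.Width are illustrative stand-ins.
p <- ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram(binwidth = 0.1, center = 0.05)  # center = binwidth / 2

binned <- layer_data(p)

# xmin and xmax are the bin edges, x is the bin center.
head(binned[, c("xmin", "x", "xmax")])

# Distance of each left edge from the nearest multiple of 0.1.
edge_offset <- abs(binned$xmin * 10 - round(binned$xmin * 10))
max(edge_offset)
```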
That's also why the labels on the x axis shouldn't fall directly on the bars, but between them. The bars represent intervals, not individual values. Setting the center argument to half the binwidth, here 0.05, does the trick.
7. Different Species
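A hedged sketch of filling the bars by species. The column name Species and the iris data are assumptions, since the transcript only says that the data set contains three species. Without a position argument the bars are stacked, which we can confirm from the computed data.

```r
library(ggplot2)

# Assumption: iris, Sepal.Width and Species are illustrative stand-ins.
p <- ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_histogram(binwidth = 0.1, center = 0.05)  # position defaults to "stack"

binned <- layer_data(p)

# With stacking, some bars start on top of another group's bar,
# so their lower edge (ymin) is above zero.
any(binned$ymin > 0)
```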
Remember that we have three species in our data set? We can fill the bars according to each species. This makes it clear that we have three histograms in the same plotting space. There is a perceptual problem here, because it is not immediately clear if the bars are overlapping or if they are stacked on top of each other.
8. Default position is "stack"
The default position is "stack". Stacking is not always obvious from the plot alone, so don't risk confusing your viewer with stacked bars. We have some alternative positions we can use.
9. position = "dodge"
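A small sketch of the dodge position, assuming the built-in iris data with Sepal.Width and Species (none of which the transcript names). Dodged bars sit side by side within each bin, so every bar starts at zero.

```r
library(ggplot2)

# Assumption: iris, Sepal.Width and Species are illustrative stand-ins.
p <- ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_histogram(binwidth = 0.1, center = 0.05, position = "dodge")

binned <- layer_data(p)

# Unlike stacking, no bar rests on top of another one.
all(binned$ymin == 0)
```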
We can "dodge" our bars, which is a data viz term that simply means to offset each data point in a given category. That works, but the number of categories makes it difficult to see what's happening. We'll encounter dodging again in several situations throughout these courses where it can be used to good effect.
10. position = "fill"
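A sketch of the fill position, assuming iris with Sepal.Width and Species as stand-ins for the unnamed data. Each bin is rescaled so its stacked bars reach a total height of 1, and labs() is used here to relabel the y axis, which otherwise still reads "count".

```r
library(ggplot2)

# Assumption: iris, Sepal.Width and Species are illustrative stand-ins.
p <- ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_histogram(binwidth = 0.1, center = 0.05, position = "fill") +
  labs(y = "proportion")  # the default label would still say "count"

binned <- layer_data(p)

# Tallest stacked height across bins; na.rm guards against empty bins,
# where a proportion is undefined.
top <- max(binned$ymax, na.rm = TRUE)
top
```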
The fill position normalizes each bin, so the bars show what proportion of the observations in that bin belongs to each species. The y axis label didn't change, but it should say proportion, not count.
11. Final Slide
Alright, let's head over to the exercises and take a look at histograms in action.