Coordinates vs. scales

1. Coordinates vs. scales

In the next set of exercises, I want to look at how to use the coordinate layer to perform transformations, and how that differs from using the scale functions.

2. Plot the raw data

For these examples, I'm going to use the body weight variable from the msleep data set. I made more adjustments than what's shown here, but this is the basic code to get this univariate plot. We can see that this variable has a strong positive skew. In the first course, we saw how we can use the scale functions to modify things like the x-axis limits and breaks. Let's consider three ways in which we can transform our data. A common transformation for positively skewed data is the natural (base e) logarithm, or the more intuitive common (base 10) logarithm.
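Here is a minimal sketch of what such a raw-data plot could look like. The jittered strip plot with a constant y value is an assumption chosen for illustration, not the exact code or styling from the slide.

```r
library(ggplot2)

# msleep ships with ggplot2; bodywt is body weight in kilograms.
# Assumption: a strip plot with vertical jitter stands in for the
# univariate plot described in the narration.
p_raw <- ggplot(msleep, aes(x = bodywt, y = 1)) +
  geom_jitter(width = 0, height = 0.2, alpha = 0.6)

p_raw

# A mean far above the median is a quick numeric sign of positive skew.
summary(msleep$bodywt)
```

Most points pile up near zero while a few very heavy animals stretch the axis, which is the skew the transformations below are meant to address.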

3. Transform the raw data

We can transform the data before we begin plotting and update the actual data frame, or we can transform the variable on-the-fly when we specify it in the aes function, as shown here. The result is the same. So far, so good! Notice that the axis labels are the log-transformed values, where 0 is the log 10 of 1 kilogram, and 4 is the log 10 of 10000 kilograms. This is a very common solution, but it is a bit misleading in that the transformed scale is linear and we have to do some mental arithmetic to get back to the original values. So we've lost a bit of precision here.
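The two options can be sketched as follows. The column name log10_bodywt and the strip-plot geometry are hypothetical choices made for this illustration.

```r
library(ggplot2)

# Option 1: store the transformed variable in a copy of the data frame.
msleep_log <- msleep
msleep_log$log10_bodywt <- log10(msleep_log$bodywt)

p_column <- ggplot(msleep_log, aes(x = log10_bodywt, y = 1)) +
  geom_jitter(width = 0, height = 0.2)

# Option 2: transform on-the-fly inside aes().
p_inline <- ggplot(msleep, aes(x = log10(bodywt), y = 1)) +
  geom_jitter(width = 0, height = 0.2)

# The axis is in log10 units, so reading a tick means computing 10^tick.
back_transformed <- 10^c(0, 4)  # kilograms at axis positions 0 and 4
back_transformed
```

The last two lines are the "mental arithmetic" from the narration made explicit: axis positions 0 and 4 correspond to 1 and 10000 kilograms.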

4. Add logtick annotation

We could add log annotation tick marks using the annotation_logticks function. This highlights that the data have been log-transformed. However, another solution is to have the data on a log scale, and label it with the actual original body weight value. We can do this in two ways.
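A rough sketch of the log-tick annotation, assuming the data were transformed inside aes(). The limits and breaks here are illustrative values that happen to cover the range of log10(bodywt).

```r
library(ggplot2)

p_ticks <- ggplot(msleep, aes(x = log10(bodywt), y = 1)) +
  geom_jitter(width = 0, height = 0.2) +
  # Assumed limits and breaks, in log10 units.
  scale_x_continuous(limits = c(-3, 4), breaks = -3:4) +
  # sides = "b" draws the tick marks along the bottom axis only.
  annotation_logticks(sides = "b")

p_ticks
```

The unevenly spaced minor ticks signal a log axis even though the labels are still in log10 units.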

5. Use scale_*_log10()

The first method uses the scale_x_log10 function. This transforms the data and then calculates any statistics needed.
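To see the "transform first, then compute statistics" order in action, here is a small sketch that uses a density estimate as the statistic. The choice of geom_density is an assumption for illustration; the slide may use a different geometry.

```r
library(ggplot2)

p_scale <- ggplot(msleep, aes(x = bodywt)) +
  geom_density() +
  scale_x_log10()

# The density is estimated after the scale transformation, so the x grid
# of the computed layer data is in log10 units, not in kilograms.
density_range <- range(layer_data(p_scale)$x)
density_range
```

If the statistic had been computed on the raw values, the upper end of that range would be in the thousands rather than below 4.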

6. Compare direct transform and scale_*_log10() output

The plots are almost identical, but pay attention to the axis labeling in the second plot, which uses the scale_x_log10 function. The labels correspond to the actual values in the data set. This is the default output; we saw how to clean up axis labels in the first course.

7. Use coord_trans()

As you could imagine, we also have a function in the coordinate layer: coord_trans, which is actually more flexible in that we can apply any transformation we'd like.
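A minimal sketch of the coordinate-layer approach. The strip plot is again an assumed stand-in for the slide's plot, and "sqrt" is included only to illustrate that other named transformations can be supplied.

```r
library(ggplot2)

base_plot <- ggplot(msleep, aes(x = bodywt, y = 1)) +
  geom_jitter(width = 0, height = 0.2)

# Transformations are given by name; "log10" and "sqrt" are two built-ins.
p_coord_log  <- base_plot + coord_trans(x = "log10")
p_coord_sqrt <- base_plot + coord_trans(x = "sqrt")

p_coord_log
```

Because the transformation lives in the coordinate system, the same base plot can be reused with different transformations without touching the scales.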

8. Compare scale_*_log10() and coord_trans() output

Using coord_trans and setting the x argument to "log10" results in the same plot as with the scale function. The default labels happen to be different, but the plot is the same.

9. Adjusting labels

As a final step, we can add the actual values of the data on the axis. This is a really nice way to show the transformed data while labeling the axis with the original values. This may give you the impression that scale and coord functions work in the same way, but just like zooming, there are some fundamental differences under the hood when applying transformations. We'll take a look at those in the exercises.
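One possible way to label the axis with original values, followed by a quick look at the under-the-hood difference. The break positions and the axis title are hypothetical choices, and plain points are used so the layer data can be compared directly.

```r
library(ggplot2)

# Hypothetical break positions, in kilograms, inside the data range.
kg_breaks <- c(0.01, 0.1, 1, 10, 100, 1000)

p_labelled <- ggplot(msleep, aes(x = bodywt, y = 1)) +
  geom_point(alpha = 0.6) +
  scale_x_log10(breaks = kg_breaks, labels = as.character(kg_breaks)) +
  labs(x = "Body weight (kg)")

p_coord <- ggplot(msleep, aes(x = bodywt, y = 1)) +
  geom_point(alpha = 0.6) +
  coord_trans(x = "log10")

# Scale: values reach the layer already log10-transformed.
x_scale <- layer_data(p_labelled)$x
# Coord: values reach the layer untransformed; the transformation is
# applied later, when the plot is drawn.
x_coord <- layer_data(p_coord)$x

head(x_scale)
head(x_coord)
```

The two plots look alike on screen, yet the layer data differ, which is why statistics computed inside the plot can behave differently under the two approaches.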

10. Time for exercises

Alright, now that you know how to use the scale and coord functions to apply transformations, let's look at bivariate plots and see how these functions affect our statistics.