Polishing the dot plot
1. Polishing the dot plot
Great work. You've explored a new type of visualization in ggplot2. Building the plot was actually quite easy: you only used the geom_path geometry for the lines with arrows and the geom_text geometry to add labels.
2. <<<New Slide>>>
Still, just by looking at the plot you can probably tell that it's far from perfect. The labels should be better placed; right now, they overlap with the lines. It's also hard to compare countries with each other because they are currently ordered alphabetically, so it's not straightforward to tell which country still has the highest weekly working hours. The solution lies in factor levels.
3. Factor levels
Factors are used to describe categorical variables with a fixed and known set of **levels**. The order of these levels usually determines the order of appearance in ggplot2 graphics. If you look at the country variable by typing ilo_data$country in the R console, you get its values but also its levels, listed in their current order.
4. Reordering factors with the forcats package
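Before turning to forcats, the default level order can be checked in base R. The snippet below is a minimal sketch: the `country` vector is a hypothetical stand-in for ilo_data$country, which is not reproduced here.

```r
# Hypothetical stand-in for ilo_data$country; only the names matter here.
country <- factor(c("Netherlands", "Austria", "Belgium", "Austria"))

# By default, the levels are the sorted unique values,
# not the order in which the values appear in the data.
levels(country)

# Printing the factor shows the values first, then the levels.
country
```

Because the levels are sorted alphabetically by default, a plot built on this factor would list the countries alphabetically as well.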
Wrangling factors and factor levels can sometimes be a bit tedious, but there's a special package that makes this easier, called forcats. It is part of the Tidyverse and can be loaded explicitly with a library call. It has a couple of useful functions; here are some of them.
5. The fct_reorder function
With the fct_reorder function, we can reorder factors in a data-driven way, based on the ranks of another variable in the data set. Let's have a look at our current data set again. As you know, there are two rows per country, one for each year. If we want to reorder the countries based on working hours, we could decide to take either the value for 1996 or the value for 2006. With the fct_reorder function, though, we can also specify a custom summary function as the last argument. This function is applied to all values of the same factor level, that is, of the same country. If we specify the mean function as the third argument to fct_reorder, for instance, the mean of the weekly working hours of both years is computed for each country, and the countries are then reordered according to this mean value. As you can see, the levels are now different: the list starts with the Netherlands, which has the lowest mean weekly working hours.
6. The fct_reorder function
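The reordering can be sketched on a small toy data frame. `ilo_toy` is a hypothetical stand-in for ilo_data: only the two Austria values appear in this lesson, and the numbers for the other countries are invented so that the Netherlands ends up with the lowest mean.

```r
library(forcats)

# Toy stand-in for ilo_data: two rows per country, one per year.
# Only the Austria values come from the lesson; the rest are made up.
ilo_toy <- data.frame(
  country = factor(rep(c("Austria", "Belgium", "Netherlands"), each = 2)),
  year = factor(rep(c("1996", "2006"), times = 3)),
  working_hours = c(31.99, 31.82, 33.10, 32.50, 27.50, 27.00)
)

# Reorder the country levels by the mean working hours of both years.
# Arguments: factor to reorder, variable that drives the order, summary function.
ilo_toy$country <- fct_reorder(ilo_toy$country, ilo_toy$working_hours, mean)

# The levels are now sorted by ascending mean, so in this toy data
# the Netherlands comes first.
levels(ilo_toy$country)

# The per-country means that determine the new order.
tapply(ilo_toy$working_hours, ilo_toy$country, mean)
```

The `tapply` call is only there for inspection: it shows the one summary value per country that the new level order is based on.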
Let's have a detailed look at this again. In the fct_reorder function, we first need to specify the factor variable to reorder, then another variable whose ranking dictates the order of the levels, and then a summary function. This summary function is then applied to the values of each factor level. These values are passed to the summary function as a vector, so for Austria, the mean of 31.99 and 31.82 is calculated, and so forth.
7. Nudging labels with hjust and vjust
One problem remains: the working hour labels for both years still overlap with the arrows. To solve this, we can make use of the hjust and vjust aesthetics of the geom_text geometry. If you want to nudge labels horizontally, you use the hjust aesthetic, which usually takes values from 0 to 1, although values outside this range are also possible. For our plot, I found 1.4 and -0.4 to be good values. Since hjust is an aesthetic, its value can be data-driven. Here, we use different values for the years 1996 and 2006. The ifelse function is perfect for that: if the year is 2006, use 1.4 as the value; if not, use -0.4.
8. Let's practice!
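As a warm-up, the data-driven nudging described above can be sketched as follows. `ilo_toy` is a hypothetical stand-in for ilo_data (only the Austria values come from this lesson), and the arrow length is an arbitrary choice for illustration.

```r
library(ggplot2)

# Toy stand-in for ilo_data; only the Austria values come from the lesson.
ilo_toy <- data.frame(
  country = factor(rep(c("Austria", "Belgium", "Netherlands"), each = 2)),
  year = factor(rep(c("1996", "2006"), times = 3)),
  working_hours = c(31.99, 31.82, 33.10, 32.50, 27.50, 27.00)
)

p <- ggplot(ilo_toy, aes(x = working_hours, y = country)) +
  # One arrow per country, drawn from the 1996 row to the 2006 row.
  geom_path(arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
  # hjust is mapped inside aes(), so it can differ per row:
  # 1.4 for the 2006 labels, -0.4 for the 1996 labels.
  geom_text(aes(
    label = round(working_hours, 1),
    hjust = ifelse(year == "2006", 1.4, -0.4)
  ))

p
```

The values 1.4 and -0.4 are the ones suggested in the lesson; other data may need different offsets, since they push the labels away from the arrow ends by a multiple of the label width.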
That was a lot. Once you try it out yourself, it will be easier to grasp!