Summarizing data

Let's now make a faceted plot to compare usefulness across different learning platforms.

In this exercise, we'll introduce a new dplyr function, add_count(). add_count() adds a column, n, to the dataset while keeping the same number of rows as the original. As with count(), n defaults to the number of rows in each group, but you can change that with the wt (weight) argument: set wt to another column and n becomes the sum of that column within each group.

Let's say you wanted to add a column to iris that is the sum of the Petal.Length for all the flowers of the same Species. You would write:

library(dplyr)

iris %>%
   add_count(Species, wt = Petal.Length) %>%
   select(Species, Petal.Length, n)

This would give you back output like the following (abridged to three sample rows):

# A tibble: 150 x 3
   Species Petal.Length     n
   <fct>          <dbl> <dbl>
 1 setosa           1.4  73.1
 2 setosa           1.4  73.1
 3 virginica        6.4  278.
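
To make the difference between count() and add_count() concrete, here is a minimal sketch on a small made-up data frame. The flowers data, its column names, and its values are hypothetical and exist only for illustration; they are not part of the course data.

```r
library(dplyr)

# Hypothetical toy data: three flowers from two made-up species.
flowers <- data.frame(
  species = c("a", "a", "b"),
  petal_length = c(1, 2, 5)
)

# count() collapses the data to one row per group;
# n is the number of rows in each group.
collapsed <- flowers %>% count(species)

# add_count() keeps all three rows and appends n as a new column.
with_n <- flowers %>% add_count(species)

# With wt, n becomes the per-group sum of petal_length
# instead of a row count.
with_sum <- flowers %>% add_count(species, wt = petal_length)
```

Here collapsed has two rows (one per species), while with_n and with_sum both keep the original three rows. In with_sum, both rows of species "a" carry n = 3, the sum of their petal lengths.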

This exercise is part of the course

Categorical Data in the Tidyverse

Hands-on interactive exercise

Try this exercise by completing the sample code.

learning_platform_usefulness %>%
  # Change dataset to one row per learning_platform usefulness pair with number of entries for each
  ___(___, ___)

In this chapter, you’ll learn all about factors. You’ll discover the difference between categorical and ordinal variables, how R represents them, and how to inspect them to find the number and names of the levels. Finally, you’ll see how forcats, a tidyverse package, can improve your plots by letting you quickly reorder variables by their frequency.

Exercise 1: Introduction to qualitative variables
Exercise 2: Recognizing factor variables
Exercise 3: Qualitative variables in theory
Exercise 4: Understanding your qualitative variables
Exercise 5: Getting number of levels
Exercise 6: Examining number of levels
Exercise 7: Examining levels
Exercise 8: Making better plots
Exercise 9: Reordering a variable by its frequency
Exercise 10: Ordering one variable by another

You’ll continue to dive into the forcats package, learning how to change the order and names of levels and even collapse them into one another.

Exercise 1: Reordering factors
Exercise 2: Changing the order of factor levels
Exercise 3: Tricks of fct_relevel()
Exercise 4: Renaming factor levels
Exercise 5: Distinguishing between forcats functions
Exercise 6: Renaming a few levels
Exercise 7: When you have a typo
Exercise 8: Collapsing factor levels
Exercise 9: Manually collapsing levels
Exercise 10: Lumping variables by proportion
Exercise 11: Preserving the most common levels

Having gotten a good grasp of forcats, you’ll expand out to the rest of the tidyverse, learning and reviewing functions from dplyr, tidyr, and stringr. You’ll refine graphs with ggplot2 by changing axes to percentage scales, editing the layout of the text, and more.

Exercise 1: Examining common themed variables
Exercise 2: Grouping and reshaping similar columns
Exercise 3: Summarizing data (current exercise)
Exercise 4: Creating an initial plot
Exercise 5: Tricks of ggplot2
Exercise 6: Editing plot text
Exercise 7: Reordering graphs
Exercise 8: Changing and creating variables with case_when()
Exercise 9: case_when() with single variable
Exercise 10: case_when() from multiple columns

In this final chapter, you’ll take all that you’ve learned and apply it in a case study. You’ll learn more about working with strings and summarizing data, then replicate a publication quality 538 plot.

Exercise 1: Case study introduction
Exercise 2: Changing characters to factors
Exercise 3: Tidying data
Exercise 4: Data preparation and regex
Exercise 5: Cleaning up strings
Exercise 6: Dichotomizing variables
Exercise 7: Summarizing data
Exercise 8: Recreating the plot
Exercise 9: Creating an initial plot
Exercise 10: Fixing labels
Exercise 11: Flipping things around
Exercise 12: Finalizing the chart
Exercise 13: End of course recap