Cleaning up strings

In this lesson, we learned some basics of "regex," or regular expressions, which allow us to capture general patterns. We've covered two notations:

Expression	Meaning
`.`	matches any single character
`*`	repeats the preceding element zero or more times

For example, ".*science " would match "data science " in the string "data science rocks!"
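As a quick check of the example above, here is a minimal sketch that assumes the stringr package is installed. The variable names (text, matched, remaining) are made up for illustration. str_extract() returns the part of the string that the pattern matches, and str_remove() deletes that same part.

```r
library(stringr)

text <- "data science rocks!"

# "." matches any single character and "*" repeats it zero or more times,
# so ".*science " matches everything up to and including "science ".
matched <- str_extract(text, ".*science ")
print(matched)    # "data science "

# str_remove() deletes the first match of the pattern instead of returning it.
remaining <- str_remove(text, ".*science ")
print(remaining)  # "rocks!"
```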

Let's use what we've learned to clean up the response_var column in gathered_data, the dataset you created in the previous lesson.

This exercise is part of the course

Categorical Data in the Tidyverse

Exercise instructions

Use str_remove to remove everything before and including "rude to " (with the space at the end) in the response_var column.
Use str_remove to remove "on a plane" from the response_var column.

Interactive exercise

Complete the sample code to successfully finish this exercise.

gathered_data %>%
    # Remove everything before and including "rude to " (with that space at the end!)
    mutate(response_var = ___(response_var, ___)) %>%
    # Remove "on a plane"
    mutate(response_var = ___(response_var, ___))
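
One possible way to fill in the blanks is sketched below. It is not necessarily the course's official solution. Because the real gathered_data is not shown here, the sketch builds a small stand-in tibble whose two strings are invented for illustration; only the column name response_var and the two patterns come from the instructions above. It assumes the dplyr and stringr packages are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for gathered_data; the real dataset comes from
# the previous lesson and its strings may differ.
gathered_data <- tibble(
  response_var = c(
    "Is it rude to recline your seat on a plane",
    "Is it rude to bring a baby on a plane"
  )
)

cleaned <- gathered_data %>%
    # Remove everything before and including "rude to " (with the trailing space)
    mutate(response_var = str_remove(response_var, ".*rude to ")) %>%
    # Remove "on a plane"
    mutate(response_var = str_remove(response_var, "on a plane"))

print(cleaned$response_var)
# "recline your seat " "bring a baby "
```

Note that in this made-up data a trailing space is left behind, because the second pattern removes "on a plane" but not the space in front of it.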

In this chapter, you’ll learn all about factors. You’ll discover the difference between categorical and ordinal variables, how R represents them, and how to inspect them to find the number and names of the levels. Finally, you’ll find how forcats, a tidyverse package, can improve your plots by letting you quickly reorder variables by their frequency.

Exercise 1: Introduction to qualitative variables
Exercise 2: Recognizing factor variables
Exercise 3: Qualitative variables in theory
Exercise 4: Understanding your qualitative variables
Exercise 5: Getting number of levels
Exercise 6: Examining number of levels
Exercise 7: Examining levels
Exercise 8: Making better plots
Exercise 9: Reordering a variable by its frequency
Exercise 10: Ordering one variable by another

You’ll continue to dive into the forcats package, learning how to change the order and names of levels and even collapse them into one another.

Exercise 1: Reordering factors
Exercise 2: Changing the order of factor levels
Exercise 3: Tricks of fct_relevel()
Exercise 4: Renaming factor levels
Exercise 5: Distinguishing between forcats functions
Exercise 6: Renaming a few levels
Exercise 7: When you have a typo
Exercise 8: Collapsing factor levels
Exercise 9: Manually collapsing levels
Exercise 10: Lumping variables by proportion
Exercise 11: Preserving the most common levels

Having gotten a good grasp of forcats, you’ll expand out to the rest of the tidyverse, learning and reviewing functions from dplyr, tidyr, and stringr. You’ll refine graphs with ggplot2 by changing axes to percentage scales, editing the layout of the text, and more.

Exercise 1: Examining common themed variables
Exercise 2: Grouping and reshaping similar columns
Exercise 3: Summarizing data
Exercise 4: Creating an initial plot
Exercise 5: Tricks of ggplot2
Exercise 6: Editing plot text
Exercise 7: Reordering graphs
Exercise 8: Changing and creating variables with case_when()
Exercise 9: case_when() with single variable
Exercise 10: case_when() from multiple columns

In this final chapter, you’ll take all that you’ve learned and apply it in a case study. You’ll learn more about working with strings and summarizing data, then replicate a publication quality 538 plot.

Exercise 1: Case study introduction
Exercise 2: Changing characters to factors
Exercise 3: Tidying data
Exercise 4: Data preparation and regex
Exercise 5: Cleaning up strings (current exercise)
Exercise 6: Dichotomizing variables
Exercise 7: Summarizing data
Exercise 8: Recreating the plot
Exercise 9: Creating an initial plot
Exercise 10: Fixing labels
Exercise 11: Flipping things around
Exercise 12: Finalizing the chart
Exercise 13: End of course recap