Extracting substrings

The str_sub() function in stringr extracts parts of strings based on their location. As with all stringr functions, the first argument, string, is a vector of strings. The arguments start and end give the character positions that bound the piece to extract, and both positions are inclusive.

For example, str_sub(x, 1, 4) asks for the substring starting at the first character and ending at the fourth character, in other words the first four characters. Try it with Batman's real name:

str_sub(c("Bruce", "Wayne"), 1, 4)
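As a quick check of what that call returns, here is a minimal sketch, assuming the stringr package is installed. The variable names batman and first_four are illustrative and not part of the exercise:

```r
library(stringr)

# Illustrative vector holding the two strings from the example above
batman <- c("Bruce", "Wayne")

# Positions 1 through 4, inclusive, of each element
first_four <- str_sub(batman, 1, 4)
print(first_four)
# [1] "Bruc" "Wayn"
```

Because str_sub() is vectorised over string, the result has one element per input string.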

Both start and end can be negative integers, in which case they count from the end of the string. For example, str_sub(x, -4, -1) asks for the substring starting at the fourth character from the end and ending at the first character from the end, i.e. the last four characters. Again, try it with Batman:

str_sub(c("Bruce", "Wayne"), -4, -1)
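To make the negative indexing concrete, the following sketch (again assuming stringr is installed, with illustrative variable names) shows the result of that call, plus one case that mixes a positive start with a negative end:

```r
library(stringr)

# Illustrative vector holding the two strings from the example above
batman <- c("Bruce", "Wayne")

# Fourth-from-last character through the last character
last_four <- str_sub(batman, -4, -1)
print(last_four)
# [1] "ruce" "ayne"

# A positive start and a negative end can be combined:
# second character through the second-from-last character
middle <- str_sub(batman, 2, -2)
print(middle)
# [1] "ruc" "ayn"
```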

To practice, you'll use str_sub() to look at popular first and last letters for names.

This exercise is part of the course “String Manipulation with stringr in R”.

Exercise instructions

We've set up the same boy_names and girl_names vectors from the last exercise in your workspace.

  • Use str_sub() to extract the first letter of each name in boy_names. Save this to boy_first_letter.
  • Use table() on boy_first_letter to count up how many names start with each letter. Can you see which is most popular?
  • Repeat these steps, but now look at the last letter for boys' names.
  • Again repeat, but now look at the first letter for girls' names.
  • Finally, look at the last letter for girls' names.
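The pattern behind these steps can be sketched on a small made-up vector. Everything here is an assumption for illustration: sample_names is a hypothetical stand-in, not the boy_names or girl_names data from the workspace, and stringr is assumed to be installed:

```r
library(stringr)

# Hypothetical stand-in data, NOT the workspace vectors
sample_names <- c("Liam", "Noah", "Lucas", "Nora")

# First letter: start and end both at position 1
first_letters <- str_sub(sample_names, 1, 1)

# Last letter: start and end both at position -1
last_letters <- str_sub(sample_names, -1, -1)

# table() counts how often each distinct letter occurs
first_counts <- table(first_letters)
last_counts <- table(last_letters)

print(first_counts)
print(last_counts)
```

Wrapping a table in sort(), for example sort(first_counts, decreasing = TRUE), is one way to bring the most common letter to the front.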

Hands-on interactive exercise

Have a go at this exercise by completing the sample code below, replacing each ___ placeholder with your own code.

# Extract first letter from boy_names
boy_first_letter <- ___

# Tabulate occurrences of boy_first_letter
___
  
# Extract the last letter in boy_names, then tabulate
boy_last_letter <- ___
___

# Extract the first letter in girl_names, then tabulate
girl_first_letter <- ___
___

# Extract the last letter in girl_names, then tabulate
girl_last_letter <- ___
___
  