Extracting substrings
The str_sub() function in stringr extracts parts of strings based on their location. As with all stringr functions, the first argument, string, is a vector of strings. The arguments start and end specify the boundaries of the piece to extract, measured in characters.
For example, str_sub(x, 1, 4) asks for the substring starting at the first character and ending at the fourth character, or in other words the first four characters. Try it with Batman's name:
str_sub(c("Bruce", "Wayne"), 1, 4)
Both start and end can be negative integers, in which case they count from the end of the string. For example, str_sub(x, -4, -1) asks for the substring starting at the fourth character from the end and ending at the first character from the end, i.e. the last four characters. Again, try it with Batman:
str_sub(c("Bruce", "Wayne"), -4, -1)
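As a quick check of the two calls above, the sketch below runs them side by side. It assumes the stringr package is installed; the variable names batman, first_four and last_four are introduced here for illustration only and are not part of the exercise workspace.

```r
library(stringr)

# Hypothetical variable for illustration; not part of the exercise workspace
batman <- c("Bruce", "Wayne")

# Positive positions count from the start of each string
first_four <- str_sub(batman, 1, 4)
first_four
# "Bruc" "Wayn"

# Negative positions count from the end of each string
last_four <- str_sub(batman, -4, -1)
last_four
# "ruce" "ayne"
```

Because both names have five characters, the first call drops the final letter and the second drops the initial letter.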
To practice, you'll use str_sub() to look at popular first and last letters for names.
This is a part of the course “String Manipulation with stringr in R”.
Exercise instructions
We've set up the same boy_names and girl_names vectors from the last exercise in your workspace.
- Use str_sub() to extract the first letter of each name in boy_names. Save this to boy_first_letter.
- Use table() on boy_first_letter to count up how many names start with each letter. Can you see which is most popular?
- Repeat these steps, but now look at the last letter for boys' names.
- Again repeat, but now look at the first letter for girls' names.
- Finally, look at the last letter for girls' names.
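The extract-then-tabulate pattern in these steps can be sketched on a small made-up vector. This is a minimal illustration, assuming stringr is installed; sample_names and the other variable names are hypothetical stand-ins, not the boy_names or girl_names vectors from the workspace.

```r
library(stringr)

# Hypothetical stand-in for the workspace vectors
sample_names <- c("Liam", "Lucas", "Noah", "Logan", "Nolan")

# start = 1, end = 1 keeps only the first character of each name
first_letter <- str_sub(sample_names, 1, 1)

# start = -1, end = -1 keeps only the last character of each name
last_letter <- str_sub(sample_names, -1, -1)

# table() counts how often each distinct letter occurs
first_counts <- table(first_letter)
last_counts <- table(last_letter)

first_counts
last_counts
```

To pick out the most common letter rather than reading it off the printed table, one option is names(which.max(first_counts)), which returns "L" for this made-up vector.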
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Extract first letter from boy_names
boy_first_letter <- ___
# Tabulate occurrences of boy_first_letter
___
# Extract the last letter in boy_names, then tabulate
boy_last_letter <- ___
___
# Extract the first letter in girl_names, then tabulate
girl_first_letter <- ___
___
# Extract the last letter in girl_names, then tabulate
girl_last_letter <- ___
___