String Manipulation with stringr in R

Exercise

Using backreferences in patterns

Backreferences can be useful in matching because they allow you to find repeated patterns or words. Using a backreference requires two things: you need to capture() the part of the pattern you want to reference, and then you refer to it with REF1.

Take a look at this pattern: capture(LOWER) %R% REF1. It matches and captures any lower case character, then requires that same character to follow immediately, so it detects a repeated character regardless of which character is repeated. To see it in action, try this:

str_view(c("hello", "sweet", "kitten"), 
  pattern = capture(LOWER) %R% REF1)
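
To check the matches programmatically rather than visually, the same pattern can be passed to str_detect() or str_extract(). This is a minimal sketch, assuming the rebus and stringr packages are installed; the word "cat" and the variable names are additions for illustration only.

```r
library(rebus)
library(stringr)

# Capture one lower case character, then require the same character again
repeated_char <- capture(LOWER) %R% REF1

# "cat" is an added example with no doubled letter
words <- c("hello", "sweet", "kitten", "cat")

# TRUE where a doubled letter occurs ("ll", "ee", "tt"), FALSE for "cat"
has_repeat <- str_detect(words, pattern = repeated_char)
has_repeat

# The doubled letters themselves; NA where nothing matched
doubled <- str_extract(words, pattern = repeated_char)
doubled
```

Because the backreference reuses whatever the capture matched, one pattern covers "ll", "ee" and "tt" without listing each letter.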

If you capture more than one thing you can refer to them with REF2, REF3 etc. up to REF9, counting the captures from the left of the pattern.
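
The numbering of several captures can be sketched with a small example. It assumes rebus and stringr are installed, and the hyphenated strings and variable names below are made up for illustration. The first capture() from the left is REF1 and the second is REF2, so the order in which you refer to them decides what has to follow.

```r
library(rebus)
library(stringr)

# Hypothetical test strings: two letters, a hyphen, two more letters
codes <- c("ab-ab", "ab-ba", "ab-cd")

# Two separate captures: the first is REF1, the second is REF2
two_letters <- capture(LOWER) %R% capture(LOWER)

# After the hyphen, require the captures in the same order
same_order <- two_letters %R% "-" %R% REF1 %R% REF2

# After the hyphen, require the captures in swapped order
swapped_order <- two_letters %R% "-" %R% REF2 %R% REF1

in_same_order <- str_detect(codes, pattern = same_order)
in_swapped_order <- str_detect(codes, pattern = swapped_order)

in_same_order     # only "ab-ab" matches
in_swapped_order  # only "ab-ba" matches
```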

Let's practice with boy_names again. You might notice a change in this dataset. We've converted all names to lower case; you'll learn how to do that in the next chapter.

Instructions 1/4

  • 1

    In each case, assign the pattern argument, then view the matches by running the str_view() code.

    See all the boy_names with a letter repeated three times, by extending the pattern in the text above with another REF1. Assign the pattern to repeated_three_times.

  • 2

    See all the boy_names with a pair of letters repeated twice, e.g. abab, by capturing two lower case characters, then referring to the capture with REF1. Assign the pattern to pair_of_repeated.

  • 3

    See all the boy_names with a pair of letters followed by its reverse, e.g. abba, by capturing two lower case characters separately and combining with REF2 and REF1. Assign the pattern to pair_that_reverses.

  • 4

    See all the boy_names that are a four letter palindrome (a name that reads the same forwards and backwards) by wrapping the previous pattern in exactly(). Assign the pattern to four_letter_palindrome.
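
The effect of exactly() can be sketched on a simpler pattern than the ones in the instructions. This assumes rebus and stringr are installed, and the toy strings and variable names are invented for illustration. Without exactly(), a pattern may match anywhere inside a string; with it, the pattern has to account for the whole string.

```r
library(rebus)
library(stringr)

# Hypothetical toy strings, not taken from boy_names
toy <- c("oo", "moon", "ooze")

# A doubled lower case letter, allowed anywhere in the string
doubled_anywhere <- capture(LOWER) %R% REF1

# The same pattern, but the whole string must be just the doubled letter
doubled_only <- exactly(capture(LOWER) %R% REF1)

found_anywhere <- str_detect(toy, pattern = doubled_anywhere)
found_exactly <- str_detect(toy, pattern = doubled_only)

found_anywhere  # all three contain "oo"
found_exactly   # only "oo" consists of nothing else
```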