Matching a single grapheme

A related problem is matching a single character. You've used ANY_CHAR to do this up until now, but it will only match a character represented by a single code point. Take these three names:

x <- c("Adele", "Ad\u00e8le", "Ad\u0065\u0300le")
writeLines(x)

They look the same, but this regular expression only matches two of them:

str_view(x, "Ad" %R% ANY_CHAR %R% "le")

because in the third name è is represented by two code points: the letter e (U+0065) followed by a combining grave accent (U+0300). The Unicode standard has a concept of a grapheme, which represents a single display character but may be composed of several code points. To match any grapheme you can use GRAPHEME.

str_view(x, "Ad" %R% GRAPHEME %R% "le")
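To make the difference between code points and graphemes concrete, here is a minimal sketch using the stringi package rather than rebus. It assumes that ANY_CHAR and GRAPHEME correspond to the regular expression tokens `.` and `\X`; the variable names are illustrative only.

```r
library(stringi)

x <- c("Adele", "Ad\u00e8le", "Ad\u0065\u0300le")

# Count code points: the third name has one more than it appears to
code_points <- stri_length(x)

# Count graphemes (user-perceived characters): all three names agree
graphemes <- stri_count_boundaries(x, type = "character")

# "." matches one code point, so the decomposed name fails to match
# (assumed to mirror ANY_CHAR)
dot_match <- stri_detect_regex(x, "^Ad.le$")

# "\\X" matches one whole grapheme, so all three names match
# (assumed to mirror GRAPHEME)
grapheme_match <- stri_detect_regex(x, "^Ad\\Xle$")

print(data.frame(code_points, graphemes, dot_match, grapheme_match))
```

Comparing the two count columns is a quick way to spot strings that contain combining characters before writing a pattern against them.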

Names of rulers from the Vietnamese Tây Sơn dynasty, with diacritics given as separate code points (combining characters), are pre-defined as tay_son_separate.

This exercise is part of the course

String Manipulation with stringr in R

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# tay_son_separate has been pre-defined
tay_son_separate

# View all the characters in tay_son_separate
___