Matching a single grapheme

A related problem is matching a single character. You've used ANY_CHAR to do this up until now, but it will only match a character represented by a single code point. Take these three names:

x <- c("Adele", "Ad\u00e8le", "Ad\u0065\u0300le")
writeLines(x)

They look the same, but this regular expression only matches two of them:

str_view(x, "Ad" %R% ANY_CHAR %R% "le")

This is because in the third name, è is represented by two code points: the letter e (U+0065) followed by a combining grave accent (U+0300). The Unicode standard has the concept of a grapheme, which represents a single display character but may be composed of several code points. To match any grapheme, use GRAPHEME.

str_view(x, "Ad" %R% GRAPHEME %R% "le")
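To see why the two patterns behave differently, it can help to count code points and graphemes directly. The following is a minimal sketch, not part of the original exercise. It assumes the stringi and stringr packages are installed, and it writes the patterns as raw ICU regular expressions, on the assumption that ANY_CHAR corresponds to "." and GRAPHEME to "\\X". The variable names are illustrative only.

```r
library(stringi)
library(stringr)

x <- c("Adele", "Ad\u00e8le", "Ad\u0065\u0300le")

# Number of code points in each name
n_code_points <- stri_length(x)

# Number of graphemes (user-perceived characters) in each name
n_graphemes <- stri_count_boundaries(x, type = "character")

# "." matches exactly one code point, so the decomposed name fails
matches_any_char <- str_detect(x, "^Ad.le$")

# "\\X" matches one whole grapheme, so all three names match
matches_grapheme <- str_detect(x, "^Ad\\Xle$")

print(data.frame(n_code_points, n_graphemes, matches_any_char, matches_grapheme))
```

The third name has six code points but only five graphemes, which is why a pattern that consumes one code point for è cannot match it.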

Names of rulers from the Vietnamese Tây Sơn dynasty, with diacritics given as separate code points, are pre-defined as tay_son_separate.

This exercise is part of the course String Manipulation with stringr in R.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# tay_son_separate has been pre-defined
tay_son_separate

# View all the characters in tay_son_separate
___
