Identifying the lines, take 1
The first thing you might notice when you look at your vector play_text is that there are lots of empty lines. They don't affect your task, so you might want to remove them. The easiest way to find empty strings is to use the stringi function stri_isempty(), which returns a logical vector you can use to subset the non-empty strings:
# Load stringi for stri_isempty()
library(stringi)
# Get rid of empty strings
empty <- stri_isempty(play_text)
play_lines <- play_text[!empty]
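To see what that subsetting does without the real play, here is a minimal sketch on a small made-up vector (toy_text and toy_lines are hypothetical names, not part of the exercise). It also shows that stri_isempty() is TRUE only for zero-length strings, so a line that contains only spaces would survive this filter.

```r
library(stringi)

# Hypothetical stand-in for play_text, with some empty strings mixed in
toy_text <- c("Lane. Yes, sir.", "", "Algernon. And, speaking of", "", "")

# TRUE for zero-length strings, FALSE otherwise
empty <- stri_isempty(toy_text)

# Keep only the non-empty elements
toy_lines <- toy_text[!empty]
toy_lines

# A string of spaces is not "empty" in this sense
stri_isempty(c("", " ", "a"))
```

If whitespace-only lines were a concern, they would need a separate check; for this exercise, removing the zero-length strings is all that is needed.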
So, how are you going to find the elements that indicate a character is starting a new line? Consider the following lines:
> play_lines[10:15]
[1] "Algernon. I'm sorry for that, for your sake. I don't play"
[2] "accurately--any one can play accurately--but I play with wonderful"
[3] "expression. As far as the piano is concerned, sentiment is my forte. I"
[4] "keep science for Life."
[5] "Lane. Yes, sir."
[6] "Algernon. And, speaking of the science of Life, have you got the"
The first line is for Algernon, the next three strings are continuations of that line, then line 5 is for Lane and line 6 is for Algernon.
How about looking for lines that start with a word followed by a period (.)?
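As a rough sketch of that idea, the six sample lines above can be tested with a plain regular expression: ^ anchors at the start, \\w+ matches one or more word characters, and \\. matches a literal period. The names sample_lines and pattern_start are hypothetical, and writing the pattern as a raw regex string is an assumption; the exercise may build its pattern differently. str_subset() with negate = TRUE plays the role of match = FALSE in str_view(), returning the non-matching lines as a plain character vector.

```r
library(stringr)

# The six sample lines shown above (play_lines[10:15])
sample_lines <- c(
  "Algernon. I'm sorry for that, for your sake. I don't play",
  "accurately--any one can play accurately--but I play with wonderful",
  "expression. As far as the piano is concerned, sentiment is my forte. I",
  "keep science for Life.",
  "Lane. Yes, sir.",
  "Algernon. And, speaking of the science of Life, have you got the"
)

# Start of string, one or more word characters, then a literal period
pattern_start <- "^\\w+\\."

# Logical vector: does each line start with a word followed by "."?
starts_line <- str_detect(sample_lines, pattern_start)
starts_line

# Lines that match (like match = TRUE) and lines that don't (like match = FALSE)
str_subset(sample_lines, pattern_start)
str_subset(sample_lines, pattern_start, negate = TRUE)
```

In this sample the pattern also flags the continuation line beginning "expression.", so a single word followed by a period is not enough on its own to identify the start of a character's line.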
play_lines, containing the lines of the play as a character vector, has been pre-defined.
This exercise is part of the course String Manipulation with stringr in R.
Have a go at this exercise by completing this sample code.
# Pattern for start, word then .
pattern_1 <- ___
# Test pattern_1
str_view(play_lines, ___, match = ___)
str_view(play_lines, ___, match = ___)