
Identifying the lines, take 1

The first thing you might notice when you look at your vector play_text is that it contains lots of empty lines. They carry no information relevant to your task, so you might want to remove them. The easiest way to find empty strings is the stringi function stri_isempty(), which returns a logical vector you can use to subset the non-empty strings:

library(stringi)

# Get rid of empty strings
empty <- stri_isempty(play_text)
play_lines <- play_text[!empty]
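As a quick illustration, here is the same filter on a tiny made-up vector (the strings are invented stand-ins for play_text, not taken from the real data). Note that stri_isempty() flags only zero-length strings, so a string containing nothing but spaces survives the filter.

```r
library(stringi)

# Hypothetical stand-in for play_text: made-up strings, some of them empty
play_text <- c("Lane.  Yes, sir.", "", "Algernon.  Thank you.", "", "   ")

# TRUE only for zero-length strings; "   " (spaces only) is not empty
empty <- stri_isempty(play_text)
play_lines <- play_text[!empty]

play_lines
```

If whitespace-only lines should go too, one option is to test stri_isempty(stri_trim_both(play_text)) instead.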

So, how are you going to find the elements that indicate a character is starting their line? Consider the following lines:

> play_lines[10:15]
[1] "Algernon.  I'm sorry for that, for your sake.  I don't play"             
[2] "accurately--any one can play accurately--but I play with wonderful"      
[3] "expression.  As far as the piano is concerned, sentiment is my forte.  I"
[4] "keep science for Life."                                                  
[5] "Lane.  Yes, sir."                                                        
[6] "Algernon.  And, speaking of the science of Life, have you got the"

The first string starts a line for Algernon, and the next three strings are continuations of that line. String 5 then starts a line for Lane, and string 6 a new line for Algernon.

How about looking for lines that start with a word followed by a period (.)?
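Here is a sketch of that idea, applied to the six strings printed above. The pattern is written as a plain regular expression string (^ for the start, \w+ for one or more word characters, \. for a literal period); the exercise may expect you to build it another way. The sample_lines name is just for illustration.

```r
library(stringr)

# The six strings shown above (play_lines[10:15]), copied as sample data
sample_lines <- c(
  "Algernon.  I'm sorry for that, for your sake.  I don't play",
  "accurately--any one can play accurately--but I play with wonderful",
  "expression.  As far as the piano is concerned, sentiment is my forte.  I",
  "keep science for Life.",
  "Lane.  Yes, sir.",
  "Algernon.  And, speaking of the science of Life, have you got the"
)

# Start of string, one or more word characters, then a literal period
pattern <- "^\\w+\\."

str_detect(sample_lines, pattern)
```

Note that the third string, a continuation that happens to begin with "expression.", matches as well, even though no character's name starts it.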

play_lines, containing the lines of the play as a character vector, has been pre-defined.

Instructions
  • Build a pattern that matches the start of the line, followed by one or more word characters, then a period.
  • Use your pattern with str_view() to see the lines that matched, and those that didn't match. Do you see any problems?