
Identifying the lines, take 1

The first thing you might notice when you look at your vector play_text is that it contains lots of empty lines. They don't affect your task, so you might want to remove them. The easiest way to find empty strings is the stringi function stri_isempty(), which returns a logical vector you can use to keep only the non-empty strings:

# Get rid of empty strings
empty <- stri_isempty(play_text)
play_lines <- play_text[!empty]
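
To see what the subsetting step does, here is a minimal sketch on a small made-up vector. The name toy_text and its contents are assumptions for illustration only and are not part of the exercise data.

```r
library(stringi)

# Hypothetical stand-in for play_text: two real strings and two empty ones
toy_text <- c("Lane.  Yes, sir.", "", "Algernon.  And, speaking of", "")

# TRUE wherever the string is empty ("")
empty <- stri_isempty(toy_text)

# Negate the logical vector to keep only the non-empty strings
toy_lines <- toy_text[!empty]
toy_lines
```

Note that stri_isempty() only flags zero-length strings, so a string containing only spaces would still be kept.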

So, how are you going to find the elements where a character begins a line? Consider the following elements:

> play_lines[10:15]
[1] "Algernon.  I'm sorry for that, for your sake.  I don't play"             
[2] "accurately--any one can play accurately--but I play with wonderful"      
[3] "expression.  As far as the piano is concerned, sentiment is my forte.  I"
[4] "keep science for Life."                                                  
[5] "Lane.  Yes, sir."                                                        
[6] "Algernon.  And, speaking of the science of Life, have you got the"

The first string begins a line spoken by Algernon and the next three strings continue it. String 5 begins a line for Lane, and string 6 begins another line for Algernon.

How about looking for lines that start with a word followed by a period (.)?

play_lines, containing the lines of the play as a character vector, has been pre-defined.
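
As a rough illustration of the idea, the sketch below tests one possible pattern on a few strings copied from the excerpt above. The names toy_lines and candidate, and the plain regular expression itself, are assumptions for illustration. The exercise may expect the pattern to be built differently.

```r
library(stringr)

# Hypothetical stand-in for play_lines, using strings from the excerpt
toy_lines <- c(
  "Algernon.  I'm sorry for that, for your sake.  I don't play",
  "accurately--any one can play accurately--but I play with wonderful",
  "keep science for Life.",
  "Lane.  Yes, sir."
)

# ^ anchors to the start of the string, \\w+ matches one or more
# word characters, and \\. matches a literal period
candidate <- "^\\w+\\."

# Logical vector: does each string start with a word followed by a period?
starts_line <- str_detect(toy_lines, candidate)
toy_lines[starts_line]
```

A pattern like this will also match any string that happens to begin with a single word and a period but is not a character name, so it is worth inspecting both the matches and the non-matches.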

This exercise is part of the course

String Manipulation with stringr in R


Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Pattern for start, word then .
pattern_1 <- ___

# Test pattern_1
str_view(play_lines, ___, match = ___) 
str_view(play_lines, ___, match = ___)