1. Case study
"The truth is rarely pure and never simple."
2. IMG
This is a
3. IMG
famous quote from the play The Importance of Being Earnest, A Trivial Comedy for Serious People, by Oscar Wilde. A friend has asked you to be in the play for a community fundraiser. Now, I know you don't like disappointing your friends, so you agree to be in the play. You do get one choice: you can pick your role. Who would you like to play? If you are like me, you'll probably want the character with the fewest lines, but perhaps you'd prefer the character with the most lines.
4. IMG
Your task is to read the play into R and use your new string processing skills to count how many lines each character gets, to help you make your decision. We've downloaded the play from Project Gutenberg as a plain text file. The first step in your task is to read the play into R.
5. readLines()
readLines is a base R function that will read in a text file with no specific structure. readLines works a lot like the other import functions in R: pass in the path to the file as the first argument and assign the result to an appropriate variable name. Here, I'll demonstrate with a little text file that has a verse of Old MacDonald in it. The result is a vector of strings, where each element is a line from the file. You can work with this vector of strings just like you have with all the others in this course: it can be the first argument to a stringr function, or you can subset it like you would any other vector. You'll see a couple of times in the exercises that it's useful to know which element has a match, which is as easy as combining str_detect with which. You might also like to check out stri_read_lines in stringi. It's currently experimental, but is already much faster than readLines, making it useful for big text files. Alright,
6. Let's practice!
you are ready to get started. Good luck! If you want an additional challenge, can you figure out which character speaks the famous line, "The truth is rarely pure and never simple"?
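As a recap of the readLines workflow described earlier, here is a minimal sketch. The verse text, the temporary file, and the variable names (verse, path, verse_lines, moo_lines) are assumptions made for illustration; they stand in for the small Old MacDonald file from the demonstration, not for the play itself.

```r
library(stringr)

# Hypothetical stand-in for the demo file: write a short verse to a temp file
verse <- c(
  "Old MacDonald had a farm, E-I-E-I-O.",
  "And on that farm he had a cow, E-I-E-I-O.",
  "With a moo moo here and a moo moo there,",
  "Here a moo, there a moo, everywhere a moo moo.",
  "Old MacDonald had a farm, E-I-E-I-O."
)
path <- tempfile(fileext = ".txt")
writeLines(verse, path)

# Read the file: the result is a character vector, one element per line
verse_lines <- readLines(path)
length(verse_lines)

# Subset it like any other vector
verse_lines[1:2]

# Pass it as the first argument to a stringr function
str_count(verse_lines, "moo")

# Find which elements contain a match by combining str_detect() with which()
moo_lines <- which(str_detect(verse_lines, "moo"))
verse_lines[moo_lines]
```

The same pattern should carry over to the play: swap the temporary path for the path to the downloaded text file, and swap "moo" for whatever pattern you are looking for.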