Some simple text statistics
Generally, specifying simplify = TRUE will give you output that is easier to work with, but you'll always get n pieces (even if some are empty, "").
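As a small illustration of that padding, the sketch below splits two made-up strings into n = 3 pieces with simplify = TRUE. The vector x and its contents are assumptions for illustration, not part of the exercise data.

```r
library(stringr)

# Hypothetical example strings, not the exercise data
x <- c("a b c", "d e")

# simplify = TRUE returns a character matrix with one row per input string;
# rows with fewer than n pieces are padded with empty strings
pieces <- str_split(x, pattern = " ", n = 3, simplify = TRUE)
pieces
```

The second row has only two real pieces, so its third column should hold the empty string "".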
Sometimes, you want to know how many pieces a string can be split into, or you want to do something with every piece before moving to a simpler structure. This is a situation where you don't want to simplify, and you'll have to process the output with something like lapply().
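A minimal sketch of that list-based workflow, again using made-up strings (x is hypothetical): leave the output unsimplified, process each element with lapply(), and flatten only at the end.

```r
library(stringr)

# Hypothetical example strings, not the exercise data
x <- c("a b c", "d e")

# Without simplify, str_split() returns a list with one character vector per string
pieces <- str_split(x, pattern = " ")

# Count the pieces in each string; the result is still a list
piece_counts <- lapply(pieces, length)

# Flatten to a plain integer vector once the per-string work is done
unlist(piece_counts)
```

Because each list element keeps only its own pieces, the counts reflect the real number of pieces rather than a padded width.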
As an example, you'll be performing some simple text statistics on your lines from Chapter 1 of Alice's Adventures in Wonderland. Your goal is to calculate how many words are in each line, and the average length of the words in each line.
To do these calculations, you'll need to split the lines into words. One way to break a sentence into words is to split on a single space, " ". This is a little naive because, for example, it wouldn't pick up words separated by a newline escape sequence, as in "two\nwords", but since this situation doesn't occur in your lines, it will do.
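To make the limitation concrete, the sketch below splits "two\nwords" on a single space and then, for comparison, on any whitespace character. The whitespace pattern "\\s" is shown only as one possible alternative; the exercise itself splits on " ".

```r
library(stringr)

s <- "two\nwords"

# Splitting on a single space finds no separator, so the string stays in one piece
naive <- str_split(s, pattern = " ")[[1]]
length(naive)

# Alternative (not used in this exercise): split on any whitespace, including newlines
by_whitespace <- str_split(s, pattern = "\\s")[[1]]
by_whitespace
```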
This exercise is part of the course
String Manipulation with stringr in R
Exercise instructions
We've put lines, a vector with three strings, each corresponding to a line, in your workspace.
- Split lines into words. Assign the resulting list to words.
- Use lapply() to apply length() to each element in words to count the number of words in each line.
- Use lapply() to apply str_length() to each element in words, to count the number of characters in each word. Assign this to word_lengths.
- Use lapply() to apply mean() to each element in word_lengths, to find the average word length in each line.
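The four steps above can be sketched on a small made-up vector. Here example_lines is a hypothetical stand-in for lines, whose actual contents are not shown on this page, and this is one possible approach rather than the course's reference solution.

```r
library(stringr)

# Hypothetical stand-in for the exercise's lines vector
example_lines <- c("the cat sat", "a dog", "one two three four")

# Split each line into words; the result is a list of character vectors
words <- str_split(example_lines, pattern = " ")

# Number of words per line
word_counts <- lapply(words, length)

# Number of characters in each word
word_lengths <- lapply(words, str_length)

# Average word length per line
avg_word_length <- lapply(word_lengths, mean)

word_counts
avg_word_length
```

Each result is a list with one element per line, so the same lapply() pattern chains naturally from one step to the next.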
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Split lines into words
words <- ___
# Number of words per line
___
# Number of characters in each word
word_lengths <- ___
# Average word length per line
___