Exercise

Passing data as arguments

In this exercise, you will explore splitting data into subsets and passing them to workers as arguments. We will use the Jane Austen example introduced previously. The goal is to find the unique words in the 6 books from the janeaustenr package that start with a given letter (here "v") and have at least a given number of characters (here 10). You will evaluate the task in parallel on a cluster of size 2, splitting the set of words into 2 subsets so that each worker gets one of them.
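The split-and-dispatch idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy words vector and the stand-in select_words() definition are hypothetical (the exercise supplies its own versions), and the cluster is created locally here rather than taken from the workspace.

```r
library(parallel)

# Hypothetical stand-in for the helper provided in the exercise
select_words <- function(words, letter, min_length) {
  words[startsWith(words, letter) & nchar(words) >= min_length]
}

# Toy data standing in for the words extracted from janeaustenr
words <- c("vanity", "vindication", "vexatious", "valetudinarian",
           "venture", "vindication", "amiable")

# Cluster with 2 workers (the exercise already provides one as `cl`)
cl <- makeCluster(2)

# Split the word indices into 2 groups, one per worker
groups <- splitIndices(length(words), 2)
split_words <- lapply(groups, function(idx) words[idx])

# Each worker receives one subset as its first argument;
# letter and min_length are passed along to every call
res <- clusterApply(cl, split_words, select_words,
                    letter = "v", min_length = 10)

stopCluster(cl)

# Combine the per-worker results and drop duplicates
words_v10_par <- unique(unlist(res))
```

Because each subset travels to its worker as a function argument, nothing has to be exported to the workers' global environments beforehand.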

The parallel package, the set of words extracted from janeaustenr, and a cluster object cl with 2 workers are available in your workspace. A function with the following signature has also been defined for you:

select_words(words, letter, min_length)

select_words() extracts all words that start with letter and are of length min_length or more. Run the function in the console to see how it works.
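For orientation, a function with this behavior could look roughly like the sketch below. The body is an assumption based only on the description above, not the course's actual definition, and the sample vector is made up for illustration.

```r
# Hypothetical implementation matching the described behavior:
# keep words that start with `letter` and have at least `min_length` characters
select_words <- function(words, letter, min_length) {
  keep <- startsWith(words, letter) & nchar(words) >= min_length
  words[keep]
}

# Made-up sample input (not the janeaustenr data)
sample_words <- c("vanity", "vindication", "vexatious",
                  "valetudinarian", "vindication", "amiable")

selected <- select_words(sample_words, "v", 10)
selected
```

Note that duplicates are kept in this sketch, so the result still needs unique() if only distinct words are wanted.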

Instructions 1/3

First, try this sequentially.

  • Use select_words() to select words starting with "v" that are at least 10 letters long, and assign the result to words_v10.
  • Print the unique words in words_v10.
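The two steps above might look like the following when run on toy data. The select_words() body and the words vector here are hypothetical stand-ins so that the snippet runs on its own; in the exercise, both are already in the workspace.

```r
# Hypothetical stand-in for the provided helper
select_words <- function(words, letter, min_length) {
  words[startsWith(words, letter) & nchar(words) >= min_length]
}

# Toy data standing in for the words extracted from janeaustenr
words <- c("vanity", "vindication", "vexatious", "valetudinarian",
           "venture", "vindication", "amiable")

# Select words starting with "v" that are at least 10 letters long
words_v10 <- select_words(words, "v", 10)

# Print the unique words
unique(words_v10)
```

With this toy input, unique() collapses the repeated "vindication" so that each selected word appears only once.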