Exercise

Word frequency with doParallel

So far, you have learned how to search for the most frequent word in a text sequentially using foreach(). Over the next two exercises, you will implement the same task in parallel, first with doParallel and then with doFuture, and benchmark both against the sequential version. The sequential solution is implemented in the function freq_seq() (type freq_seq in your console to see it). It iterates over a global character vector chars and, for each element, calls the function max_frequency(), which searches within a vector of words while filtering for a minimum word length. All of these objects are preloaded, as is the doParallel package. Your job now is to write a function freq_doPar() that runs the same code in parallel via doParallel.

Instructions
  • Define a function freq_doPar() with arguments cores and min_length = 5.
  • The function should register a cluster with as many nodes as given by cores using registerDoParallel(), passing the value by argument name rather than by position.
  • Write a foreach() loop similar to the one in freq_seq(), but one that runs in parallel (using %dopar%).
    • The loop should export functions max_frequency() and select_words() and the object words (in that order) to the workers using the .export argument.
    • The loop should load the packages janeaustenr and stringr (in that order) on the workers using the .packages argument.
  • Run the function freq_doPar() on 2 cores.
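
The steps above can be sketched roughly as follows. This is a minimal illustration, not the course's reference solution: the toy words and chars vectors and the bodies of select_words() and max_frequency() are hypothetical stand-ins for the preloaded objects, and their argument names (words, min_length) are assumptions. The sketch also assumes that doParallel, janeaustenr and stringr are installed, since the workers load the latter two through .packages.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical stand-ins for the preloaded course objects (assumptions).
words <- c("pride", "prejudice", "sister", "sister", "letter",
           "mother", "mother", "mother", "sense", "emma")
chars <- c("p", "s", "m")  # first letters to iterate over

# Keep words of at least min_length characters that start with `letter`.
select_words <- function(letter, words, min_length = 1) {
    long_enough <- words[nchar(words) >= min_length]
    long_enough[substr(long_enough, 1, 1) == letter]
}

# Return the count of the most frequent selected word, named by that word.
max_frequency <- function(letter, words, min_length = 1) {
    selected <- select_words(letter, words = words, min_length = min_length)
    counts <- table(selected)
    top <- which.max(counts)
    setNames(as.integer(counts[top]), names(counts)[top])
}

freq_doPar <- function(cores, min_length = 5) {
    # Register a parallel backend, naming the argument explicitly.
    registerDoParallel(cores = cores)

    # Same loop shape as a sequential foreach(), but with %dopar%.
    foreach(let = chars, .combine = c,
            .export = c("max_frequency", "select_words", "words"),
            .packages = c("janeaustenr", "stringr")) %dopar%
        max_frequency(let, words = words, min_length = min_length)
}

# Run on 2 cores.
res <- freq_doPar(cores = 2)
print(res)
```

The only structural differences from a sequential foreach() loop are the backend registration, the %dopar% operator, and the .export and .packages arguments, which make the needed functions, objects and packages available on the workers. Depending on the platform and backend, foreach may warn that some of these variables are already being exported; the explicit list is kept here because the exercise asks for it.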