Exercise

A parallel filter

You work as a data consultant for the United Nations, which wants to survey arts students globally. They have sourced a dataset of universities with arts and humanities departments and have decided to select the top arts universities in each country for the survey.

uni_list is a list of data frames, where each element holds the data for one country. Each data frame contains a column total_score. The following function is available to you:

# Keep the select_n_unis rows of df with the highest total_score
filter_df <- function(df, select_n_unis) {
  df %>%
    top_n(select_n_unis, total_score)
}

This function requires dplyr. The select_n_unis argument sets the number of top universities to select. You have been asked to filter for the top five universities in each country's data frame, in parallel. The parallel package has been loaded for you.
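Before going parallel, it can help to see what filter_df() does to a single data frame. The sketch below uses a made-up data frame; toy_df, its university column, and the scores are illustrative assumptions, not part of the exercise data.

```r
library(dplyr)

# Same helper as in the exercise text
filter_df <- function(df, select_n_unis) {
  df %>%
    top_n(select_n_unis, total_score)
}

# Hypothetical toy data standing in for one country's data frame
toy_df <- data.frame(
  university  = c("A", "B", "C", "D"),
  total_score = c(71.2, 88.5, 64.0, 90.1)
)

# Keep the two highest-scoring rows; top_n() filters without reordering
top_two <- filter_df(toy_df, select_n_unis = 2)
print(top_two)
```

Note that top_n() keeps ties, so a data frame with tied scores at the cutoff can return more rows than requested.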

Instructions

  • Load dplyr on each core in the cluster cl.
  • Export the n_unis variable to the cluster cl.
  • Apply filter_df() to each element of uni_list using parLapply().
  • Supply the number of universities to select, n_unis, to the correct argument of filter_df().
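
The steps above could be sketched roughly as follows. This is one possible approach under stated assumptions, not the official solution: the exercise environment already provides cl, uni_list, and n_unis, so the two-worker cluster, the toy uni_list, and n_unis <- 5 below are hypothetical stand-ins added only to make the sketch runnable on its own.

```r
library(parallel)
library(dplyr)

# Same helper as in the exercise text
filter_df <- function(df, select_n_unis) {
  df %>%
    top_n(select_n_unis, total_score)
}

# Hypothetical stand-in for uni_list: one data frame per country
uni_list <- list(
  country_a = data.frame(
    university  = paste0("A", 1:8),
    total_score = c(55, 81, 62, 90, 47, 73, 68, 85)
  ),
  country_b = data.frame(
    university  = paste0("B", 1:6),
    total_score = c(77, 59, 92, 64, 88, 70)
  )
)

# Assumption: "top five" from the exercise text
n_unis <- 5

# Assumption: a small local cluster; the exercise supplies cl already
cl <- makeCluster(2)

# Load dplyr on each worker so %>% and top_n() are available there
clusterEvalQ(cl, library(dplyr))

# Export n_unis to the workers, as the instructions ask.
# In the call below the value is passed from the main session as an
# argument; the export matters when worker-side code refers to n_unis by name.
clusterExport(cl, "n_unis")

# Apply filter_df() to each country's data frame in parallel
top_unis <- parLapply(cl, uni_list, filter_df, select_n_unis = n_unis)

# Release the workers when finished
stopCluster(cl)
```

The result, top_unis, is a list with one filtered data frame per element of uni_list, in the same order.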