A parallel filter
You work as a data consultant for the United Nations, which wants to survey arts students globally. The UN has sourced a dataset of universities with arts and humanities departments and has decided to survey the top arts universities in each country.
uni_list is a list of data frames, one per country. Each data frame contains a total_score column. The following function is available to you:
filter_df <- function(df, select_n_unis) {
  # Keep the rows with the select_n_unis highest values of total_score
  df %>%
    top_n(select_n_unis, total_score)
}
This function requires dplyr. The select_n_unis argument sets the number of top universities to select. You have been asked to select the top five universities from each country's data frame in parallel. The parallel package has been loaded for you.
This exercise is part of the course Parallel Programming in R.
Exercise instructions
- Load dplyr on each core in the cluster cl.
- Export the n_unis variable to the cluster cl.
- Apply filter_df() to each element of uni_list using parLapply().
- Supply the number of universities to select, n_unis, to the correct argument of filter_df().
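The first two instructions are needed because each core in a cluster runs a separate R process. That process does not see the packages or objects of the main session. The following minimal sketch illustrates this. It assumes a local two-worker cluster from makeCluster(2), whereas the exercise uses four. The sketch checks whether n_unis is visible on the workers before and after clusterExport().

```r
library(parallel)

cl <- makeCluster(2)

n_unis <- 5

# Workers start as fresh R sessions, so n_unis does not exist there yet
before <- unlist(clusterEvalQ(cl, exists("n_unis")))

# Copy n_unis from this environment into every worker's global environment
clusterExport(cl, "n_unis", envir = environment())
after <- unlist(clusterEvalQ(cl, exists("n_unis")))

stopCluster(cl)

before  # FALSE FALSE
after   # TRUE TRUE
```

Packages behave the same way. Attaching dplyr in the main session does not attach it on the workers, which is why it is loaded there with clusterEvalQ().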
Interactive exercise
Try this exercise by completing the sample code below.
cl <- makeCluster(4)
# Load dplyr in cluster
___
n_unis <- 5
# Export n_unis to cluster
___(___, ___, envir = environment())
# Apply filter_df() to each element of uni_list
ls_df <- parLapply(___, ___, ___,
                   # Supply number of universities to select
                   ___ = ___)
stopCluster(cl)
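The instructions can be combined into one end-to-end sketch. This version runs on hypothetical toy data rather than the exercise's uni_list. toy_list, the two-worker cluster and n_unis = 2 are all assumptions chosen to keep the output small, and dplyr is assumed to be installed.

```r
library(parallel)

# filter_df() as given in the exercise
filter_df <- function(df, select_n_unis) {
  df %>%
    top_n(select_n_unis, total_score)
}

# Hypothetical stand-in for uni_list
toy_list <- list(
  country_a = data.frame(uni = c("A1", "A2", "A3"), total_score = c(70, 90, 80)),
  country_b = data.frame(uni = c("B1", "B2", "B3"), total_score = c(55, 65, 60))
)

cl <- makeCluster(2)

# Each worker needs dplyr for %>% and top_n()
invisible(clusterEvalQ(cl, library(dplyr)))

# Make n_unis available by name on every worker
n_unis <- 2
clusterExport(cl, "n_unis", envir = environment())

# Extra named arguments to parLapply() are passed on to filter_df()
ls_df <- parLapply(cl, toy_list, filter_df, select_n_unis = n_unis)

stopCluster(cl)
```

Because n_unis is passed here as the value of select_n_unis, parLapply() already sends it with each task. The explicit export matters when code running on the workers refers to n_unis by name.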