
Iterate and filter

As a data consultant, you receive data in all shapes and formats. A client has just sent you a 40-gigabyte CSV file containing data on higher education institutions across the world.

The junior analysts cannot parse the file because it does not fit in memory, and there is no budget for the compute resources needed to handle the whole dataset. Your current analysis only needs the institutions in Australia. The path to the CSV file is already stored in the variable filepath. You have been asked to implement an efficient reader for this data. The parallel, doParallel, foreach, and iterators packages have been loaded for you.

This exercise is part of the course

Parallel Programming in R


Instructions

  • Use an iterator to read lines from filepath.
  • Combine the results using an appropriate function to create a data frame.
  • Specify the parallel operator.
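To make the first instruction concrete, here is a minimal sketch of how a line iterator behaves when stepped by hand. It assumes the iterator in question is ireadLines() from the iterators package; the temporary file and the names tmp, it, first, and second are hypothetical and only serve the illustration.

```r
library(iterators)

# Hypothetical three-line file used only for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4"), tmp)

# ireadLines() returns an iterator that yields n lines per call (here one).
it <- ireadLines(tmp, n = 1)

# Each call to nextElem() reads the next line from the file.
first  <- nextElem(it)
second <- nextElem(it)
print(first)
print(second)
```

Because lines are pulled on demand, the file does not have to be loaded in full before processing starts.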

Hands-on interactive exercise

Try this exercise by completing this sample code.

cl <- makeCluster(4)
registerDoParallel(cl)
# Use an iterator to read lines
foreach(l = ___,
        # Combine the lines as rows
        ___ = ___
        # Use the parallel operator
        ) ___ {
  line <- strsplit(l, ",")[[1]]
  if (line[4] == "Australia") {
    return(line)
  }
}

stopCluster(cl)
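One possible way to complete the template is sketched below. It is not necessarily the course's reference solution: it assumes ireadLines() as the iterator, rbind as the combine function, and %dopar% as the parallel operator. The small sample file stands in for the 40 GB CSV and is entirely hypothetical, as are the names fields, aus, and aus_df. The sketch also assumes that no field contains a quoted comma, since strsplit() splits naively.

```r
library(parallel)
library(doParallel)
library(foreach)
library(iterators)

# Hypothetical stand-in for the large file: country is in the fourth column.
filepath <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,city,country",
  "1,University of Sydney,Sydney,Australia",
  "2,University of Oxford,Oxford,United Kingdom",
  "3,Monash University,Melbourne,Australia"
), filepath)

cl <- makeCluster(2)
registerDoParallel(cl)

# The iterator reads one line per iteration instead of loading the file up front.
aus <- foreach(l = ireadLines(filepath, n = 1),
               # Stack the kept lines as rows
               .combine = rbind) %dopar% {
  fields <- strsplit(l, ",")[[1]]
  # Non-matching lines evaluate to NULL, which rbind() drops.
  # identical() also copes with short lines where fields[4] is NA.
  if (identical(fields[4], "Australia")) fields else NULL
}

stopCluster(cl)

# rbind() on character vectors yields a character matrix; convert it to a data frame.
aus_df <- as.data.frame(aus, stringsAsFactors = FALSE)
print(aus_df)
```

Returning NULL for rows that do not match keeps the filter inside the loop body, so only the Australian rows are ever sent back to the main process and combined.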