Parallelizing calls to chunk.apply
The chunk.apply() function can also use parallel processes to process data more quickly. When the CH.PARALLEL parameter is set to a value greater than one on Linux and Unix machines (including the Mac), multiple processes read and process data at the same time, thereby reducing the execution time. On Windows, the CH.PARALLEL parameter is ignored.
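As a minimal sketch of how CH.PARALLEL is passed, the example below writes a small throwaway CSV and counts its rows chunk by chunk. The file contents, the helper count_rows(), and the variable n_workers are hypothetical and not part of the course; the sketch assumes the iotools package is installed.

```r
library(iotools)

# Hypothetical data: 10,000 rows, two integer columns, no header
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(id = 1:10000, grp = rep(1:4, 2500)),
            tmp, sep = ",", row.names = FALSE, col.names = FALSE)

# Hypothetical helper: parse one raw chunk and return its row count
count_rows <- function(chunk) {
  d <- dstrsplit(chunk, col_types = c("integer", "integer"), sep = ",")
  nrow(d)
}

# CH.PARALLEL is ignored on Windows, so request extra processes only elsewhere
n_workers <- if (.Platform$OS.type == "windows") 1 else 2

fc <- file(tmp, "rb")
counts <- chunk.apply(fc, count_rows,
                      CH.MAX.SIZE = 1e4, CH.PARALLEL = n_workers)
close(fc)

# Per-chunk counts are combined by the default CH.MERGE; add them up
total_rows <- sum(counts)
unlink(tmp)
```

Because the file is split on line boundaries, each row lands in exactly one chunk, so the per-chunk counts should add up to the number of rows written regardless of how many processes are used.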
This exercise is part of the course
Scalable Data Processing in R
Exercise instructions
- Benchmark the function iotools_read_fun(), first with 1 process and then with 3 parallel processes.
Interactive exercise
Try this exercise by completing this sample code.
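iotools_read_fun <- function(parallel) {
  # Open the file in binary mode so it can be read in raw chunks
  fc <- file("mortgage-sample.csv", "rb")
  # Read and discard the header line
  readLines(fc, n = 1)
  # Process the file in chunks, using `parallel` processes
  chunk.apply(fc, make_msa_table,
              CH.MAX.SIZE = 1e5, CH.PARALLEL = parallel)
  close(fc)
}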
# Benchmark the new function
microbenchmark(
  # Use one process
  ___,
  # Use three processes
  ___,
  times = 20
)
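To read benchmark output, it can help to see the general pattern on a toy case first. The sketch below times two hypothetical stand-in expressions (not the course's function) and assumes the microbenchmark package is installed; naming each expression labels its row in the summary.

```r
library(microbenchmark)

# Hypothetical stand-ins for a slower and a faster variant of the same task
res <- microbenchmark(
  slow = Sys.sleep(0.002),
  fast = Sys.sleep(0.001),
  times = 20
)

# One row per expression, with min, median, and max timings
timing <- summary(res)
print(timing[, c("expr", "min", "median", "max")])
```

When comparing runs with different numbers of processes, the median column is often a steadier figure to compare than the mean, since a few slow runs can skew the mean.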