The parallel package - parApply
2. The parallel package
The parallel package comes with R and lets us write parallel code that works on multiple operating systems. It provides parallel versions of some standard functions. A simple operation to run in parallel is the apply function.
3. The apply() function
Essentially, apply is a glorified for loop: it applies a function to each row or column of a matrix. We could use a for loop, but apply is neater, and converting it to run in parallel is easy. Suppose we have a matrix m and we want to calculate the median value of every row. This is the perfect job for apply. The first argument is the data set we're working with. The second argument is a number, where 1 indicates rows and 2 indicates columns. The third argument is the function we want to apply. To convert this to run in parallel, we need a few additional lines of code.
4. Converting to parallel
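Putting the pieces together, here is a minimal sketch of the row-median example in both its serial and parallel forms. The matrix m is hypothetical random data, and the names no_of_cores, res_serial and res_parallel are illustrative choices. The core count uses detectCores() minus one, leaving a core free for other work.

```r
# Load the parallel package (ships with R)
library("parallel")

# Hypothetical example data: a 1000 x 10 matrix of random numbers
set.seed(1)
m <- matrix(rnorm(10000), ncol = 10)

# Serial version: MARGIN = 1 applies median() to every row
res_serial <- apply(m, 1, median)

# Parallel version
# Use all cores but one; fall back to 1 if detectCores() returns NA
no_of_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)
cl <- makeCluster(no_of_cores)               # start copies of R
res_parallel <- parApply(cl, m, 1, median)   # cluster object comes first
stopCluster(cl)                              # free up resources

# The two versions should agree
all.equal(res_serial, res_parallel)
```

Apart from loading the package, only three lines differ from the serial code: creating the cluster, calling parApply instead of apply, and stopping the cluster.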
First we load the parallel package and then specify the number of cores to use. If I intend to use my machine for something else, such as email, then I specify the number of cores minus one. In this case I have 8 cores, so I'd specify 7. If the machine were devoted purely to the computation, I would use all of the cores. Next we make a cluster object with makeCluster, which starts copies of R running in parallel. Its argument specifies the number of cores to use. Then we swap apply for its parallel counterpart, parApply. Note the additional cluster argument in parApply, which comes first. Finally, to free up resources, we shut down the cluster with the stopCluster function. What's really neat about this approach is that switching to the parallel version takes very little extra code. We simply load the package, create a cluster, change apply to parApply, and close the cluster. It's that easy.
5. The bad news
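To check whether parallelism pays off for a given job, one simple approach is to time both versions with base R's system.time(). This sketch assumes a small hypothetical 100 x 10 matrix and a cluster of two workers (an arbitrary choice). The cluster is created outside the timed call, so only the parApply() work and its communication are measured. No timings are shown because they depend on the machine.

```r
library("parallel")

# Hypothetical small job: row medians of a 100 x 10 matrix
set.seed(1)
m <- matrix(rnorm(1000), ncol = 10)

# Time the serial version
t_serial <- system.time(res_serial <- apply(m, 1, median))

# Create the cluster outside the timed call (2 workers, an arbitrary choice)
cl <- makeCluster(2)
t_parallel <- system.time(res_parallel <- parApply(cl, m, 1, median))
stopCluster(cl)

# Compare wall-clock (elapsed) times in seconds
cat("serial:  ", t_serial[["elapsed"]], "s\n")
cat("parallel:", t_parallel[["elapsed"]], "s\n")
```

Wrapping makeCluster() in the timing as well would show the full start-up cost of going parallel.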
Unfortunately, running in parallel adds execution overhead. Using multiple cores requires communication between them, here between the separate copies of R in the cluster. If the job is already very fast, this communication cost can swamp the potential benefit. Therefore you should benchmark both the serial and parallel versions.
6. Let's practice!