1. Measuring the benefits
In this lesson we will evaluate the performance of parallelized code and look at some alternatives.
2. Toy example
To start, let's go back to our toy example of square roots. Here we have the sequential and parallel versions of the same code. Which version is better?
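As a rough sketch of what the two versions could look like (the names numbers and cl here are illustrative placeholders, not necessarily the exact code on the slide):

    # Sequential version: apply sqrt() to each element one at a time
    library(parallel)
    numbers <- 1:10000                        # placeholder input vector
    result_seq <- sapply(numbers, sqrt)

    # Parallel version: split the same work across a cluster of workers
    cl <- makeCluster(4)                      # assuming four cores are available
    result_par <- parSapply(cl, numbers, sqrt)
    stopCluster(cl)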
3. Benchmarking performance
Benchmarking can help us answer this question. It runs a code snippet or expression several times and gives us the average execution time.
We will load the microbenchmark package. We supply both the sequential and the parallel versions to the microbenchmark function. Notice the curly braces around the multi-line expression for the parallel version.
And finally, we specify that we want to run each expression ten times. We see that on average, the parallel version takes about 50% more time! Let's recall that this is because the square root itself is a very simple operation, and the overhead of parallelization weighs it down.
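A minimal sketch of such a benchmark, assuming the illustrative numbers vector and four-core cluster from the sketch above:

    library(microbenchmark)

    microbenchmark(
      sequential = sapply(numbers, sqrt),
      parallel = {                            # curly braces wrap the multi-line expression
        cl <- makeCluster(4)
        res <- parSapply(cl, numbers, sqrt)
        stopCluster(cl)
        res
      },
      times = 10                              # run each expression ten times
    )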
Also notice that this is different from profiling: profiling gives a line-by-line breakdown, whereas benchmarking executes the whole expression several times to give us an average.
4. The elephant in the room
We may be wondering why we haven't simply calculated the square roots as shown here: it is an elegant one-liner and the standard R approach.
5. Vectorization
And it does the job fast. If we benchmark it, it is more than 100 times faster than our parallel code!
This is because sqrt() is vectorized, like many other base R functions. Vectorization is the process of applying a single function to many inputs at once. sqrt() takes multiple inputs and returns the square root of each of them.
One function, many inputs? This should sound familiar to us. Vectorization is actually a type of fast low-level parallelism. It only works for simple operations, or single instructions. There are many tasks that cannot be vectorized easily.
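For reference, the vectorized one-liner would simply be along the lines of:

    # sqrt() is vectorized: a single call returns the square root of every element
    result_vec <- sqrt(numbers)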
6. The bootstrap
The bootstrap is a common example of a task that cannot be vectorized. Bootstrapping entails repeated sampling of the data with replacement to create multiple "versions" of the data. Let's look at an example to understand this better.
We have this list of data frames. Each data frame contains global life expectancy data for a single year, from 2001 to 2020.
7. Classic version
Let's focus on just the data from 2001 for now. We first create an empty vector "estimates" to store the results. Now we repeatedly sample the data with replacement, a total of ten thousand times, and calculate the mean for each of these samples.
If we plot a histogram of the estimates variable, we will see that we have a distribution for the average global life expectancy. Not only can we calculate an overall average (the solid line), but also a confidence interval (the dashed lines), among other statistics.
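A sketch of that classic loop, assuming a placeholder data frame df_2001 with a lifeExp column (both names are hypothetical):

    n_boot <- 10000
    estimates <- numeric(n_boot)              # preallocated vector to store the results

    for (i in seq_len(n_boot)) {
      # Sample the rows with replacement to create a new "version" of the data
      boot_rows <- sample(nrow(df_2001), replace = TRUE)
      estimates[i] <- mean(df_2001$lifeExp[boot_rows])
    }

    hist(estimates)
    abline(v = mean(estimates), lty = "solid")        # overall average
    abline(v = quantile(estimates, c(0.025, 0.975)),  # 95% confidence interval
           lty = "dashed")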
8. The good news
Since we are repeatedly sampling to create new "versions" of the data and performing calculations on each version independently, this can be parallelized.
Here is the loop we wrote for bootstrapping, as applied to a single data frame. We wrap it into a function and apply it to our list of data frames in parallel.
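A sketch of this parallel version, assuming the list of data frames is called life_exp_list and reusing the hypothetical bootstrap loop from above as a boot_mean() function:

    # Wrap the bootstrap loop into a function that works on a single data frame
    boot_mean <- function(df, n_boot = 10000) {
      estimates <- numeric(n_boot)
      for (i in seq_len(n_boot)) {
        boot_rows <- sample(nrow(df), replace = TRUE)
        estimates[i] <- mean(df$lifeExp[boot_rows])
      }
      estimates
    }

    # Apply the function to every data frame in the list in parallel
    cl <- makeCluster(4)
    boot_results <- parLapply(cl, life_exp_list, boot_mean)
    stopCluster(cl)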
9. The benefits
We can benchmark our parallelized bootstrap against the sequential version. We supply both versions to the microbenchmark function, and run each ten times. We've reduced the average execution time by about 40%, not too bad!
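Such a comparison could look roughly like this, reusing the hypothetical names from the sketches above:

    cl <- makeCluster(4)

    microbenchmark(
      sequential = lapply(life_exp_list, boot_mean),
      parallel   = parLapply(cl, life_exp_list, boot_mean),
      times = 10                              # run each version ten times
    )

    stopCluster(cl)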
Hopefully, by now, a workflow is emerging from all of this. For any piece of code that we want to optimize, we first profile the code to find the slowest parts. We then optimize those parts by parallelizing or vectorizing them. And finally, we compare the existing code with the new version using benchmarking.
10. Let's practice!
Now let's practice these concepts in the exercises.