1. Should we parallelize?
Hi! My name is Nabeel. Welcome to this course. Before we dive into parallelization, let's ask ourselves: should we parallelize?
2. Let's construct a building
To answer this question, let's look at an example. Consider the construction of a building. We can only build the second floor after the first. This is a sequential problem: each step depends on the one before it.
Now consider installing windows in the finished concrete structure. This is a parallel task. We can install any number of windows at any given time, and the outcomes don't interfere with each other.
We can apply the same principle to computations.
3. The sequential-parallel scale
We could visualize a sequential-parallel scale, and place on it various computational tasks.
Tasks towards the right are easily parallelized. These include reading data, creating new variables, and so on. If we take the square root of all numbers in a list, each output is independent of the others.
Tasks on the left side of the scale are harder to parallelize: to compute the cumulative sum up to some number, we need to know the sum of all preceding numbers.
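To make the contrast concrete, here is a minimal sketch (the example values are our own):

```r
x <- c(3, 1, 4, 1, 5)

# Parallel-friendly: each output depends only on its own input.
sqrt(x)    # 1.73 1.00 2.00 1.00 2.24

# Hard to parallelize: each output depends on all preceding inputs.
cumsum(x)  # 3 4 8 9 14
```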
4. A classic numerical operation
Let's take the example of square roots.
We take the numbers from one to a million. The lapply function takes a list of inputs and applies a function to each of them. Here we apply the square root to the numbers. We note the time before and after execution using Sys-dot-time. So how much time does this take? About a second.
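Here is a minimal sketch of this sequential version (variable names are our own):

```r
# Apply sqrt() to a million numbers, one after the other.
numbers <- 1:1e6

start <- Sys.time()
result <- lapply(numbers, sqrt)
end <- Sys.time()

end - start  # roughly a second on a typical machine
```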
Now, what would this computation look like in parallel?
5. How could we parallelize the square root?
We can split the numbers into five groups.
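As a sketch of the idea, the parallel package ships a helper, splitIndices, that chunks indices into roughly equal groups; parLapply will later do this splitting for us automatically:

```r
library(parallel)

numbers <- 1:1e6

# Divide the indices into five roughly equal chunks,
# then pull out the corresponding groups of numbers.
chunks <- splitIndices(length(numbers), 5)
groups <- lapply(chunks, function(idx) numbers[idx])

lengths(groups)  # 200000 200000 200000 200000 200000
```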
6. How could we parallelize the square root?
We send each group to an available "worker".
These workers are also referred to as cores: subunits of our CPU that can perform computations independently. Most modern computers have multiple cores; here we have arbitrarily chosen three.
Collectively, the cores are called a cluster.
Please note that the last two groups of numbers cannot be assigned to a core yet, because all three cores are busy. These groups will wait until one of the cores has completed its current job.
7. How could we parallelize the square root?
And finally we collect the results from each core.
8. A parallelized numerical operation
How can we program this?
We load the parallel package and make a cluster of three cores.
We use the parallel version of lapply, helpfully called parLapply. This function uses our cluster to map the square root over the million numbers. The splitting and combining are handled by parLapply behind the scenes.
Once done, we stop the cluster.
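Putting those steps together, the code could look like this sketch:

```r
library(parallel)

numbers <- 1:1e6

# Make a cluster of three cores.
cl <- makeCluster(3)

start <- Sys.time()
# parLapply() splits the numbers, sends each piece to a worker,
# and combines the results behind the scenes.
result <- parLapply(cl, numbers, sqrt)
end <- Sys.time()

# Always stop the cluster to release the workers.
stopCluster(cl)

end - start
```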
We note the time difference: execution is now about 20% faster. Not bad, but not exactly three times faster. To explain why, we need to go back to our parallel flow chart.
9. Not as fast as we expected
The gain in execution speed is countered by extra computations.
10. Not as fast as we expected
parLapply has to split the data.
11. Not as fast as we expected
Copy each subgroup to an available core.
12. Not as fast as we expected
And combine all the results.
13. Not as fast as we expected
Some computational resources are also spent orchestrating the whole process. This is why we usually do not use all available cores. The square root itself is so quick that parallel execution yields only a moderate speed boost.
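As a practical aside, the parallel package can report how many cores the machine has, and a common rule of thumb is to leave one core free for the operating system and the orchestration itself:

```r
library(parallel)

# How many cores does this machine have?
detectCores()

# Rule of thumb: keep one core free for everything else.
cl <- makeCluster(detectCores() - 1)
stopCluster(cl)
```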
14. So, should we parallelize?
We can now answer the question we posed at the beginning of this video: should we parallelize? Well, it's a balancing act.
Given a sufficiently complex task, parallel code will be faster. It will also be cost-efficient, because it harnesses all the available hardware.
We will, however, need parallel programming skills. But hey, that's not a problem for us!
Splitting also requires extra RAM, because multiple copies of the data are made.
During this course, we will address these trade-offs to get the best outcomes.
15. Let's practice!
For now, let's practice the concepts we learned.