1. The importance of vectorizing your code
When we call a base R function, we eventually call some C or FORTRAN code.
2. General rule
The underlying code is heavily optimized. A general rule for making code run faster is to reach that underlying code as quickly as possible; the fewer function calls, the better. This usually means vectorized code.
3. Vectorized functions
Many R functions are vectorized. Some functions, such as rnorm, take in a single number but return a vector. Other functions, such as mean, take in a vector and return a single value.
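As a small illustration of those two shapes, here is a minimal sketch. The variable names x and m and the input values are chosen only for this example.

```r
# rnorm(): a single number in, a vector out
x <- rnorm(5)
length(x)

# mean(): a vector in, a single value out
m <- mean(c(2, 4, 6, 8))
m
```

In both cases there is one R-level function call, and the work over the individual elements happens in the compiled code underneath.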
4. Generating random numbers
Now consider a for loop that generates a million random numbers from the standard normal distribution, one at a time. You've paid attention to the previous videos and preallocated a vector to store the results. However, the vectorized version of rnorm is still around forty times faster.
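The code from the slide is not part of this transcript, so the following is only a sketch of what the two versions might look like. The names n, x and y are assumptions made for this example.

```r
n <- 1e6

# Loop version: preallocate the result, then fill it one element at a time
x <- numeric(n)
for (i in seq_len(n)) {
  x[i] <- rnorm(1)
}

# Vectorized version: a single call generates all n values
y <- rnorm(n)
```

Wrapping each version in system.time() is one way to compare them on your own machine; the exact speed-up will vary.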
The interesting question is why?
5. Why is the loop slow?
Depending on the circles you mix in, you may have heard the expression "for loops in R are slow". So let's take a closer look at this loop and try to figure out what's going on.
Allocation: this is a very quick operation with a one-off cost. The same operation also occurs in the vectorized version.
Generating random numbers: in the for loop, we have one million calls to the rnorm function. In the vectorized solution, there is a single call.
Finally, assignment: again, we have a million calls to the assignment method. In the vectorized solution, we have a single assignment operation.
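To make the difference in call counts concrete, here is a rough sketch that counts calls to rnorm. The wrapper counted_rnorm and the counter calls are hypothetical helpers written for this example, and n is kept small so it runs quickly; the loop discussed here uses one million.

```r
n <- 1000  # assumption: a small n for a quick run

calls <- 0
# Hypothetical wrapper: increments a counter, then delegates to rnorm()
counted_rnorm <- function(k) {
  calls <<- calls + 1
  rnorm(k)
}

# Loop version: one call per element
x <- numeric(n)
for (i in seq_len(n)) {
  x[i] <- counted_rnorm(1)
}
loop_calls <- calls

# Vectorized version: one call in total
calls <- 0
y <- counted_rnorm(n)
vectorized_calls <- calls

loop_calls
vectorized_calls
```

The same pattern holds for assignment: the loop assigns into x once per element, while the vectorized version assigns once.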
It's not that the for loop is slow compared to the vectorized solution. It's that there are over two million more function calls!
6. R club
This is the second rule of R club: use a vectorized solution wherever possible.
7. Let's practice!