1. General principle: Memory allocation
2. No magic button
Sadly, there is no magic solution that will make your R code run faster. If there were, someone would have implemented it for you. Instead, there are a number of pitfalls you want to avoid. Avoiding them won't necessarily make your code go faster; rather, it keeps you from writing potentially fatal code.
3. If we programmed in C...
If we programmed in C, we would become aware of the importance of memory allocation, since that language holds us accountable for reclaiming memory. In R, this process happens automatically. When you assign a variable, R has to allocate memory in RAM. This takes some time, so an important way to make your code run faster is to minimize the amount of memory allocation R has to perform. Let's take a simple example.
4. Example: Sequence of integers
Suppose we want to create a sequence of integers again. Well, the obvious way is to use the colon operator.
But our colleague next door has been brought up on a diet of C, so they want to use a for loop. We start by creating a vector of length n, then use a loop to change the entries in the vector. The crucial part of this code is that the length of x doesn't change in the loop.
The final method is similar to method two, with one crucial exception: the object x starts empty and we gradually fill it up with integers. So what is the difference?
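The three methods described above might look something like the following sketch. The function names are made up for illustration, and the use of c() to grow the vector in method three is an assumption about how the growing is done; the slides may write it differently.

```r
# Hypothetical function names; n is the length of the sequence.

# Method 1: the colon operator builds the whole sequence in one call.
method_colon <- function(n) {
  1:n
}

# Method 2: pre-allocate an integer vector of length n, then fill it.
# The length of x never changes inside the loop.
method_preallocate <- function(n) {
  x <- vector("integer", n)
  for (i in seq_len(n)) {
    x[i] <- i
  }
  x
}

# Method 3: start with an empty object and grow it one element at a time.
# Assumption: growth is done with c(); each pass builds a longer vector.
method_grow <- function(n) {
  x <- NULL
  for (i in seq_len(n)) {
    x <- c(x, i)
  }
  x
}
```

All three return the same integer vector for a given n; they differ only in how the memory for that vector is obtained.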
5. Benchmarking
Using the colon operator, the operation happens so quickly that even when n is ten million, it takes less than a millisecond to execute. For method two, where we pre-allocate the vector, it takes around two seconds. Certainly slower, but not too bad. Method three is a different story. When n is ten million, we've gone from an operation that takes a couple of seconds to an operation that takes over an hour!
Using method three, we could easily turn working code into something unusable. The reason for this slowdown is that hidden within the for loop is a request for more memory. When we extend the vector, we are really saying, "Please sir, can I have more memory?" Since requesting memory is a relatively slow operation, this introduces a potentially fatal bottleneck.
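To get a feel for the gap yourself, a rough timing comparison could be sketched with base R's system.time(). This is only an illustration: n is set far below ten million so that the growing loop finishes quickly, the variable names are made up, growing with c() is an assumption, and the timings you see will depend on your machine and R version.

```r
# Assumed, much smaller n than in the slides, so method 3 stays tolerable.
n <- 5e4

# Method 1: colon operator.
t_colon <- system.time(x1 <- 1:n)[["elapsed"]]

# Method 2: pre-allocate, then fill.
t_preallocate <- system.time({
  x2 <- integer(n)
  for (i in seq_len(n)) x2[i] <- i
})[["elapsed"]]

# Method 3: start empty and grow with c() on every iteration.
t_grow <- system.time({
  x3 <- NULL
  for (i in seq_len(n)) x3 <- c(x3, i)
})[["elapsed"]]

# Elapsed seconds for each method.
print(c(colon = t_colon, preallocate = t_preallocate, grow = t_grow))
```

Try increasing n gradually and watch how the time for the growing loop changes compared with the other two.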
6. Welcome to R club!
This is the first rule of R club: never, ever grow a vector.
7. Let's practice!