Benchmarking
1. My code is slow!
Every R programmer has at one point or another uttered the phrase, "My code is slow!" This is usually followed by tears and curses, not necessarily in that order.
2. Is my code really slow?
But what do you mean by slow? Is one second slow? What about a minute, or an hour? This is obviously problem dependent. What you need is code that is fast enough!
3. Is my code really slow?
To determine whether it is worth changing your code, you need to compare your existing solution with one or more alternatives. This is what we mean by benchmarking. The concept is straightforward: you simply time how long each solution takes and, all things being equal, select the fastest.
4. Benchmarking
Benchmarking is a two-step process. First, you construct a function around the feature you wish to benchmark. Typically the function has an argument that lets you vary the complexity of the task, for example a parameter that alters the data size. Second, you time the function under different scenarios. Let's have an example.
5. Example: Sequence of numbers
Suppose you want to generate a sequence of integers. There are three obvious ways to do this. The first is to use the colon operator. The second is the seq() function with the default step size. The third is the seq() function with the step size specified explicitly.
6. Function wrapping
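The three approaches described above can be sketched as follows, each wrapped in a function of the sequence length n. The names colon, seq_default and seq_by are the ones used later in this transcript; the function bodies are an assumption about what the slides show, not a copy of them.

```r
# Three ways to generate the integers 1, 2, ..., n.
# The bodies below are assumed; only the function names come from the transcript.

# 1. The colon operator
colon <- function(n) 1:n

# 2. seq() with the default step size
seq_default <- function(n) seq(1, n)

# 3. seq() with the step size given explicitly
seq_by <- function(n) seq(1, n, by = 1)

# All three produce the same values for a small n
colon(5)
seq_default(5)
seq_by(5)
```

Taking n as an argument is what lets the same function be timed at different problem sizes.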
We begin by wrapping the options in functions and allowing the sequence length, n, to be passed as an argument.
7. Timing with system.time()
Next, to determine how long a function takes to run, we wrap the function call in system.time(). Running this code produces three numbers: user, system and elapsed time. Roughly, the user time is the CPU time charged for the execution of user instructions. The system time is the CPU time charged for execution by the system on behalf of the calling process. The elapsed time is approximately the sum of user and system time; this is the number we typically care about. So in this example, it took 0.06 seconds for the colon function but 1.6 seconds for the seq_by function. I often use system.time() during an analysis. I set my code running as I leave the office, and want to know how long the job took when I return the next morning.
8. Storing the result
However, I also want to use the result! In this case we use the arrow operator, <-. Using the arrow within a function call performs two tasks: argument passing and object assignment. This allows us both to time and to store the operation. The equals operator performs either argument passing or assignment, but not both, so using equals inside system.time() will raise an error. As well as considering elapsed time, it's worthwhile calculating the relative time.
9. Relative time
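The timing and storing steps described above, together with the ratio of elapsed times, might look like the sketch below. The value of n and the object names (res_colon, t_colon and so on) are assumptions for illustration; the transcript does not say which n it used, so the timings on your machine will differ from the figures quoted.

```r
# Assumed wrappers; only the names come from the transcript
colon  <- function(n) 1:n
seq_by <- function(n) seq(1, n, by = 1)

n <- 1e6  # assumed size, chosen only for illustration

# Using `<-` inside the call both stores the result and
# passes the expression to system.time() for timing
t_colon  <- system.time(res_colon  <- colon(n))
t_seq_by <- system.time(res_seq_by <- seq_by(n))

# Printing shows the user, system and elapsed times
t_colon
t_seq_by

# system.time(res_colon = colon(n)) would raise an error instead:
# `=` here names an argument, and system.time() has no such argument

# Relative time: ratio of the elapsed times.
# This can be Inf or NaN if the faster call rounds to 0 seconds.
t_seq_by[["elapsed"]] / t_colon[["elapsed"]]
```

The stored results, res_colon and res_seq_by, remain available after timing, which is the point of assigning inside the call.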
The relative time is simply a ratio. In this example, the elapsed times are 0.06 and 1.6 seconds, so the relative time is roughly 26. That is, the seq_by function is about 26 times slower than the colon function.
10. Microbenchmark package
The microbenchmark package offers a more accurate alternative to repeated system.time() calls and makes it straightforward to compare multiple functions. The key function in this package is the unimaginatively named microbenchmark(). In this code, we are comparing the functions colon, seq_default and seq_by. The times argument specifies how many times each function should be called. As a bonus, the cld column provides a statistical ranking of the functions. As you would expect, the colon operator is the fastest way to generate a sequence of integers and takes on average 220 milliseconds.
11. Let's practice!
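As a starting point, the comparison described above could be run along these lines. This is a sketch under assumptions: the function bodies, the value of n and times = 10 are illustrative choices, and the code only runs the benchmark if the microbenchmark package is installed.

```r
# Assumed wrappers; only the names come from the transcript
colon       <- function(n) 1:n
seq_default <- function(n) seq(1, n)
seq_by      <- function(n) seq(1, n, by = 1)

n <- 1e4  # assumed size, chosen only for illustration

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  # Call each function `times` times and collect the timings
  res <- microbenchmark::microbenchmark(
    colon(n), seq_default(n), seq_by(n),
    times = 10
  )
  # The summary has one row per expression; the cld ranking column
  # is shown only when the multcomp package is also available
  print(res)
}
```

The units in the printed summary depend on how fast the expressions are, so they will not necessarily match the figure quoted above.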