1. How good is your machine?
If it's not too personal, how good is your machine? A few years ago, I was teaching a course on how to analyze micro-array data.
2. Experiments!
This type of data is around tens of thousands of rows long. Not really big, more mildly inconvenient. These types of experiments aren't cheap. Once you include experimental equipment and researcher time, each experiment easily costs a few thousand dollars. What struck me about teaching that course was the number of very old and slow machines participants were using. A quick calculation showed that
3. To buy, or not to buy...
spending around one thousand dollars on a new computer would easily be recouped on researcher time.
4. To buy, or not to buy...
Suppose your analysis takes twenty minutes on your current machine but only ten minutes on a new machine, and your hourly rate is around a hundred dollars. You could recoup the cost of the new computer very quickly. It turns out that it's not obvious when you should upgrade, as there is always something better (and more expensive).
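To make that quick calculation concrete, here is a rough sketch in R using the round numbers mentioned above (a one thousand dollar machine, ten minutes saved per analysis, a hundred dollars an hour). The variable names are only illustrative, and the sketch ignores everything except researcher time.

```r
# Round numbers taken from the example above
computer_cost <- 1000          # dollars for the new machine
hourly_rate <- 100             # dollars per hour of researcher time
minutes_saved_per_run <- 20 - 10

# Hours of saved researcher time needed to pay for the machine
hours_to_recoup <- computer_cost / hourly_rate

# Number of analyses you need to run before the machine has paid for itself
runs_to_break_even <- ceiling(hours_to_recoup * 60 / minutes_saved_per_run)
runs_to_break_even
```

Under these assumptions, the new machine pays for itself after sixty analyses.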
5. The benchmarkme package
The package benchmarkme aims to address this problem. The idea is simple. We both run the same piece of R code on our respective machines and compare the results.
After installing benchmarkme, via the usual install.packages() dance, we load the package. The main function within this package is benchmark_std(). These benchmarks are standard R operations such as loops and matrix calculations. On a standard machine, this code will take around four minutes or so to run.
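Those steps might look something like the sketch below. It assumes you have network access and are happy to install from the CRAN cloud mirror; the `runs = 3` argument and the object name `res` are choices made here for illustration, not something the course prescribes.

```r
# Install benchmarkme from CRAN if it isn't already available
# (assumes network access; the mirror URL is just one common choice)
if (!requireNamespace("benchmarkme", quietly = TRUE)) {
  install.packages("benchmarkme", repos = "https://cloud.r-project.org")
}

# Load the package
library("benchmarkme")

# Run the standard benchmarks; this takes a few minutes
res <- benchmark_std(runs = 3)

# Peek at the first few timings
head(res)
```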
Once the benchmark has completed, you can compare your results to other users with the plot function. This method generates a set of plots that allows you to compare your machine to theirs. The plots on the right are for the programming benchmark. This particular benchmark focuses on looping. The top plot is measured in seconds, whereas the bottom plot is relative time compared to the fastest machine.
Each point on the plot represents a machine. My computer is highlighted by the vertical line and I'm ranked 75th out of 385 machines. However, relatively speaking, the fastest machine is less than twice as fast.
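To see how a rank and a relative time are related, here is a toy sketch in base R. The timings are made up purely for illustration and are not real benchmark results; the package works these numbers out for you from the uploaded data.

```r
# Made-up elapsed times in seconds for five machines (not real results)
timings <- c(mine = 30, a = 20, b = 25, c = 40, d = 55)

# Rank of my machine: 1 is the fastest
my_rank <- rank(timings)[["mine"]]

# Time relative to the fastest machine: 1 means "as fast as the best"
relative <- timings / min(timings)

my_rank
relative[["mine"]]
```

In this toy data my machine sits mid-table, yet it takes only one and a half times as long as the fastest one, which is the same kind of gap between rank and relative speed as described above.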
After you've examined the results, you should upload them to help other people via the upload_results function.
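Putting the last two steps together, a minimal sketch could look like this. The helper name `compare_and_share` is hypothetical and only wraps the package's plot method and upload_results(); it assumes `res` came from benchmark_std() and that the package's past-results data is available for plotting.

```r
# Hypothetical helper: compare your results with other users' machines,
# then share them. `res` is assumed to be the output of benchmark_std().
compare_and_share <- function(res) {
  # Draws the comparison plots against previously uploaded results
  plot(res)

  # Uploads your timings so that others can compare against them
  benchmarkme::upload_results(res)
}

# Example usage (runs the benchmarks first, so it takes a few minutes):
# res <- benchmarkme::benchmark_std(runs = 3)
# compare_and_share(res)
```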
6. Let's practice!