
How does processing time vary by data size?

If you are processing all elements of two data sets, and one data set is bigger, then the bigger data set will take longer to process. However, how much longer it takes is not always directly proportional to how much bigger it is. That is, if one data set is twice the size of the other, the larger one is not guaranteed to take twice as long to process. It could take 1.5 times longer or even four times longer, depending on which operations are used to process the data set. Sorting, for example, typically scales as O(n log n) in the number of elements, so doubling the size slightly more than doubles the work.
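As a back-of-envelope sketch (a theoretical work estimate, not a measured timing), you can see why doubling the size of a vector more than doubles the cost of a comparison sort:

```r
# Theoretical cost of a comparison sort is proportional to n * log(n)
n <- c(1e5, 2e5)
work <- n * log(n)

# Doubling n multiplies the theoretical work by a bit more than 2 (~2.12)
work[2] / work[1]
```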

In this exercise, you'll use the microbenchmark package, which was covered in the Writing Efficient R Code course.

Note: Numbers are specified using scientific notation: $$1e5 = 1 \times 10^5 = 100{,}000$$
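You can verify the notation in R itself (a quick sanity check):

```r
# 1e5 is just shorthand for 100,000
1e5 == 100000  # TRUE

# Print it without scientific notation, with a thousands separator
format(1e5, big.mark = ",", scientific = FALSE)  # "100,000"
```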

This exercise is part of the course

Scalable Data Processing in R


Exercise instructions

  • Load the microbenchmark package.
  • Use the microbenchmark() function to compare the sort times of random vectors.
  • Call plot() on mb.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load the microbenchmark package
___

# Compare the timings for sorting different sizes of vector
mb <- ___(
  # Sort a random normal vector of length 1e5
  "1e5" = sort(rnorm(1e5)),
  # Sort a random normal vector of length 2.5e5
  "2.5e5" = sort(rnorm(2.5e5)),
  # Sort a random normal vector of length 5e5
  "5e5" = sort(rnorm(5e5)),
  # Sort a random normal vector of length 7.5e5
  "7.5e5" = sort(rnorm(7.5e5)),
  # Sort a random normal vector of length 1e6
  "1e6" = sort(rnorm(1e6)),
  # Run each expression 10 times
  times = 10
)

# Plot the resulting benchmark object
___(mb)
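If you get stuck, one way to complete the sample code looks like this (a solution sketch; it assumes the microbenchmark package is installed):

```r
# Load the microbenchmark package
library(microbenchmark)

# Compare the timings for sorting different sizes of vector,
# running each sort 10 times
mb <- microbenchmark(
  "1e5" = sort(rnorm(1e5)),
  "2.5e5" = sort(rnorm(2.5e5)),
  "5e5" = sort(rnorm(5e5)),
  "7.5e5" = sort(rnorm(7.5e5)),
  "1e6" = sort(rnorm(1e6)),
  times = 10
)

# Plot the resulting benchmark object
plot(mb)
```

The plot shows the distribution of sort times for each vector size, letting you see how run time grows as the input gets larger.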