How does processing time vary by data size?

If you are processing all elements of two data sets, and one data set is bigger, then the bigger data set will take longer to process. However, it's important to realize that how much longer it takes is not always directly proportional to how much bigger it is. That is, if you have two data sets and one is two times the size of the other, it is not guaranteed that the larger one will take twice as long to process. It could take 1.5 times longer or even four times longer. It depends on which operations are used to process the data set.

In this exercise, you'll use the microbenchmark package, which was covered in the Writing Efficient R Code course.

Note: Numbers are specified using scientific notation $$1e5 = 1 * 10^5 = 100,000$$

Este exercício faz parte do curso

Scalable Data Processing in R

Ver curso

Instruções do exercício

Load the microbenchmark package.
Use the microbenchmark() function to compare the sort times of random vectors.
Call plot() on mb.

Exercício interativo prático

Experimente este exercício completando este código de exemplo.

# Load the microbenchmark package
___

# Compare the timings for sorting different sizes of vector
mb <- ___(
  # Sort a random normal vector length 1e5
  "1e5" = sort(rnorm(1e5)),
  # Sort a random normal vector length 2.5e5
  "2.5e5" = sort(rnorm(2.5e5)),
  # Sort a random normal vector length 5e5
  "5e5" = sort(rnorm(5e5)),
  "7.5e5" = sort(rnorm(7.5e5)),
  "1e6" = sort(rnorm(1e6)),
  times = 10
)

# Plot the resulting benchmark object
___(mb)

Editar e executar o código