Get startedGet started for free

Don't change the size of Rcpp vectors

Rcpp vector classes are designed as very thin wrappers around R vectors. This has the performance implication that growing or shrinking them means creating a new vector of the right size and copy the relevant data. This is a very slow task, so it should be avoided.

If possible, you should structure your code to first calculate the final size of the vector, and then allocate it with that size.

Let's see an example that selects the positive values from a vector, equivalent to x[x > 0] in R. Since you don't know in advance how many positive numbers there will be in advance, it is tempting to start with a zero-length vector and append a value each time you find one. Here, push_back() is a function that appends a value.

NumericVector bad_select_positive_values_cpp(NumericVector x) {
  NumericVector positive_x(0);
  for(int i = 0; i < x.size(); i++) {
    if(x[i] > 0) {
      positive_x.push_back(x[i]);
    }
  }
  return positive_x;
}

Unfortunately, this function will be slow because it must repeatedly create new vectors and copy the data. See if you can do better!

This exercise is part of the course

Optimizing R Code with Rcpp

View Course

Exercise instructions

  • Complete the definition of a more efficient function, good_select_positive_values_cpp() to select the positive numbers.
    • In the first for loop, if the ith element of x is greater than zero, add one to n_positive_elements.
    • After that for loop, allocated a numeric vector, positive_x, of size n_positive_elements.
    • In the second for loop, again check if if the ith element of x is greater than zero.
    • When it is, set the jth element of positive_x to the ith element of x then add one to j.
  • bad_select_positive_values_cpp() is available in your workspace for comparison. Examine the console output to see the benchmarked difference in running time.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

#include 
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector good_select_positive_values_cpp(NumericVector x) {
  int n_elements = x.size();
  int n_positive_elements = 0;
  
  // Calculate the size of the output
  for(int i = 0; i < n_elements; i++) {
    // If the ith element of x is positive
    if(___) {
      // Add 1 to n_positive_elements
      ___;
    }
  }
  
  // Allocate a vector of size n_positive_elements
  ___;
  
  // Fill the vector
  int j = 0;
  for(int i = 0; i < n_elements; i++) {
    // If the ith element of x is positive
    if(___) {
      // Set the jth element of positive_x to the ith element of x
      ___;
      // Add 1 to j
      ___;
    }
  }
  return positive_x;
}

/*** R
set.seed(42)
x <- rnorm(1e4)
# Does it give the same answer as R?
all.equal(good_select_positive_values_cpp(x), x[x > 0])
# Which is faster?
microbenchmark(
  bad_cpp = bad_select_positive_values_cpp(x),
  good_cpp = good_select_positive_values_cpp(x)
)
*/
Edit and Run Code