Don't change the size of Rcpp vectors
Rcpp vector classes are designed as very thin wrappers around R vectors. This has the performance implication that growing or shrinking them means creating a new vector of the right size and copy the relevant data. This is a very slow task, so it should be avoided.
If possible, you should structure your code to first calculate the final size of the vector, and then allocate it with that size.
Let's see an example that selects the positive values from a vector, equivalent to x[x > 0]
in R. Since you don't know in advance how many positive numbers there will be in advance, it is tempting to start with a zero-length vector and append a value each time you find one. Here, push_back()
is a function that appends a value.
NumericVector bad_select_positive_values_cpp(NumericVector x) {
NumericVector positive_x(0);
for(int i = 0; i < x.size(); i++) {
if(x[i] > 0) {
positive_x.push_back(x[i]);
}
}
return positive_x;
}
Unfortunately, this function will be slow because it must repeatedly create new vectors and copy the data. See if you can do better!
This exercise is part of the course
Optimizing R Code with Rcpp
Exercise instructions
- Complete the definition of a more efficient function,
good_select_positive_values_cpp()
to select the positive numbers.- In the first
for
loop, if the ith element ofx
is greater than zero, add one ton_positive_elements
. - After that for loop, allocated a numeric vector,
positive_x
, of sizen_positive_elements
. - In the second
for
loop, again check if if the ith element ofx
is greater than zero. - When it is, set the jth element of
positive_x
to the ith element ofx
then add one toj
.
- In the first
bad_select_positive_values_cpp()
is available in your workspace for comparison. Examine the console output to see the benchmarked difference in running time.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
#include
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector good_select_positive_values_cpp(NumericVector x) {
int n_elements = x.size();
int n_positive_elements = 0;
// Calculate the size of the output
for(int i = 0; i < n_elements; i++) {
// If the ith element of x is positive
if(___) {
// Add 1 to n_positive_elements
___;
}
}
// Allocate a vector of size n_positive_elements
___;
// Fill the vector
int j = 0;
for(int i = 0; i < n_elements; i++) {
// If the ith element of x is positive
if(___) {
// Set the jth element of positive_x to the ith element of x
___;
// Add 1 to j
___;
}
}
return positive_x;
}
/*** R
set.seed(42)
x <- rnorm(1e4)
# Does it give the same answer as R?
all.equal(good_select_positive_values_cpp(x), x[x > 0])
# Which is faster?
microbenchmark(
bad_cpp = bad_select_positive_values_cpp(x),
good_cpp = good_select_positive_values_cpp(x)
)
*/