
Weighted mean

1. Weighted mean

If you've made it this far, you have already learned the most essential things about Rcpp. Let's play.

2. Weighted mean of x with weights w

In the next few exercises, we'll implement the weighted mean algorithm. Suppose we have two vectors, x and w, of the same length. The weighted mean of x with weights w is defined as the sum of x_i times w_i, divided by the sum of w_i.
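Written as a formula, with n the common length of x and w, the definition reads as follows. The numbers in the worked example are illustrative values chosen for this sketch, not data from the lesson.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i}

% Example: x = (1, 2, 3), w = (3, 2, 1)
\bar{x}_w = \frac{1 \cdot 3 + 2 \cdot 2 + 3 \cdot 1}{3 + 2 + 1} = \frac{10}{6} \approx 1.667
```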

3. R version

You may already know the "weighted.mean()" function from the stats package. If not, that's alright. The definition translates easily to R code: sum(x * w) / sum(w). The numerator is the sum of the vectorized product of x and w, and the denominator is the sum of the elements of w. We'll see together how to write the function in C++ using a for loop. But first, let's look at what happens inside this R function.

4. R version

x times w is a vectorized operation that multiplies each element of x by the corresponding element of w. This creates a new vector of the same length to hold a temporary result; let's call it "xw".

5. R version

This "xw" vector is then passed to the "sum()" function, and the result is divided by the sum of w. Both the vectorized multiplication operator and the "sum()" function are implemented as loops written in C, which is why they perform well.

6. R version

However, this code runs three loops (one for x * w, one for sum(xw), and one for sum(w)) and allocates an entire vector to hold the products of x and w, only to discard it right afterward. So even if this is the best we can do with regular R code, we can improve on it by using a single loop and allocating less memory.
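To make the cost concrete, here is a rough C++ transcription of what the vectorized R expression does internally: three separate passes over the data plus one temporary vector. This is a sketch for illustration only. It uses std::vector<double> in place of R vectors, and the function name weighted_mean_three_pass is hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper mimicking sum(x * w) / sum(w):
// three passes over the data and one temporary vector.
double weighted_mean_three_pass(const std::vector<double>& x,
                                const std::vector<double>& w) {
    // Pass 1: elementwise product, stored in a temporary vector (R's "xw").
    std::vector<double> xw(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        xw[i] = x[i] * w[i];
    }
    // Pass 2: sum of the products.
    double numerator = std::accumulate(xw.begin(), xw.end(), 0.0);
    // Pass 3: sum of the weights.
    double denominator = std::accumulate(w.begin(), w.end(), 0.0);
    // xw is freed when the function returns, just like R discards it.
    return numerator / denominator;
}
```

Calling weighted_mean_three_pass({1.0, 2.0, 3.0}, {3.0, 2.0, 1.0}) returns 10.0 / 6.0. The temporary xw vector is exactly the allocation that a single-loop version avoids.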

7. Inefficient R version

Before writing a C++ version of the weighted mean function, let's rewrite it as a single loop in R. If you have experience with R, you probably know that this will perform poorly, because explicit loops in R are slow. But it is sometimes easier to prototype the idea in R first: devectorize the code, then translate it into C++. The translation is easier because the R code already manipulates the vectors element by element.

8. Skeleton of a C++ version

You have probably guessed where this is headed. You now need a C++ function that takes two NumericVectors and returns a double. That function will use a single C++ loop to scan through the values of x and w at the same time. You will complete this function in the next exercise. Before you do that, let's talk about missing values.
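As a rough sketch of where this is headed, the function below accumulates both sums in a single loop without any temporary vector. It is written in plain C++ so it compiles without R. std::vector<double> stands in for Rcpp's NumericVector, and the name weighted_mean_loop is hypothetical. In an actual Rcpp source file, the two parameters would be NumericVectors, as described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-pass weighted mean: one loop, no temporary vector.
// Assumes x and w have the same length; std::vector<double> stands in
// for Rcpp's NumericVector.
double weighted_mean_loop(const std::vector<double>& x,
                          const std::vector<double>& w) {
    double total_xw = 0.0;  // running sum of x[i] * w[i]
    double total_w = 0.0;   // running sum of w[i]
    std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        total_xw += x[i] * w[i];
        total_w += w[i];
    }
    return total_xw / total_w;
}
```

Both running sums are updated in the same iteration, so the data is scanned once, and no intermediate vector is allocated.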

9. Missing values

Just like in R, you cannot use the equality operator to test whether a value is missing. In R, you use the "is.na()" function. In Rcpp, each type of vector has its own missing value, and Rcpp provides the related tools as static methods of the relevant vector class. You test whether a value is missing with the "is_na()" static method, for example NumericVector::is_na(x[i]). You will use this to remove missing values from the weighted mean. Similarly, you can get the actual NA value with the "get_na()" static method, for example NumericVector::get_na().
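The sketch below shows one way to drop missing values during the single loop. It makes several assumptions so that it compiles without R. std::vector<double> stands in for NumericVector. std::isnan() stands in for NumericVector::is_na(), since R stores a numeric NA as a NaN bit pattern. A quiet NaN stands in for NumericVector::get_na(). Skipping a pair when either x[i] or w[i] is missing is a design choice made for this example, and the name weighted_mean_na_rm is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical single-pass weighted mean that skips missing values.
// std::isnan() stands in for NumericVector::is_na(); a quiet NaN stands
// in for NumericVector::get_na().
double weighted_mean_na_rm(const std::vector<double>& x,
                           const std::vector<double>& w) {
    double total_xw = 0.0;
    double total_w = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        // x[i] == NaN is always false, so equality cannot detect a
        // missing value; use isnan() instead.
        if (std::isnan(x[i]) || std::isnan(w[i])) {
            continue;  // design choice: drop the whole (x[i], w[i]) pair
        }
        total_xw += x[i] * w[i];
        total_w += w[i];
    }
    if (total_w == 0.0) {
        // No usable weights left: return a missing value instead of
        // dividing by zero.
        return std::numeric_limits<double>::quiet_NaN();
    }
    return total_xw / total_w;
}
```

For example, with x = {1, NaN, 3} and w = {3, 2, 1}, the second pair is dropped, giving (3 + 3) / (3 + 1) = 1.5.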

10. Let's practice!

Time to put this into practice.