Rolling operations

1. Rolling operations

Let's do rolling operations now.

2. Rolling means

Consider the rolling mean. The "rollmean1" function illustrates what it does. You want to create a vector whose value at position i is the mean of the n most recent values of the input vector, up to and including position i; n is called the window. So for example, with a window of 2, the result at index i is the mean of x[i] and x[i-1]. The code shown is straightforward enough,

3. Rolling means

but its performance is bad for a variety of reasons. At each iteration of the loop, the code creates an integer vector to hold the indices, extracts the relevant subset of data from x, and then calls the mean function.

4. Alternative algorithm

As an alternative to calculating the mean every time, you can think in terms of totals and divide by the window only when you want the mean. With this shift in perspective, you realize that to move from one total to the next, you only have to add the current value and subtract the one value that has just left the window.

5. Alternative algorithm

This leads to the "rollmean2" function. This version operates on single values, and will serve as a basis for recoding it in C++.

6. Hackstucious (hack + astucious) vectorization

Before looking into doing it in C++, you might want to try to go a little bit further with R and express the algorithm with R's vectorized functions. Recall the previous algorithm, where at each iteration you would add a value and remove a value. The vector of totals is in fact the cumulative sum of the difference between the tail of the input vector and its head, starting from the total of the first window. That observation might lead to code that has better performance because it uses vectorized R functions. However, this is a slippery slope towards hackstucious programming. In other words, this is code that tries too hard to be clever, at the expense of the readability of the original intent.
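To make the cumulative-sum identity concrete, here is a small sketch in plain C++ using `std::partial_sum` as a stand-in for R's cumulative sum. It is not the R code from the slide; `roll_totals_cumsum` is a hypothetical name, and the function returns only the totals of the full windows.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: window totals as a cumulative sum of differences.
// Returns one total per full window (x.size() - window + 1 values).
std::vector<double> roll_totals_cumsum(const std::vector<double>& x,
                                       std::size_t window) {
    if (window == 0 || window > x.size()) return {};
    std::vector<double> totals(x.size() - window + 1);

    // First element: total of the first window.
    totals[0] = std::accumulate(
        x.begin(), x.begin() + static_cast<std::ptrdiff_t>(window), 0.0);

    // Remaining elements: value entering the window minus value leaving it
    // (the "tail minus head" differences).
    for (std::size_t k = 1; k < totals.size(); ++k) {
        totals[k] = x[k + window - 1] - x[k - 1];
    }

    // The cumulative sum turns those differences into window totals.
    std::partial_sum(totals.begin(), totals.end(), totals.begin());
    return totals;
}
```

For the input 1, 2, 3, 4, 5 and a window of 2, the intermediate vector is 3, 2, 2, 2 and its cumulative sum is 3, 5, 7, 9. The result is correct, but the intent (a sliding window) is no longer visible in the code.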

7. Comparison

You can see here that the hackstucious code (which we wrapped inside "rollmean3") performs better. However, the second version is much easier to read and understand. This is where C++ comes in: you will write a C++ version based on the second version that performs better than the third. C++ gives you the best of both worlds: the best performance and better readability, without any clever tricks.

8. Last observation carried forward

Suppose you have a vector that contains some missing values and you want to replace each of the missing values by the last value that was not missing in the vector.

9. Last observation carried forward

This is again a situation where the algorithm naturally translates to a loop, and where vectorization leads to hackstucious code. Let's just not go down that road, and directly translate iterative R code into a C++ loop.
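As an illustration of how directly the loop translates, here is a minimal sketch in plain C++. It assumes missing values are represented as NaN (R's NA handling is richer than this), the name `na_locf` is hypothetical, and leading missing values are left untouched because there is nothing to carry forward yet.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: last observation carried forward.
// Missing values are assumed to be encoded as NaN.
std::vector<double> na_locf(std::vector<double> x) {
    double last = std::numeric_limits<double>::quiet_NaN();
    for (double& value : x) {
        if (std::isnan(value)) {
            value = last;   // fill with the last non-missing value seen
        } else {
            last = value;   // remember this observation
        }
    }
    return x;
}
```

For the input NaN, 1, NaN, NaN, 4, NaN this returns NaN, 1, 1, 1, 4, 4.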

10. Mean carried forward

Now let's see how you can carry forward the mean of all previous non-missing values.

11. Mean carried forward

The "na_meancf1" function attempts to do it with various vectorization tricks and avoids calling the "mean" function many times, but it is again one of those cases where you cannot quickly tell what is happening when reading the code.

12. Comparisons

The iterative version is easier to understand. However, it performs badly because loops are slow in R. This is again one of those cases where C++ will give you the best of both. You'll have readable code that does not require extensive vocabulary gymnastics, with good performance.
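A sketch of the iterative idea in plain C++ might look like this. It is not the course's solution: `na_meancf` is a hypothetical name, missing values are assumed to be NaN, filled values are assumed not to count towards later means, and leading missing values are left as they are.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: replace each missing value (NaN) with the mean of
// all non-missing values seen before it.
std::vector<double> na_meancf(std::vector<double> x) {
    double total = 0.0;      // running total of non-missing values
    std::size_t count = 0;   // how many non-missing values so far
    for (double& value : x) {
        if (std::isnan(value)) {
            if (count > 0) value = total / static_cast<double>(count);
        } else {
            total += value;
            ++count;
        }
    }
    return x;
}
```

For the input 1, NaN, 3, NaN this returns 1, 1, 3, 2: the first gap is filled with the mean of 1, and the second with the mean of 1 and 3.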

13. Let's practice!

Time to put this into practice.