1. vapply
That was some pretty advanced stuff you did there!
2. Recap
Before you head over to the final topic of this chapter, let's do a quick recap. First, you learned about lapply. This function allows you to avoid the for loop altogether and apply a function to every element of a list or a vector. The output list has the same length as the input. The lapply function always returns a list, but there are many cases in which this list can be simplified to a vector or an array. That's why R provides the sapply function, short for simplify apply. Whenever possible, sapply tries to convert the list that lapply generates to a vector or an array. If this is not possible, however, sapply simply returns the same list that lapply generates. This can be quite dangerous, because the type of sapply's output depends on the specifics of the data we're using.
This short overview leads us seamlessly to the vapply function. vapply is quite similar to sapply. Under the hood, it uses lapply and then tries to simplify the result. However, when using vapply, you have to explicitly say what the type of the return value will be. In sapply, this is neither required nor possible.
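To make the recap concrete, here is a minimal sketch of the difference between lapply and sapply. The `words` vector is a hypothetical example, not data from the course.

```r
# Hypothetical input vector, used only for illustration
words <- c("apple", "kiwi", "banana")

# lapply always returns a list with one element per input element
as_list <- lapply(words, nchar)

# sapply simplifies that list to a named integer vector when it can
as_vector <- sapply(words, nchar)

str(as_list)
print(as_vector)
```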
3. sapply() & vapply()
Let's take a look at a case where sapply and vapply act quite similarly, and then check an example where the power of vapply is clearer. We'll be using the cities example from before. We had a vector of city names, cities, over which we apply different functions. For example, calling the nchar function with sapply gives us a vector with the length of each character string. How can we write this using vapply? Well, when we check the documentation of the vapply function, we can see it can be used as follows.
X, FUN and USE.NAMES are arguments that you already know from the sapply() function, but the FUN.VALUE argument is new here. This argument should be a general template for the return value of FUN, the function that you want to apply over the input X.
In our example, we want to apply the nchar function over cities. For each city, nchar returns a single number, which is a numeric vector of length 1. We can template this output using the numeric() function, by setting FUN.VALUE to numeric(1), which tells the vapply function that nchar() should return a single numerical value.
The result matches what sapply returned before. However, this 'pre-specification' of FUN's return value makes vapply a safer alternative to sapply.
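The slide itself is not part of this transcript. As a stand-in, the argument list can be read directly from the function in base R:

```r
# Print the formal arguments of vapply as defined in base R
print(args(vapply))

# The argument names, in order
vapply_args <- names(formals(vapply))
print(vapply_args)
```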
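The comparison described above can be sketched as follows. The contents of `cities` are an assumption here, since the transcript does not list the vector.

```r
# Assumed contents of the cities vector (not given in the transcript)
cities <- c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")

# sapply: no template for the return value
lengths_sapply <- sapply(cities, nchar)

# vapply: each call to nchar must return one numeric value
lengths_vapply <- vapply(cities, nchar, FUN.VALUE = numeric(1))

print(lengths_sapply)
print(lengths_vapply)
```

One detail worth knowing: nchar returns integers, and the numeric(1) template accepts them and stores the result as doubles. The two results print the same and compare as equal with all.equal(), even though their storage types differ.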
4. vapply()
To understand this, let's re-use another example from our discussion of sapply, where we extracted the first and last letters of the cities' names.
Here, sapply works like a charm again.
5. vapply()
To write this using vapply, we'll need to set FUN.VALUE again. This time, the FUN we want to apply, first_and_last, returns a character vector of length two, which can be expressed as character(2). Works great!
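A rough sketch of both calls follows. The transcript does not show the body of first_and_last, so the version below is a hypothetical stand-in that returns the first and last character of a name. The contents of `cities` are likewise assumed.

```r
# Assumed contents of the cities vector (not given in the transcript)
cities <- c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")

# Hypothetical stand-in for the course's first_and_last function
first_and_last <- function(name) {
  chars <- strsplit(name, split = "")[[1]]
  c(first = chars[1], last = chars[length(chars)])
}

# sapply simplifies the six length-2 results to a 2 x 6 character matrix
res_sapply <- sapply(cities, first_and_last)

# vapply does the same, but checks each result against character(2)
res_vapply <- vapply(cities, first_and_last, FUN.VALUE = character(2))

print(res_vapply)
```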
6. vapply() errors
But let's see what happens if we tell vapply() that we expect first_and_last to return a character vector of length 1. This generates an error. The output of the first_and_last function does not match the template we specified, so R complains.
7. vapply() errors
A similar error pops up if we tell vapply() that the output of first_and_last will be a numerical vector of length 2.
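Both failing calls can be reproduced with the sketch below. As before, `cities` and the body of first_and_last are assumptions. tryCatch is used only so that the script keeps running and the error messages can be inspected.

```r
# Assumed contents of the cities vector (not given in the transcript)
cities <- c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")

# Hypothetical stand-in for the course's first_and_last function
first_and_last <- function(name) {
  chars <- strsplit(name, split = "")[[1]]
  c(first = chars[1], last = chars[length(chars)])
}

# Wrong length: the template says length 1, the function returns length 2
err_length <- tryCatch(
  vapply(cities, first_and_last, FUN.VALUE = character(1)),
  error = function(e) conditionMessage(e)
)

# Wrong type: the template says numeric, the function returns character
err_type <- tryCatch(
  vapply(cities, first_and_last, FUN.VALUE = numeric(2)),
  error = function(e) conditionMessage(e)
)

print(err_length)
print(err_type)
```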
8. unique_letters()
This little bit of extra work in defining the FUN.VALUE argument has the benefit that you really have to think about what your function will return, without blindly assuming that the sapply function will handle every case for you!
Let's have a look at a final example. Remember the function we wrote to get the unique letters in a string? Here it is again.
9. vapply() > sapply()
We can call this unique_letters() function and apply it over the cities vector using sapply. At this point, we could have incorrectly assumed that sapply would be successful at simplifying the result to a vector, but this is not the case because the unique_letters function returns vectors of different sizes. If we try to do something similar with vapply, we have to specify the FUN.VALUE argument. Let's assume that unique_letters() always returns a vector of 4 character strings:
As before, we get an error, because the unique_letters() function doesn't always return a vector of character strings of length 4. This stresses our main point: vapply() is safer than sapply() if you want to simplify the result that lapply() generates.
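The contrast can be sketched as follows. The body of unique_letters is not shown in this transcript, so the version below is an assumed reconstruction. The contents of `cities` are also assumed.

```r
# Assumed contents of the cities vector (not given in the transcript)
cities <- c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")

# Assumed reconstruction of the course's unique_letters function
unique_letters <- function(name) {
  name <- gsub(" ", "", name)               # drop spaces
  chars <- strsplit(name, split = "")[[1]]  # split into single characters
  unique(chars)
}

# sapply cannot simplify results of different lengths, so it returns a list
res_sapply <- sapply(cities, unique_letters)
print(class(res_sapply))

# vapply refuses results that do not match the character(4) template
err <- tryCatch(
  vapply(cities, unique_letters, FUN.VALUE = character(4)),
  error = function(e) conditionMessage(e)
)
print(err)
```

With sapply, the fallback to a list is silent and would only surface later in the code. With vapply, the mismatch stops the script at the call itself.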
10. Let's practice!
Wow, that's a lot of applying that you can do in R now! In fact, there's even more. The apply, tapply, mapply and rapply functions exist as well. You'll learn more about these in the advanced R course. Enough theory for today. Head over to the final set of exercises for this chapter to ramp up your skills!