1. Introduction to Functions
In this chapter, we'll take a closer look at a very powerful concept in R: functions.
2. Functions
Not surprisingly, you've already used functions before. Remember when you created a list? You used the list() function. Or when you wanted to display the contents of a variable? You used the print() function. But what exactly are functions, and how do they work?
3. Black box principle
You can think of a function as a kind of black box: you give it an input, the black box processes this input, and it returns some output.
Let's look at this black-box principle with a specific example. R's sd() function calculates the standard deviation of a vector, so in this case our black box is the sd() function. If you give sd() a vector containing 1, 5, 6 and 7 as input, the output will be about 2.63, the standard deviation of these four values.
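In R code, the black box from this example looks roughly like the minimal sketch below. The names input, output and manual are illustrative labels, not part of R. As a cross-check, the sketch also computes the sample standard deviation by hand, using the n - 1 denominator that sd() uses, to show what happens inside the box.

```r
# The input: a numeric vector with four values
input <- c(1, 5, 6, 7)

# The black box: sd() turns the whole vector into a single number
output <- sd(input)
print(output)  # [1] 2.629956, i.e. about 2.63

# Peeking inside the box: square root of the summed squared
# deviations from the mean, divided by (n - 1)
manual <- sqrt(sum((input - mean(input))^2) / (length(input) - 1))
all.equal(output, manual)  # TRUE
```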
10. Calling a function in R
How can you use the sd() function in R? You already know how! Simply type sd followed by parentheses. Inside the parentheses, you specify the so-called function arguments: the inputs to your function. In our case, there is a single argument, the vector containing four values.
We could just as well assign the input vector to a variable, say values, and then call sd() on values.
In both cases, the value 2.63 gets printed to the console, because we did not assign the result of the function to a variable. If you want to reuse the result, simply use the assignment operator, as you've done many times before. Let's assign the output of our function to a variable called my_sd.
If we now print my_sd to the console, we see that it contains 2.63.
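The three ways of calling sd() just described can be sketched as follows; the variable names values and my_sd come from the text above.

```r
# Call sd() directly on a vector literal; the result is printed
sd(c(1, 5, 6, 7))

# Or store the input in a variable first and pass that instead
values <- c(1, 5, 6, 7)
sd(values)

# Assign the output so it can be reused later
my_sd <- sd(values)
my_sd  # about 2.63
```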
11. Function documentation
Here, I assumed that everybody knows how to use the sd() function. For sd(), you can guess that you have to input a vector, but for many functions out there the usage is less straightforward. To learn what a function does and how it should be used, you can look up its documentation with the help() function. For the sd() function, for example, you type help(sd) or ?sd; the two are equivalent. If you are working in DataCamp, these commands will take you to RDocumentation. If you're working locally, a documentation page will pop up. Both contain the same information.
The documentation presents a lot of information. If we look at the "Usage" section, we see that sd() actually takes two arguments, x and na.rm. Strangely, na.rm is followed by an equals sign and FALSE, while x is not.
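A minimal sketch of looking up sd() from code. The help call is wrapped in interactive() so the snippet also runs as a script, and print() is used because a help page inside a block is not displayed automatically. As an extra, it uses base R's formals() to read the same argument list that the Usage section shows; sd_formals is just an illustrative name.

```r
# Open the documentation for sd(); ?sd is an equivalent shortcut.
if (interactive()) {
  print(help(sd))
}

# The "Usage" section reads: sd(x, na.rm = FALSE)
# formals() returns the same argument list as an R object
sd_formals <- formals(sd)
names(sd_formals)  # "x" "na.rm"
sd_formals$na.rm   # FALSE: the default shown after the equals sign
```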
12. Questions
Well, this is a bummer. Asking for help on the sd() function only gave us more questions.
First, the first argument is called x, but we never mentioned x when calling sd() on the values variable. How did R know what we meant?
Second, what's up with this = FALSE after the na.rm argument?
And finally, how come sd(values) worked fine, although sd() seems to need two arguments?
Don't despair: all of these questions will be answered in a moment!
13. Argument matching
When you call an R function, R has to match your inputs to the function's arguments. In other words, R has to know that by values you mean the x argument of the sd() function. In our call, R matched values to the x argument by position: values is the first input inside the parentheses, so R takes it to be the first argument of sd(), which is x.
However, it doesn't have to be this way. It is perfectly equivalent to match the arguments by name, explicitly saying that we want the x argument to be values. We do this with an equals sign, as in sd(x = values). The result is exactly the same.
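Both matching styles can be compared directly in a short sketch; by_position and by_name are illustrative names.

```r
values <- c(1, 5, 6, 7)

# Positional matching: values is the first input, so it goes to x
by_position <- sd(values)

# Named matching: we say explicitly that x should be values
by_name <- sd(x = values)

identical(by_position, by_name)  # TRUE
```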
14. na.rm argument
Now, what's up with this na.rm argument? The documentation shows that na.rm is a logical value indicating whether or not missing values should be removed. Let's experiment with this by first adding an NA to the values vector and then calling the sd() function on values once more.
The result is simply NA, because sd() did not remove the missing value before calculating the standard deviation. By default, na.rm is FALSE, so sd() does not remove missing values. That's exactly what the Usage section of sd's documentation tells us: na.rm = FALSE means that, unless you specify otherwise, NAs will not be removed.
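A short sketch of this experiment, showing that leaving out na.rm behaves exactly like passing its default explicitly:

```r
# Add a missing value to the vector
values <- c(1, 5, 6, 7, NA)

# na.rm defaults to FALSE, so the NA propagates to the result
sd(values)  # NA

# Spelling out the default gives the same result
sd(values, na.rm = FALSE)  # NA
```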
When the values vector contains a missing value, an NA, we'll want to set na.rm to TRUE, so that sd() removes missing values before calculating the actual standard deviation. We can do this by letting R match the arguments by position, as in sd(values, TRUE). R knows that we want to set the x argument to values and the na.rm argument to TRUE because of the order in which we pass the inputs.
Matching by name is also possible: with sd(values, na.rm = TRUE), we explicitly say that the na.rm argument must be TRUE.
Notice from this last expression that R knows how to handle a mix of matching by position and by name: the first argument was matched by position, while the second one was matched by name.
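The sketch below puts the matching styles side by side; the names sd_pos, sd_name and sd_mix are illustrative. The fully named call also shows that, with named arguments, the order inside the parentheses no longer matters.

```r
values <- c(1, 5, 6, 7, NA)

# 1. Both arguments matched by position: x = values, na.rm = TRUE
sd_pos <- sd(values, TRUE)

# 2. Both arguments matched by name, in a different order
sd_name <- sd(na.rm = TRUE, x = values)

# 3. A mix: x by position, na.rm by name
sd_mix <- sd(values, na.rm = TRUE)

# All three drop the NA and give the sd of the four observed values
c(sd_pos, sd_name, sd_mix)  # 2.629956 2.629956 2.629956
```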
15. sd(values) works?
This also answers our third question: sd(values) does not throw an error although we didn't specify the na.rm argument. R sees that we haven't specified it, so it uses the default value. However, if we leave the x argument unspecified, for example by simply calling sd() without arguments,
we get an error: argument "x" is missing, with no default. Remember from the Usage section of the documentation that x had no default value, while na.rm did. This tells us that function arguments without a default have to be specified by the user of the function; otherwise, an error is likely to occur.
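Both cases can be checked in a small sketch. Here tryCatch() and conditionMessage() are used only so that the error can be captured and inspected instead of stopping the script; msg is an illustrative name.

```r
values <- c(1, 5, 6, 7)

# na.rm was not given, so R falls back to its default, FALSE
identical(sd(values), sd(values, na.rm = FALSE))  # TRUE

# x has no default, so leaving it out raises an error;
# tryCatch() turns that error into its message string
msg <- tryCatch(sd(), error = function(e) conditionMessage(e))
msg  # argument "x" is missing, with no default
```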
16. Useful trick
Before wrapping up this introduction to functions, I want to point you to a very useful function: args(). It lets you learn about the arguments of a function without having to read through the entire documentation. For the sd() function, we can use args(sd).
The output tells us that the first argument, x, has no default value, while na.rm, the second argument, is FALSE by default.
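A minimal sketch of args() in action; sd_signature is an illustrative name. args() returns a function with the same arguments as sd() but an empty (NULL) body, which is why printing it shows only the signature.

```r
# Printing the result shows just the signature:
#   function (x, na.rm = FALSE)
#   NULL
sd_signature <- args(sd)
sd_signature
```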
17. Wrap-up
Functions may be a daunting concept at first, but understanding them well is important for understanding R in general: R functions are used literally all the time. Let's recap three key ideas.
First, functions work like a black box: you give some values as input, and the function processes this input and generates an output. Next, R matches function arguments by position or by name. Finally, some function arguments have a default value, which you can override. If you do not specify the value of an argument that has no default, an error will typically occur.
18. Let's practice!