1. Why you should use functions
Hi, I'm Richie from DataCamp. You've already used many of R's functions.
2. The arguments to mean()
If you've taken any other DataCamp courses, I'm sure you had to call mean in some of the exercises.
To recap, it has three arguments. x is a vector of numbers or times. trim is the proportion of values to remove from each end of the sorted data before calculating the mean. na-dot-rm is a logical value determining whether missing values should be removed.
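If you want to check those arguments yourself, here is a small sketch using base R only. It inspects mean.default, the method that mean dispatches to for ordinary numeric vectors. The variable names are just for illustration.

```r
# mean() is a generic function; the method for plain vectors is mean.default().
arg_names <- names(formals(mean.default))
print(arg_names)

# The defaults: no trimming, and missing values are not removed.
default_trim <- formals(mean.default)$trim
default_na_rm <- formals(mean.default)$na.rm
print(default_trim)
print(default_na_rm)
```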
3. Calling mean()
When you call mean (or any other function), there are several ways you can pass arguments to it.
Firstly, you can pass arguments by position. Here, R knows that numbers is the x argument, zero-point-one is the trim argument, and TRUE is the na-dot-rm argument. One problem is that it can be hard to remember what each argument is.
Secondly, you can pass arguments by name. In this case the order doesn't matter, though it's harder to read when the arguments are ordered unconventionally.
Most code style guides suggest a combination approach: pass the arguments in the same order as the documentation specifies them, and provide names for rarer arguments but not common ones.
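The three calling styles can be sketched like this. The numbers vector is made up for illustration; the calls themselves use only base R.

```r
# Example data (made up): ten values plus one missing value.
numbers <- c(1:9, 100, NA)

# 1. By position: x first, then trim, then na.rm.
by_position <- mean(numbers, 0.1, TRUE)

# 2. By name: the order no longer matters, but this ordering is hard to read.
by_name <- mean(na.rm = TRUE, trim = 0.1, x = numbers)

# 3. Combination: the common argument by position, the rarer ones by name.
combined <- mean(numbers, trim = 0.1, na.rm = TRUE)

# All three calls return the same value.
all_same <- identical(by_position, by_name) && identical(by_name, combined)
print(all_same)
```

The third form is usually the easiest to read: anyone familiar with mean knows the first argument is the data, and the names tell you what zero-point-one and TRUE are for.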
4. Analyzing test scores
There are thousands of functions available in R packages, so you might be wondering, "Why should I write my own?" Let me show you with a case study. Imagine you are analyzing the scores of a geography test. First you import the data.
Then you look at the data, and like almost every dataset, it has a load of columns that you don't care about, so you just select the columns you need: the student's name, when they took the test, and the score they got. Great work! You're ready to science some data!
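As a rough sketch of the import and select steps, assuming the dplyr package is installed: the file contents, the column names, and the object names here are all invented for illustration, and the data is written to a temporary file first so the example runs on its own.

```r
library(dplyr)

# Hypothetical raw data, written to a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "person_id,first_name,last_name,test_date,score,room,invigilator",
  "1,Ana,Lopez,03/15/2019,82,A1,Smith",
  "2,Ben,Khan,03/15/2019,NA,A1,Smith",
  "3,Cy,Okoro,03/16/2019,91,B2,Jones"
), csv_path)

# Import the data.
test_scores_geography_raw <- read.csv(csv_path, stringsAsFactors = FALSE)

# Keep only the columns needed: name, test date, and score.
test_scores_geography <- test_scores_geography_raw %>%
  select(first_name, last_name, test_date, score)

print(names(test_scores_geography))
```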
5. Copy and paste
Suppose you aren't just analyzing geography test scores; you are also analyzing the scores for English, art, and Spanish.
No problem, all you have to do is to copy and paste the code so you have four sets.
6. Change the subject names
Then you change the name of each subject in three places.
7. Fix the dates
But then you notice that the test dates didn't auto-convert to a date format, so you have to do it manually. Fine, you paste a mutate line into each set of code.
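Here is a minimal sketch of what such a mutate line might look like, assuming dplyr is installed and that the dates arrive as month/day/year text. The data frame and the date format are assumptions for illustration, and base R's as.Date stands in for whichever date parser you prefer.

```r
library(dplyr)

# Made-up scores with the test date stored as text.
scores <- data.frame(
  first_name = c("Ana", "Ben"),
  test_date = c("03/15/2019", "03/16/2019"),
  score = c(82, NA),
  stringsAsFactors = FALSE
)

# Convert the text column to a proper Date column.
scores_dated <- scores %>%
  mutate(test_date = as.Date(test_date, format = "%m/%d/%Y"))

print(class(scores_dated$test_date))
```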
8. Filter out missing scores
Next, you notice that some of the scores are missing, so you need to filter out those observations from each dataset. Again you write the same line of code four times.
If doing the same work four times wasn't bad enough, there's an even bigger problem. One of these code chunks is wrong. Can you spot the error?
(pause)
Did you find it? The filter command for the art dataset is missing an exclamation mark, so it will keep the rows where the score is missing and drop the rows where the score is a number. This is a really subtle typo that will ruin your analysis.
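To see how much that one character matters, here is a small sketch, assuming dplyr is installed. The art data frame is made up for illustration.

```r
library(dplyr)

# Made-up art scores, one of them missing.
art <- data.frame(
  first_name = c("Ana", "Ben", "Cy"),
  score = c(82, NA, 91),
  stringsAsFactors = FALSE
)

# Intended: keep the rows where the score is NOT missing.
correct <- art %>% filter(!is.na(score))

# Typo: without the exclamation mark, only the missing scores are kept.
typo <- art %>% filter(is.na(score))

print(nrow(correct))
print(nrow(typo))
```

Both versions run without any error or warning, which is exactly why this kind of mistake is so easy to miss.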
9. Benefits of writing functions
If you found those last few slides difficult to follow, then don't worry, that's the point. Writing code once is hard, but writing it multiple times is nightmarish.
The main benefit of using functions is that you can avoid all this repetition, which in turn reduces the amount of typing you have to do, and avoids the sort of copy and paste errors that you just encountered.
They also make it easier to reuse your code both within and between analyses, and make it easy to share functionality with others.
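As a sketch of where this is heading, the repeated steps from the case study could be wrapped in a single function, assuming dplyr is installed. The function name clean_test_scores, the column names, the date format, and the data are all hypothetical, chosen only to illustrate the idea.

```r
library(dplyr)

# Hypothetical helper: select columns, fix dates, and drop missing scores in one place.
clean_test_scores <- function(raw_scores) {
  raw_scores %>%
    select(first_name, last_name, test_date, score) %>%
    mutate(test_date = as.Date(test_date, format = "%m/%d/%Y")) %>%
    filter(!is.na(score))
}

# Made-up raw data for two subjects.
geography_raw <- data.frame(
  first_name = c("Ana", "Ben"),
  last_name = c("Lopez", "Khan"),
  test_date = c("03/15/2019", "03/15/2019"),
  score = c(82, NA),
  room = c("A1", "A1"),
  stringsAsFactors = FALSE
)
art_raw <- data.frame(
  first_name = c("Ana", "Ben"),
  last_name = c("Lopez", "Khan"),
  test_date = c("04/02/2019", "04/02/2019"),
  score = c(NA, 77),
  room = c("C3", "C3"),
  stringsAsFactors = FALSE
)

# One call per subject, with no copy and paste of the cleaning steps.
geography_clean <- clean_test_scores(geography_raw)
art_clean <- clean_test_scores(art_raw)
```

Because the filter line now exists in exactly one place, a missing exclamation mark would show up in every subject at once and only needs fixing once.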
10. Let's practice!
OK, let's get started!