1. More package structure
We'll next look at why packages are preferred over defining functions in scripts, how to effectively structure code and functions in the R directory, and how to name and license our package.
2. Advantages of packages over scripts
The main advantage of creating packages over defining functions in scripts is that packages provide better organization. They also facilitate reproducibility, enable easier collaboration, and simplify sharing and distribution of our code.
3. Structuring code in the R directory
Recall that the R directory of our package contains our functions, saved in R scripts with the dot-R file extension. Each function is typically placed in its own file for clarity and ease of maintenance.
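As a minimal sketch of this layout, here is what a single file in the R directory might contain. The file name and the function are hypothetical, chosen only for illustration; the file is named after the one function it defines.

```r
# File: R/celsius_to_fahrenheit.R (hypothetical file in our package)
# One function per file, with the file named after the function.

# Convert a numeric vector of temperatures from Celsius to Fahrenheit.
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}
```

With this convention, a second function would go in its own file in the same directory, which makes each function easy to find by name.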
4. Choose a valid package name
There are a few key considerations when naming an R package. First, a package name should contain only ASCII letters, numbers, and periods. Second, it needs to have at least two characters. Third, it should start with a letter and should not end with a period. Note that an underscore is not a valid character in package names. These rules apply to packages on CRAN, the Comprehensive R Archive Network, which is the central repository for R packages.
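The naming rules above can be sketched as a single regular expression. The helper name is_valid_pkg_name is our own, not part of any package, and it only checks the syntax of the name, not whether the name is already taken.

```r
# Hypothetical helper: checks the naming rules described above.
# - first character: an ASCII letter
# - middle characters: ASCII letters, numbers, or periods
# - last character: an ASCII letter or number (so no trailing period)
# Requiring a distinct first and last character enforces the two-character minimum.
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

is_valid_pkg_name("data.table")  # letters and a period in the middle
is_valid_pkg_name("my_pkg")      # contains an underscore
is_valid_pkg_name("pkg.")        # ends with a period
```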
5. The available package
The available package in R checks whether the name we choose is already taken on CRAN. We use the available function, passing the intended package name as the argument. It is a handy tool that checks whether our intended package name is valid and available on CRAN, and it also assesses the name's sentiment, helping us choose a unique, appropriate, and appealing name.
6. available() output
Suppose we want to check the name "horrendous" for a package that helps with challenging data. Let's look at the sample output when running available on horrendous.
A checkmark tells us that horrendous is valid and available on CRAN. An x would tell us the opposite. Three negative symbols next to sentiment mean the name has a negative sentiment. The other possible output would be three positive symbols. Because of the negative sentiment, horrendous might not be an appealing name for this package.
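The check discussed above could be run roughly as follows. This is a sketch that assumes the available package is installed and that we have an internet connection, since the function queries online sources. The browse = FALSE argument is our choice here, to avoid opening web pages for the name.

```r
# Candidate name from the example above
pkg_name <- "horrendous"

# Only run the check in an interactive session with the package installed,
# because available() looks the name up online and may ask us questions.
if (requireNamespace("available", quietly = TRUE) && interactive()) {
  available::available(pkg_name, browse = FALSE)
}
```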
7. Choose an informative package name
It's important to choose an informative name for our package. The name should give potential users a clear idea of what the package does. Let's look at some examples.
readr is used for reading rectangular text data into R. The name readr is a play on the word "reader", indicating that the package's purpose is related to reading data. The name tidyr indicates the primary goal of the package: to tidy data. It's a play on the word "tidier".
The MASS package, which stands for "Modern Applied Statistics with S," is not so informative. Unless one is familiar with the reference to the book of the same title, it's hard to know what the package does based on the name alone.
utils is a base R package that provides a variety of utility functions. The name utils (short for utilities) is quite generic and doesn't give much insight into what specific functionalities it provides.
8. Licensing our package
When we're ready to share our package, we need to think about how others can use our code. By choosing a license, we specify what others can do with our package. Two commonly used licenses are the MIT License and the Creative Commons Zero (CC0) License.
The usethis package applies these licenses to our package with the use_mit_license and use_cc0_license functions.
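As a brief sketch, applying a license might look like the following. It assumes the usethis package is installed and that the working directory is the root of our package. The copyright holder "Jane Doe" is a placeholder, not a real value from this course.

```r
library(usethis)

# Assumption: we are in the root directory of our package.
# Apply the MIT License, naming a placeholder copyright holder.
use_mit_license("Jane Doe")

# Alternatively, apply the CC0 License instead:
# use_cc0_license()
```

We would pick one of the two calls, since a package normally carries a single license.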
The MIT License permits free use, modification, and distribution, but requires that the copyright notice and license text be included in all copies or substantial portions of the software. The CC0 License, by contrast, dedicates the work to the public domain: it permits free use, modification, and distribution without any requirement for copyright notices, attribution, or license text.
MIT is widely used for software such as R packages, whereas CC0 is used in many different contexts. For an R package, we wouldn't normally choose CC0 unless we want anyone to be able to use our code without crediting us.
9. Let's practice!
It's time to put knowledge into practice.