1. What is RMarkdown?
Welcome to the third chapter and the second part of this course. In the first part, you pushed ggplot2 to its limits to produce polished, custom plots. Now you are going to create a reproducible report and customize that as well.
2. What you have done so far
Let's have a quick look back at the first two chapters. The end result was this nice-looking custom plot. To get there, you first loaded the necessary packages, like dplyr and forcats, and then you loaded and preprocessed the data. For instance, you filtered it to only contain European countries.
You then defined a custom theme, which you applied to a rather unconventional plot, the dot plot. In the end, you polished the plot and exported it in the right dimensions so that it is easy to read on mobile screens. That's quite something!
In this chapter and the next one, you will get to know a whole new technique to produce nicely formatted reports. These reports can be generated from within RStudio using the RMarkdown language.
3. RMarkdown in a nutshell
So what is RMarkdown? In a nutshell, RMarkdown is a framework that helps you to convert your R code and formatted prose into a wide range of output formats: HTML pages, PDFs, Word documents and even interactive web applications, to name a few. Let's have a look at a concrete example.
4. Screencast
Here's an RMarkdown document in RStudio. The document starts with a header where the title and the author as well as other settings are specified. Then follows a combination of prose and R code chunks. The prose can be formatted in many ways, for instance, with bold text. You can also embed and style weblinks.
Upon clicking "Knit HTML" in RStudio, the document is converted into an HTML file that pops up on the right side of the window. This document contains the formatted prose and also the output of the R code chunks, such as a table.
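To make the parts of such a document concrete, here is a minimal sketch in R that writes a small RMarkdown file to a temporary location and prints it. The title, author, link and chunk contents are made up for illustration; this is not the document shown in the screencast.

```r
# Hypothetical example document; every value below is an illustrative assumption.
rmd_lines <- c(
  # Header: title, author and output settings
  "---",
  'title: "My first report"',
  'author: "Jane Doe"',
  "output: html_document",
  "---",
  "",
  # Prose with bold text and a weblink
  "This is prose with **bold text** and a",
  "[weblink](https://www.r-project.org).",
  "",
  # An R code chunk; its output (here a table) ends up in the report
  "```{r}",
  "head(cars)",
  "```"
)

# Write the document to a temporary .Rmd file and print it to the console
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)
cat(readLines(rmd_path), sep = "\n")
```

Opening a file like this in RStudio and clicking the knit button should give you a short HTML page with the formatted sentence and the first rows of the built-in cars data set.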
5. Behind the scenes
Here's what happens behind the scenes when you click the "Knit" button in RStudio. First, your RMarkdown document is knitted with the knitr package, meaning that all the code chunks are executed and the output is written to an ordinary Markdown document.
This document is then processed by the pandoc software, which can convert a Markdown document to many different output formats such as PDFs and HTML documents.
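You can also run these two steps by hand from the R console. The following is a rough sketch, assuming the knitr and rmarkdown packages are installed and a pandoc installation can be found (RStudio ships with one). The file names and the tiny demo document are hypothetical.

```r
library(knitr)
library(rmarkdown)

# Work in a temporary directory so nothing is left behind in your project
work_dir <- tempfile("knit-demo")
dir.create(work_dir)

# A tiny throwaway RMarkdown document (illustrative only)
rmd_path <- file.path(work_dir, "report.Rmd")
writeLines(c(
  "# Demo",
  "",
  "```{r}",
  "1 + 1",
  "```"
), rmd_path)

# Step 1: knitr executes the chunks and writes an ordinary Markdown file
md_path <- knit(rmd_path, output = file.path(work_dir, "report.md"), quiet = TRUE)

# Step 2: pandoc converts the Markdown file into the output format
html_path <- file.path(work_dir, "report.html")
if (pandoc_available()) {
  pandoc_convert(md_path, to = "html", output = html_path,
                 options = "--standalone")
}
```

In everyday use you would not call the two steps separately. A single call to `rmarkdown::render(rmd_path)` performs both of them, and that is also what the button in RStudio runs for you.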
6. Reproducibility
The greatest side effect of using RMarkdown is reproducibility. If somebody is given your RMarkdown report, he or she can not only glance at the results but also reproduce them by running your code, and thus question it. This is an important principle in science and also in data journalism. Additionally, you can recreate your report with the click of a button, for instance if new data become available.
But reproducibility doesn't come for free. First of all, the code in the RMarkdown file needs to be executable on other people's machines.
Also, code without data is useless, so the data should be provided as input to the RMarkdown report. At the very least, a link to the data should be included, which is the sensible option if the data set is too big to distribute with your report.
When these two requirements are met, the report is what I call minimally reproducible. Ideally, for a fully reproducible report, the software environment that was used should also be known, that is, the version numbers of the packages used, the operating system, the R version, and so on.
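One simple way to document the software environment is to print it at the end of the report. The sketch below uses only functions that ship with base R; which details you record, and the choice of the utils package as an example, are assumptions made for illustration.

```r
# Collect the details mentioned above: R version, operating system, packages
r_version   <- R.version.string                      # R release in use
os_name     <- Sys.info()[["sysname"]]               # operating system name
pkg_version <- as.character(packageVersion("utils")) # version of one package

cat("R version:       ", r_version, "\n")
cat("Operating system:", os_name, "\n")
cat("utils package:   ", pkg_version, "\n")

# sessionInfo() reports all of this, plus every attached package, in one go
print(sessionInfo())
```

Placed in the last code chunk of an RMarkdown document, a call like `sessionInfo()` means that every knitted report carries a record of the environment it was built in.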
7. Let's practice!
So let's start experimenting with RMarkdown!