
Replication and blocking

1. Replication and blocking

Congratulations on completing the first lesson! We'll now look at the other two key components of an experiment: replication and blocking.

2. Replication

Replication is the idea that we need to repeat our experiment in order to assess what variability looks like. Although we applied randomization to ensure that treatment groups are as similar as possible, we can't simply take the result of a single experiment for granted. Suppose we're testing drug efficacy. We certainly can't analyze the data if we only gathered it from one patient; maybe the drug worked for them, but how do we know whether it will also work for other people? Our experiment must be designed so that it is easy to replicate, which lets us observe a range of outcomes. We'll look at the example of finding the number of replications in the mtcars dataset with the dplyr package. Recall that the pipe operator chains functions together, and that the function count() groups mtcars by cyl and counts how many cars there are for each number of cylinders, thereby giving us the number of replications.
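The counting step described above can be sketched as follows. This is a minimal version that assumes the dplyr package is installed; the variable name replications is only illustrative.

```r
library(dplyr)

# Count how many cars (replicates) fall into each cylinder group.
# count(cyl) is shorthand for group_by(cyl) followed by a tally of rows.
replications <- mtcars %>%
  count(cyl)

replications
#   cyl  n
# 1   4 11
# 2   6  7
# 3   8 14
```

The groups are unbalanced here: there are 11, 7, and 14 cars with 4, 6, and 8 cylinders respectively.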

3. Blocking

Blocking is a technique to help control variability. A classic example is testing the effects of a drug on male and female patients and blocking by sex to account for a known source of variability, in this case the known differences in how male and female bodies react to drugs. In Chapters 3 and 4, we'll cover some more advanced blocking techniques, including blocking in two dimensions.
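One common way to set up the drug example is to randomize the treatment separately within each block. The sketch below uses only base R and an invented data frame; patients, its columns, and the "drug"/"placebo" labels are hypothetical and are not taken from the course materials.

```r
set.seed(42)  # make the random assignment reproducible

# Hypothetical patients: 6 female and 6 male (invented for illustration)
patients <- data.frame(
  id  = 1:12,
  sex = rep(c("F", "M"), each = 6)
)

# Randomize treatment separately within each sex block, so that
# every block receives the same number of drug and placebo patients
patients$treatment <- NA_character_
for (s in unique(patients$sex)) {
  rows <- which(patients$sex == s)
  patients$treatment[rows] <- sample(
    rep(c("drug", "placebo"), length.out = length(rows))
  )
}

# Each sex-by-treatment cell should contain 3 patients
table(patients$sex, patients$treatment)
```

Because the assignment is balanced within each block, differences between the sexes cannot be mistaken for a treatment effect.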

4. Boxplots

Both replication and blocking help us identify whether the treatments really affect the response variables we're interested in. But before we start with formal statistical tests, we can use data visualization to see whether there might be treatment group effects. A boxplot can help us achieve this goal. Here's a boxplot illustrating the difference in miles per gallon by the number of cylinders, based on the mtcars dataset. We notice that the boxes sit at clearly different levels, suggesting that the number of cylinders might have an effect on miles per gallon.
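A boxplot like the one described can be drawn as in the sketch below. It uses base R's boxplot() as an assumption; the plot shown in the lesson may have been produced with a different plotting package.

```r
# Boxplot of miles per gallon for each cylinder group in mtcars.
# boxplot() draws the plot and invisibly returns its summary statistics.
bp <- boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Number of cylinders",
  ylab = "Miles per gallon"
)

# Row 3 of bp$stats holds the median of each group
bp$stats[3, ]
```

The medians fall as the number of cylinders rises, which matches the visual impression that the boxes sit at different levels.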

5. Functions for modeling

Here's a sneak peek of the modeling functions we'll be using in the following exercises. We'll certainly cover them in more detail in the next chapter, but having an idea of what they do can help us understand replication and blocking a bit better. The function lm() is used to fit linear models. It can be used to carry out regression and analysis of covariance. aov() calls lm() and fits an analysis of variance model, so it outputs regression coefficients, fitted values, residuals, and so on. anova(), on the other hand, is a generic function that computes analysis of variance tables for one or more fitted model objects.
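To make the relationship between these three functions concrete, here is a minimal sketch on mtcars. Treating cyl as a factor and the object names mtcars_cyl, model_lm, and model_aov are choices made for this illustration, not part of the lesson.

```r
# Treat the number of cylinders as a categorical treatment
mtcars_cyl <- transform(mtcars, cyl = factor(cyl))

# lm() fits a linear model of mpg on the cylinder groups
model_lm <- lm(mpg ~ cyl, data = mtcars_cyl)

# aov() fits the same model through lm(), but returns an object
# whose summary is an analysis of variance table
model_aov <- aov(mpg ~ cyl, data = mtcars_cyl)

# anova() computes the analysis of variance table for a fitted model
anova_table <- anova(model_lm)
anova_table

# The aov fit reports the same table through summary()
summary(model_aov)
```

Both fits share the same coefficients, fitted values, and residuals; they differ mainly in how their results are summarized.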

6. Let's practice!

Let's do some exercises to see examples of replication and simple blocking!