1. Flexible linear models
In this chapter, you'll learn how to create linear models for more complicated study designs.
2. Models for complicated study designs
Recall the linear model I made in the previous chapter for the breast cancer data. There were two groups: ER-negative and ER-positive. The standard linear model parametrization worked well in this case because the hypothesis of interest could be tested by directly testing if beta-one was equal to zero.
However, this parametrization becomes clumsy when you have more complicated study designs, for example, if you had three independent groups to compare. It is still straightforward to test for differences between group 2 versus group 1 and group 3 versus group 1, but it's less elegant to test for differences between groups 2 and 3.
Using the terminology in the limma User's Guide, this is known as the treatment-contrasts parametrization, because each coefficient represents a difference from a base condition.
3. Group-means parametrization
An alternative is the group-means parametrization, in which each coefficient models the mean expression level in a given group. In this model there is no intercept coefficient. And since the coefficients are no longer differences, you have to construct contrasts to test for differential expression. In the breast cancer example with two groups, you test if the difference in the two coefficients is equal to zero.
In the hypothetical example with 3 groups, it is now straightforward to test all pairwise comparisons by constructing 3 contrasts.
4. Design matrix for group-means
To create a model without an intercept term, you include a zero as the first term in the formula argument to `model.matrix`. For the breast cancer data, this creates a design matrix with two coefficients, one for the er-negative tumors and one for the er-positive. As expected, the columns for the coefficients sum to the number of samples with that status.
5. Contrasts matrix
To create contrasts to test specific hypotheses, limma provides the function `makeContrasts`. You can make any number of contrasts by referring specifically to the column names of the design matrix. This is because the design matrix object is passed to the `levels` argument of `makeContrasts`. For the breast cancer data, I have created a contrast called status, which tests for a difference in the means of the erpositive and ernegative samples.
Printing the matrix, the coefficients are the rows and the contrasts are the columns. The equation erpositive minus ernegative is saved as multiplying the erpositive coefficient by 1 and the ernegative coefficeint by -1.
6. Testing the group-means parametrization
Testing the contrasts requires one extra step in the limma pipeline. After fitting the coefficients with `lmFit`, you need to fit the specific contrasts of interest using the function `contrasts.fit`. Note that this changes the coefficients to correspond to the contrasts.
7. The parametrization does not change the results
Finishing the pipeline by calculating the t-statistics with `eBayes` and counting the number of differentially expressed genes, you can see that the results are identical to those obtained in the previous chapter using the treatment-contrasts parametrization. 5,004 genes are upregulated in ER-positive tumors compared to ER-negative tumors, 6,276 are downregulated, and 11,003 do not differ. Mathematically, you are testing the exact same thing, thus you can decide which parametrization is more convenient for you to write depending on the study you are analyzing.
8. Let's practice!
Now it's your turn to build and test a group-means model for the leukemia data.