DESeq2 model - contrasts

1. DESeq2 model - contrasts

Now that we have explored the fit of our data to the model, we can extract the DE testing results.

2. DESEq2 workflow

We can also make more accurate estimates of the foldchanges, which represent the expression of one samplegroup relative to another.

3. DESeq2 workflow

When we ran the DESeq() function, the size factors were calculated to generate the normalized counts, the shrunken dispersions were estimated prior to model fitting, then, differential expression testing was performed. However, before we extract our results, let's explore the model a bit more to understand the results.

4. DESeq2 Negative Binomial Model

We know that RNA-Seq data can be represented well using the negative binomial model, as it accounts for the additional variation in the data added by the small number of biological replicates. The size factors, indicated by Sij, normalized counts, indicated by Qij, and shrunken dispersions, denoted by alpha-i, are used as input to the negative binomial model to fit the raw count data.

5. DESeq2 Negative Binomial Model

For each gene, the model uses the log2 normalized counts, denoted on the left side of the equation to determine the log2 foldchange estimates, indicated by the beta-ir, for the samples of the condition of interest, represented by xjr, for each gene. In addition, to determining the log2 foldchanges, the associated standard error is also output.

6. DESeq2 contrasts

By default, DESeq2 will perform the Wald test for pairwise comparisons to test for differences in expression between two sample groups for the condition of interest, in this case, condition. The sample groups for condition are fibrosis and normal. The results of the testing can be extracted using the results() function and specifying a significance level, or alpha value, using the alpha argument. You can choose the alpha based on how stringent you want to be with your analysis. Lower alpha values indicate less probability of identifying a gene as DE when it is actually not. We will use a standard alpha of 0-point-05. The top of the output shows "log2 foldchange condition normal versus fibrosis", indicating that fibrosis is the base level of comparison. This means that all log2 fold changes represent the normal group relative to the fibrosis group, which doesn't seem the most intuitive. Instead of using the default results we can perform any pairwise comparison between the sample groups by supplying our own contrast.

7. DESeq2 contrasts

Contrasts, which specify the sample groups to compare, can be given directly to the results() function using the contrast argument. Within the combine function, we need to specify the condition of interest, level to compare, and base level. To specify the normal sample group as the base level for condition, we could create the contrast defining normal as the base level and fibrosis as the level to compare. Note that the condition of interest and sample groups for the condition need to match the names in the metadata.

8. DESeq2 contrasts

Now the results give log2 fold changes of the fibrosis group relative to normal.

9. DESeq2 LFC shrinkage

To explore our results a bit, the MA plot can be helpful. The MA plot shows the mean of the normalized counts versus the log2 fold changes for all genes tested. We can use the DESeq2 function plotMA() to create the plot and the genes that are significantly DE are colored red. Note the large log2 foldchanges, particularly for genes with lower mean count values. These fold changes are unlikely to be as accurate for genes that have little information associated with them, such as genes with low numbers of counts or high dispersion values.

10. LFC shrinkage

To improve the estimated fold changes we can use log2 foldchange shrinkage. For genes with low amounts of information available, shrinkage uses information from all genes to generate more likely, lower, log2 fold change estimates, similar to what we did with dispersions. DESeq2 has the lfcShrink() function to generate the shrunken log2 foldchanges. We need to specify the DESeq2 object, the contrast, and our results object. We can then create the MA plot again.

11. LFC shrinkage

Now we see more restricted log2 foldchange values, especially for lowly expressed genes. These shrunken log2 foldchanges should be more accurate; however, shrinking the log2 foldchanges will not affect the number of differentially expressed genes returned, only the log2 fold change values. Now that we have accurate fold changes, we can extract the significant DE genes and perform further visualizations of results.

12. Let's practice!

Time to practice.