1. RNA-Seq next steps
Hopefully, the steps in an RNA-Seq differential expression analysis with DESeq2 are a bit clearer now and you feel comfortable trying to tackle an analysis on your own!
2. RNA-Seq next steps
That being said, if you have questions about the package, search the DESeq2 vignette first. When that doesn't work, post to the Bioconductor support site using the link displayed, and tag your post with DESeq2. The developers are often quick to reply to any issues.
3. DESeq2 functionality
In this course, we have covered how to perform differential expression analysis using the Wald test for pairwise comparisons with simple experimental designs. However, there are experiments that may require a different approach.
For example, you may have a more complex experimental design, or you may want to test across multiple groups at once using a likelihood ratio test instead of the Wald test for pairwise comparisons. DESeq2 can handle these situations, but they are outside the scope of this course. The vignette covers in quite a bit of detail how to adapt DESeq2 to them, so please consult the documentation or the Bioconductor support site.
4. Overview of goals
The goal of this course was to identify differentially expressed genes associated with fibrosis in wildtype and smoc2 over-expression samples.
We took the count matrix of reads aligning to each gene and the metadata and performed a differential expression analysis to determine those genes with significant differences in expression between the fibrosis and normal samples, and we successfully output a list of significant genes! This is great; we are awesome! But what do we do now?
5. Significant genes interpretation
Sometimes the list of DE genes can be an end in and of itself, as we could look at the highly significant genes or the significant genes with large fold changes between conditions to handpick individual genes for experimental validation.
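As a rough illustration of handpicking candidates, the sketch below filters a results table by adjusted p-value and absolute log2 fold change. It is plain Python rather than DESeq2 code. The column names `padj` and `log2FoldChange` follow the DESeq2 results table, while the function name, the cutoffs, and the example genes and values are assumptions made up for illustration.

```python
def pick_candidates(results, padj_cutoff=0.05, min_abs_lfc=1.0):
    """Return rows passing both cutoffs, most significant first.

    `results` is a list of dicts with keys "gene", "padj", "log2FoldChange".
    A padj of None stands in for the NA values DESeq2 can report.
    The default cutoffs are illustrative assumptions, not recommendations.
    """
    kept = [
        row for row in results
        if row["padj"] is not None
        and row["padj"] < padj_cutoff
        and abs(row["log2FoldChange"]) >= min_abs_lfc
    ]
    return sorted(kept, key=lambda row: row["padj"])


# Hypothetical results rows, invented for this example.
example_results = [
    {"gene": "geneA", "padj": 0.0004, "log2FoldChange": 2.3},
    {"gene": "geneB", "padj": 0.03, "log2FoldChange": 0.4},    # significant, small change
    {"gene": "geneC", "padj": None, "log2FoldChange": 3.1},    # no adjusted p-value
    {"gene": "geneD", "padj": 0.00001, "log2FoldChange": -1.8},
    {"gene": "geneE", "padj": 0.6, "log2FoldChange": 2.9},     # large change, not significant
]

candidates = pick_candidates(example_results)
print([row["gene"] for row in candidates])
```

Taking the absolute value of the fold change keeps both up-regulated and down-regulated genes in the candidate list.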
Alternatively, we could look to see whether expected genes are identified as differentially expressed.
However, the analysis will not return every gene that is truly differentially expressed between conditions, so if your gene of interest does not show up, that does not mean it is not differentially expressed. It only means that if it is, we were not able to detect it. Perhaps we didn't have enough biological replicates, or perhaps our sample preparation added a lot of variation to the data that we didn't account for, or there was some other reason.
6. Significant genes interpretation
Also, please don't just trust that all of our significant genes ARE differentially expressed between conditions. All results need to be validated in the laboratory! Remember that in these analyses, we expect roughly 5% of the significant genes to be false positives. We hope this won't include your favorite gene, but it is better to validate now than to move forward with faulty data.
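To make the roughly 5% figure concrete: DESeq2 adjusts p-values with the Benjamini-Hochberg procedure by default, which controls the expected proportion of false positives among the genes called significant. The following is a minimal sketch of that adjustment in plain Python, not the DESeq2 implementation (it leaves out independent filtering, for example). The function names, the example p-values, and the 0.05 cutoff are assumptions for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * n / rank over this rank and all higher ranks.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def expected_false_positives(n_significant, alpha=0.05):
    """Approximate upper bound on the expected number of false positives
    among genes called significant at an adjusted p-value cutoff of alpha
    (alpha=0.05 is an assumption matching the 5% figure)."""
    return n_significant * alpha


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(raw))
print(expected_false_positives(200))
```

Under these assumptions, a list of 200 significant genes would be expected to contain about 10 false positives, which is why individual genes still need validation.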
7. Significant genes interpretation
While the gene list can be helpful without further investigation, oftentimes we want to determine the biological significance of all of these genes. Functional analysis methods are helpful for elucidating what biological processes, pathways or phenotypes might be associated with your results. For instance, you can look for the enrichment of genes associated with particular processes or pathways within your list of significant genes relative to all genes tested.
There are many types of functional analyses available, and many popular Bioconductor R packages for performing these types of analyses. While outside the scope of this course, we do encourage you to explore these further on your own.
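One common form of this enrichment analysis is an over-representation test based on the hypergeometric distribution. The sketch below assumes that approach. The function name and the example counts are hypothetical, and dedicated Bioconductor packages handle details omitted here, such as correcting for multiple testing across many gene sets.

```python
from math import comb


def enrichment_p_value(n_tested, n_in_set, n_significant, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    n_tested:      all genes tested (the background)
    n_in_set:      tested genes annotated to the process or pathway
    n_significant: genes in the significant list
    n_overlap:     significant genes that are also in the set

    Returns the probability of seeing n_overlap or more set members
    when n_significant genes are drawn at random from the background.
    """
    total = comb(n_tested, n_significant)
    upper = min(n_in_set, n_significant)
    tail = sum(
        comb(n_in_set, k) * comb(n_tested - n_in_set, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes tested, 200 in the pathway,
# 500 significant genes, 25 of which fall in the pathway.
# By chance we would expect about 500 * 200 / 10000 = 10 in the overlap.
p = enrichment_p_value(10_000, 200, 500, 25)
print(p)
```

Using all genes tested as the background, rather than every gene in the genome, matches the comparison described above: significant genes relative to all genes tested.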
8. Conclusion
I hope that you are now able to determine which genes are likely involved with your condition of interest!
9. Congratulations!
Best of luck with your analyses!