DE analysis results

After exploring the PCA and correlation heatmap, we found good clustering of our samples on PC1, which seemed to represent the variation in the data due to fibrosis, and PC2, which appeared to represent variation in the data due to smoc2 overexpression. We did not find additional sources of variation in the data, nor any outliers to remove. Therefore, we can proceed by running DESeq2, DE testing, and shrinking the fold changes. We performed these steps for you to generate the final results, res_all.
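The steps named above (running DESeq2, DE testing, and shrinking the fold changes) can be sketched roughly as follows. This is a minimal illustration, not the code used to produce res_all: it assumes the DESeq2 package is installed, uses simulated counts from makeExampleDESeqDataSet() in place of the smoc2 data, and the names dds_toy, res_toy, and res_toy_shrunk are hypothetical.

```r
library(DESeq2)

# Simulated dataset standing in for the real counts (assumption):
# 1000 genes, 6 samples, one factor "condition" with levels A and B
set.seed(1)
dds_toy <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Estimate size factors and dispersions, then fit the model
dds_toy <- DESeq(dds_toy)

# Test B vs. A at a significance threshold of 0.05
res_toy <- results(dds_toy,
                   contrast = c("condition", "B", "A"),
                   alpha = 0.05)

# Shrink the log2 fold changes for the same contrast
res_toy_shrunk <- lfcShrink(dds_toy,
                            contrast = c("condition", "B", "A"),
                            res = res_toy,
                            type = "normal")

# The shrunken results keep one row per gene, including a padj column
head(res_toy_shrunk)
```

The shrunken results object plays the same role here that res_all plays in this exercise.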

In this exercise, we'll subset the significant genes from the results and output the top DE genes, ordered by adjusted p-value.

This exercise is part of the course

RNA-Seq with Bioconductor in R

Exercise instructions

  • Use the subset() function to extract the genes with an adjusted p-value less than 0.05. Convert the subset to a data frame with the data.frame() function, move the row names into a column named geneID with the rownames_to_column() function, and save the result as smoc2_sig.

  • Order the significant results by adjusted p-values using the arrange() function, select the columns with Ensembl gene ID and adjusted p-values, and output the top significant genes using head().
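To see how these functions chain together without touching the exercise data, here is a small sketch on a made-up table. It assumes the dplyr and tibble packages are installed; toy_res, toy_sig, toy_top, and the gene names are hypothetical stand-ins, not values from res_all.

```r
library(dplyr)   # provides %>%, arrange(), select()
library(tibble)  # provides rownames_to_column()

# Made-up results table with gene IDs stored as row names (assumption)
toy_res <- data.frame(
  log2FoldChange = c(0.2, 1.7, -0.3, 2.1),
  padj = c(NA, 0.03, 0.40, 0.001),
  row.names = c("geneD", "geneC", "geneB", "geneA")
)

# Keep rows with padj < 0.05, then move the row names into a geneID column.
# On a base data frame, subset() also drops rows where padj is NA.
toy_sig <- subset(toy_res, padj < 0.05) %>%
  data.frame() %>%
  rownames_to_column(var = "geneID")

# Sort by adjusted p-value, keep two columns, and show the first rows
toy_top <- toy_sig %>%
  arrange(padj) %>%
  select(geneID, padj) %>%
  head()

toy_top
```

Note that head() returns at most six rows by default, so a longer table would need head(n = ...) to show more.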

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Select significant genes with padj < 0.05
smoc2_sig <- subset(___, ___) %>%
  ___() %>%
  ___(var = ___)

# Extract the top 6 genes with padj values
smoc2_sig %>%
  ___(___) %>%
  select(___, ___) %>%
  head()