Exercise

GC content

The GC content (or guanine-cytosine content) is the percentage of bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C), out of the four possible bases. The other two bases are adenine (A) and either thymine (T) in DNA or uracil (U) in RNA. We'll see later in the course that GC content can be a source of bias in scRNA-Seq.
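To make the definition concrete, here is a minimal sketch that computes GC content as a percentage from a sequence string. The helper name gc_content() and the two example sequences are made up for illustration; they are not part of the course workspace.

```r
# Hypothetical helper (not part of the course): percentage of G and C
# bases in a DNA or RNA sequence given as a single string.
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]   # split into single bases
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

gc_content("ATGCGC")   # DNA example: 4 of the 6 bases are G or C
gc_content("AUGGCU")   # RNA example: 3 of the 6 bases are G or C
```

The same function works for DNA and RNA because only G and C are counted; T and U never enter the numerator.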

You'll use a boxplot to explore the GC content of the genes in the toy scRNA-Seq dataset. A boxplot is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum.
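As a small illustration of the five-number summary, the sketch below applies base R's fivenum() to a made-up vector of GC percentages. The values are invented for the example and do not come from the course dataset.

```r
# Made-up GC percentages for nine hypothetical genes
gc <- c(38, 41, 44, 45, 47, 50, 52, 55, 61)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(gc)

# The statistics boxplot() actually draws (whisker ends, box edges, median)
boxplot.stats(gc)$stats
```

Note that fivenum() reports Tukey's hinges, which can differ slightly from the quartiles returned by quantile(). Also, by default the whiskers drawn by boxplot() extend only to the most extreme points within 1.5 times the box height, and anything beyond that is plotted as an individual outlier, so the whisker ends match the minimum and maximum only when there are no outliers.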

The matrix gene_info, aptly named because it holds all the information about the genes in the dataset, is available in your workspace.

Instructions
  • Calculate the mean of the GC content across all genes. Assign it to the variable gc_mean.

  • Similarly, calculate the standard deviation of the GC content across all genes. Assign it to the variable gc_sd.

  • Use the function boxplot() to draw a boxplot of the GC content of all genes. The function boxplot() accepts a numeric vector as input.
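The three steps can be sketched on toy data as follows. This is an illustration under stated assumptions, not the exercise solution: the matrix toy_gene_info, its column names "gc" and "length", and all values are hypothetical, and the real gene_info may name its GC column differently (check with colnames(gene_info)).

```r
# Hypothetical stand-in for gene_info: a numeric matrix with one row per
# gene. The column names "gc" and "length" are assumptions.
toy_gene_info <- cbind(
  gc     = c(38, 41, 44, 45, 47, 50, 52, 55, 61),
  length = c(900, 1200, 1500, 800, 2100, 1700, 950, 1300, 1100)
)

# Extract the GC column as a plain numeric vector
gc <- toy_gene_info[, "gc"]

gc_mean <- mean(gc)   # mean GC content across all genes
gc_sd   <- sd(gc)     # sample standard deviation (n - 1 denominator)

# boxplot() takes the numeric vector directly
boxplot(gc, main = "GC content (toy data)", ylab = "GC content (%)")
```

Indexing a matrix column by name returns a vector, which is the input form that mean(), sd(), and boxplot() all accept.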