Exercise

Explore a toy scRNA-Seq dataset

One of the important tools in RNA-seq analysis is the matrix of counts, i.e. the number of sequenced reads aligned to each gene in each sample. In this exercise, you will explore a "toy" single-cell RNA-Seq dataset called counts, which has 10 genes and 5 cells. The first few lines of the dataset look like this:

         SRR2140028 SRR2140022 SRR2140055 SRR2140083 SRR2139991
Lamp5            10         11          0          0          8
Fam19a1          11          9          0          6          0
Cnr1              0          0          0         12          0
...

Here Lamp5, Fam19a1, etc. are genes (rows), and SRR2140028, SRR2140022, etc. are cells (columns).

A note on summarizing data.

In this course, you will often need to count the elements of a matrix or vector that satisfy a certain condition, for example: "how many entries of the counts matrix are greater than 2?" You can answer such questions with the sum() function. For example, sum(counts > 2) returns the number of elements in the counts matrix with a value greater than 2. This works because counts > 2 returns a logical matrix that has TRUE wherever the corresponding element of counts is greater than 2, and FALSE elsewhere (try it out in the console!). When a logical vector or matrix is summed, each TRUE counts as 1 and each FALSE counts as 0. So summing all the elements of a logical vector or matrix is equivalent to counting all the 1’s (i.e. counting all the TRUE values).

Similarly, mean(counts > 2) will return the proportion of such elements.
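
As a minimal sketch of this technique, the snippet below rebuilds only the three rows of the dataset shown above (not the full 10-gene counts matrix) and applies sum() and mean() to the logical matrix. The names toy, above_two, n_above and prop_above are illustrative assumptions, not objects from the exercise.

```r
# Assumption: a 3 x 5 matrix holding only the rows printed above,
# not the full `counts` dataset used in the exercise.
toy <- matrix(
  c(10, 11, 0,  0, 8,
    11,  9, 0,  6, 0,
     0,  0, 0, 12, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Lamp5", "Fam19a1", "Cnr1"),
    c("SRR2140028", "SRR2140022", "SRR2140055", "SRR2140083", "SRR2139991")
  )
)

# Element-wise comparison gives a logical matrix of the same shape
above_two <- toy > 2
print(above_two)

# TRUE is treated as 1 and FALSE as 0, so sum() counts the TRUE entries
n_above <- sum(above_two)
print(n_above)  # 7

# mean() divides that count by the total number of entries (15 here)
prop_above <- mean(above_two)
print(prop_above)
```

On these three rows, 7 of the 15 entries are greater than 2, so mean() returns 7/15. The same two calls work unchanged on the full counts matrix.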

Keep this technique in mind: it will come in handy many times!

Instructions 1/4
  • Press "Submit Answer" to print the counts dataset in the console, and take a few seconds to familiarize yourself with it.