1. Introduction to RNA-Seq
Hi, my name is Mary Piper. I am a consultant and trainer for the bioinformatics core at the Harvard T.H. Chan School of Public Health. My expertise is in RNA-Seq analyses, and I am excited to lead you through the RNA-Seq workflow.
In this course, we will discuss all steps in the RNA-Seq workflow, but will focus our hands-on lessons on differential expression analysis. By the end of this course, you should be able to independently discover the genes that are differentially expressed between your experimental groups.
However, before we get into the analysis details, let's take a step back to ensure we have a solid understanding of what RNA-Seq is and the types of questions we can address using this technique.
2. The Genome
All living organisms carry the instructions for life in their genome. In eukaryotes, including humans, the genome is housed in the cell nucleus.
The genome is composed of double-stranded DNA divided into chromosomes; humans have 23 pairs, but different organisms have different numbers of chromosomes.
The building blocks of our DNA are called nucleotides, and there are four different nucleotide bases in DNA: guanine, adenine, cytosine, and thymine. We will refer to these nucleotides as G, A, C, and T.
3. Nucleotides
The double-stranded DNA forms a helix with a sugar-phosphate backbone, and within this helix, A nucleotides pair with T and G nucleotides pair with C.
The order of these nucleotides is called the DNA sequence.
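To make the pairing rule concrete, here is a minimal Python sketch that derives the opposite strand from a DNA sequence. The example sequence and the function names are made up for illustration; they are not part of the course material.

```python
# Base-pairing rules described above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base that pairs with each nucleotide, in the same order."""
    return "".join(PAIRS[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the opposite strand read in its own 5'-to-3' direction."""
    return complement(seq)[::-1]


sequence = "GATTACA"  # hypothetical example sequence
print(complement(sequence))          # CTAATGT
print(reverse_complement(sequence))  # TGTAATC
```

Because the two strands run in opposite directions, the reverse complement is usually the form you will see when the other strand is written out.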
4. Genes
Within this sequence are regions called genes. Genes provide instructions to make proteins, which perform some function within the cell. To make proteins, the DNA is transcribed into messenger RNA, or mRNA, which is translated by the ribosome into protein.
Some genes encode RNA that does not get translated into protein; these RNAs are called non-coding RNAs, or ncRNAs. Often these RNAs have a function in and of themselves and include rRNAs, tRNAs, and siRNAs, among others. All RNAs transcribed from genes are called transcripts.
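The path from DNA to mRNA to protein can be sketched in a few lines of Python. This is a simplified illustration, not how the cell or any analysis tool does it: it assumes we are given the coding strand of a made-up gene with no introns, uses the standard genetic code, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order at each codon position.
# "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first position and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCCTGA"  # hypothetical coding sequence: start codon, one codon, stop
mrna = transcribe(gene)
print(mrna)             # AUGGCCUGA
print(translate(mrna))  # MA
```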
5. RNA processing
To be translated into proteins, mRNA must undergo processing. In this figure, the top strand represents a gene in the DNA, composed of the untranslated regions (UTRs), highlighted in blue, and the open reading frame, highlighted in red. Genes are transcribed into pre-mRNA, which still contains the intronic sequences. Transcription is shown in the blue portion of the image. During post-transcriptional processing, shown in the grey section of the image, the introns are spliced out and a polyA tail and 5' cap are added to yield mature mRNA transcripts. The mature mRNA transcripts can then be translated into protein, shown in the red portion of the image.
While mRNA transcripts have a polyA tail, which is a sequence of As at the end of the transcript, many of the non-coding RNAs do not.
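As a rough illustration of the processing steps, the Python sketch below splices a toy pre-mRNA and appends a polyA tail. The sequence, the exon coordinates, and the tail length are all invented for this example, and the 5' cap is left out because it is a chemical modification rather than part of the sequence.

```python
def splice(pre_mrna, exons):
    """Keep only the exon intervals (0-based, end-exclusive) and join them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def add_poly_a(mrna, tail_length=5):
    """Append a run of As to the 3' end. The tail length here is arbitrary."""
    return mrna + "A" * tail_length


# Hypothetical pre-mRNA: exon 1, then an intron, then exon 2.
exon_1 = "AUGGCC"
intron = "GUAAGUCCAG"
exon_2 = "UUUUGA"
pre_mrna = exon_1 + intron + exon_2

# Exon coordinates within the pre-mRNA (assumed known for this toy example).
exons = [(0, 6), (16, 22)]

spliced = splice(pre_mrna, exons)
mature = add_poly_a(spliced)
print(spliced)  # AUGGCCUUUUGA
print(mature)   # spliced transcript followed by five As
```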
6. Gene expression in cells
Although all cells contain the same DNA sequence, muscle cells are different from nerve cells and other types of cells because of the different genes that are turned on in these cells and the different RNAs and proteins produced.
7. Gene expression in disease
Similarly, a disease-causing mutation can lead to differences in which genes are turned on, or expressed, and which genes are turned off. A mutation can affect the type and quantity of RNAs and proteins produced.
To explore the gene expression changes that occur in disease or between different conditions, it can be useful to measure the quantity of RNA expressed by all genes using RNA-Seq. Then, differential expression analysis of RNA-Seq data can be used to determine whether there are significant differences in gene expression between conditions.
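To give a feel for what comparing expression between conditions means, here is a minimal Python sketch that computes a log2 fold change per gene from made-up read counts. It is only a toy: the gene names, counts, and pseudocount are assumptions, the counts are treated as already comparable across samples, and no statistical test is performed, so it cannot tell us whether a difference is significant.

```python
import math

# Hypothetical read counts per gene, three samples per condition.
counts = {
    "geneA": {"control": [100, 120, 110], "treated": [400, 460, 460]},
    "geneB": {"control": [50, 55, 45], "treated": [52, 48, 50]},
    "geneC": {"control": [800, 760, 840], "treated": [200, 190, 210]},
}


def mean(values):
    return sum(values) / len(values)


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of mean treated to mean control counts.

    The pseudocount avoids division by zero for genes with no reads.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


for gene, groups in counts.items():
    lfc = log2_fold_change(groups["control"], groups["treated"])
    print(f"{gene}: log2 fold change = {lfc:.2f}")
```

A positive value means higher expression in the treated group, a negative value means lower expression, and a value near zero means little change.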
8. RNA-Seq questions
Using differential expression analyses, we can ask various questions, including: Which genes are differentially expressed between sample groups? Are there any trends in gene expression over time or across conditions? Which groups of genes change similarly over time or across conditions? What processes or pathways are enriched for my condition of interest?
9. Let's practice!
Now let's explore the biology a bit more.