1. What is ChIP-seq?
Hi, and welcome to "ChIP-seq Workflows in R". My name is Peter, I'm a Statistician at Macquarie University and I will be your instructor for this course. I'm assuming that you have some experience with R and have worked through one of the introductory Bioconductor courses on DataCamp. In this course you will learn how to process and analyze ChIP-seq data in R. Before we get into the meat of the course, let's briefly talk about what ChIP-seq is and how it is used.
2. How do cells know what to do?
The question at the center of a ChIP-seq experiment is "How do cells know what to do?" Your body contains several trillion cells, each carrying the same genome. Yet, these cells do very different jobs. Some are neurons in your brain, some form your skin and some flow through your body as blood cells. What is it that makes these cells different from each other?
3. DNA to Proteins
The function of a cell is in large part determined by the genes that it expresses.
4. DNA to Proteins 2
Genes, encoded in the DNA that makes up the genome, are transcribed into RNA and then translated into proteins. The resulting proteins are responsible for carrying out the cell's function.
5. Regulating gene expression
Inside each cell, a complex machinery of proteins is responsible for ensuring that the right genes are expressed. Inhibitors are proteins that bind to DNA to deactivate specific genes. Such inhibitors have to be removed, through interaction with other proteins, before genes can be expressed. A complex of activating transcription factors can then bind to the DNA, allowing gene expression to proceed.
Through this process, the regulatory machinery ensures that a cell correctly performs its role. If these regulators are disrupted, cells can get out of control, causing a variety of diseases, including cancer.
6. Chromatin Immunoprecipitation
ChIP, short for Chromatin Immunoprecipitation, is a technique that can be used to extract specific proteins, together with any parts of the genome they were bound to, from a cell. In ChIP-seq, the DNA fragments attached to these proteins are then sequenced. By identifying regions of the genome that are overrepresented in the sequencing data, we can infer the sites across the genome that the proteins interact with. By comparing these binding sites between groups of individuals, for example healthy volunteers and cancer patients, we can uncover the mechanisms that are responsible for the differences between them.
7. The Data
Throughout this course you will explore one particular ChIP-seq dataset. These data consist of samples taken from patients with prostate cancer, and the samples fall into two groups. The first group consists of five primary tumor samples, while the second contains three samples from tumors that have become treatment resistant. Chromatin immunoprecipitation targeting the Androgen Receptor was carried out for all samples. The Androgen Receptor is activated by binding testosterone and is then capable of binding to specific parts of the genome to regulate gene expression.
8. Accessing ChIP-seq data in R
Before we move on to some exercises, let's review a few functions used in R to interrogate sequencing data. Mapped sequence reads are typically stored in BAM files. You'll learn more about this type of file in Chapter 2. Using the `readGAlignments()` function you can load data from BAM files. Once the reads are loaded, the chromosome each read has been mapped to is available via a call to `seqnames()`, and the position of each read on that chromosome can be accessed through the functions `start()` and `end()`. We often want to know not only the location of mapped reads but also how many reads cover any given position in the genome. This information can be computed with the `coverage()` function.
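As a rough sketch of how these calls fit together, the snippet below loads a BAM file and inspects the reads. It assumes the Bioconductor packages GenomicAlignments and Rsamtools are installed, and it uses the small example file `ex1.bam` that ships with Rsamtools rather than the course data, so the file path and the variable names `bam_file`, `reads` and `cvg` are illustrative choices only.

```r
# Sketch only: assumes the Bioconductor packages GenomicAlignments
# and Rsamtools are installed.
library(GenomicAlignments)

# Example BAM file bundled with Rsamtools (NOT the course dataset).
bam_file <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Load the mapped reads from the BAM file.
reads <- readGAlignments(bam_file)

# Chromosome (sequence name) each read was mapped to.
head(seqnames(reads))

# Start and end position of each read on its chromosome.
head(start(reads))
head(end(reads))

# Number of reads covering each position, one entry per chromosome.
cvg <- coverage(reads)
cvg
```

The result of `coverage()` is a list-like object with one run-length encoded vector per chromosome, which keeps long stretches of identical coverage compact.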
9. Accessing peak calls
The main units of interest in the analysis of a ChIP-seq experiment are peak calls, which highlight regions of the genome with a high concentration of reads. These peak calls are typically stored in BED files. We'll look at the format of these files in more detail later. Each peak is associated with a score, which quantifies the strength of that particular peak. Peak calls can be loaded with the `import.bed()` function. Coordinates of peaks can then be obtained by calling the `chrom()` and `ranges()` functions. The `score()` function provides access to peak scores.
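To make this concrete, here is a minimal sketch of loading and inspecting peak calls. It assumes the Bioconductor package rtracklayer is installed. Because the course's peak files are not available here, the snippet first writes a tiny BED file with three made-up peaks to a temporary location; the peak names, coordinates and scores are invented purely for illustration.

```r
# Sketch only: assumes the Bioconductor package rtracklayer is installed.
library(rtracklayer)

# Write a tiny, made-up BED file (chrom, start, end, name, score).
# These peaks are invented for illustration, not real peak calls.
bed_file <- tempfile(fileext = ".bed")
writeLines(c(
  "chr1\t100\t250\tpeak_1\t35",
  "chr1\t900\t1200\tpeak_2\t80",
  "chr2\t400\t650\tpeak_3\t12"
), bed_file)

# Load the peak calls; the result is a GRanges object.
peaks <- import.bed(bed_file)

# Chromosome of each peak.
chrom(peaks)

# Start, end and width of each peak. BED start coordinates are
# 0-based, so they are shifted by one when converted to R's
# 1-based ranges.
ranges(peaks)

# Score quantifying the strength of each peak.
score(peaks)
```

Note that the first peak, written as `100` to `250` in the BED file, is reported as starting at position 101 after import, which is worth keeping in mind when comparing R output with the raw file.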
10. Let's practice!
Time to put this into practice.