1. Testing for differential binding
Now that you've had a closer look at the data, it should be clear that there are some differences in protein binding between the primary tumor and the treatment-resistant tumor cells. But what exactly are these differences? And which peaks differ enough between the two groups for us to be confident that the difference is not just due to random fluctuation in the data?
2. Comparing groups of ChIP-seq samples
To answer these questions you will carry out a statistical analysis of the peak counts using the *DiffBind* package. This in turn uses one of two common approaches to the differential analysis of sequencing data: *DESeq2* or *edgeR*. Both packages implement models appropriate for read count data. The default is *DESeq2*, which is what you will be working with. One advantage of *DiffBind* is that it provides an easy-to-use interface to the methods these packages provide.
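Part of what makes *DESeq2* suitable for count data is that it puts samples sequenced to different depths on a common scale before testing. The Python sketch below illustrates DESeq2's default "median-of-ratios" size-factor estimate on a made-up count matrix. It is a conceptual illustration only: the function name is hypothetical, and *DiffBind* may apply its own normalization settings rather than this exact estimate.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample from a peaks x samples count matrix.

    Illustrates the median-of-ratios idea used by DESeq2: each sample's factor
    is the median, over peaks, of (count / geometric mean of that peak's counts).
    """
    n_samples = len(counts[0])
    # Keep only peaks with a positive count in every sample, since a zero
    # makes the geometric mean (and its logarithm) undefined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every peak has a zero count in some sample")
    # Log of the geometric mean of each usable peak across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]

    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the peak's geometric mean.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy counts (invented): sample 2 was sequenced about twice as deeply.
toy_counts = [
    [10, 20],
    [20, 40],
    [30, 60],
]
print(median_of_ratios_size_factors(toy_counts))
```

In this toy example the second size factor comes out exactly twice the first, reflecting the doubled sequencing depth.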
3. Creating a shared peak set
Another advantage is that *DiffBind* works hand in hand with the *ChIPQC* package: the output produced by the QC procedure can be used directly as input to the analysis. Before you can analyze the ChIP-seq samples for differences between groups, you have to establish a consistent set of peaks. Comparing samples requires, for every peak included in the analysis, read counts from all samples. This is accomplished with the `dba.count()` function. Its `summits` argument signals that peaks should be re-centered around the summit of each consensus peak, the point of greatest read overlap. The value provided, 250 in this case, is the number of base pairs kept on either side of the summit, so here we will end up with peaks 500 bp wide.
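`dba.count()` handles all of this for you. To make the idea concrete, here is a simplified Python sketch of the three steps: merging overlapping peaks from all samples into one consensus set, locating the summit (highest read coverage) within a consensus peak, and re-centering a fixed-width window on that summit. The function names, the half-open `(chromosome, start, end)` interval format, and the plain-union merge rule are assumptions for illustration; *DiffBind* applies its own overlap rules and is not implemented this way.

```python
def merge_peaks(peak_sets):
    """Merge overlapping peaks from all samples into one set of intervals.

    Intervals are (chromosome, start, end) tuples with half-open coordinates.
    This is a plain union; no minimum number of supporting samples is required.
    """
    all_peaks = sorted(p for peaks in peak_sets for p in peaks)
    merged = []
    for chrom, start, end in all_peaks:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def summit_position(peak, reads):
    """Return the first position inside `peak` with the highest read coverage."""
    chrom, start, end = peak
    coverage = [0] * (end - start)
    for read_chrom, read_start, read_end in reads:
        if read_chrom != chrom:
            continue
        # Count only the part of each read that falls inside the peak.
        for pos in range(max(read_start, start), min(read_end, end)):
            coverage[pos - start] += 1
    best = max(range(len(coverage)), key=coverage.__getitem__)
    return start + best


def recenter(chrom, summit, half_width=250):
    """Build an interval extending half_width bp on each side of the summit."""
    return (chrom, max(0, summit - half_width), summit + half_width)


# Invented peaks from two samples and a handful of reads.
samples = [
    [("chr1", 1000, 1800)],
    [("chr1", 1200, 2000)],
]
consensus = merge_peaks(samples)  # [("chr1", 1000, 2000)]
reads = [("chr1", 1300, 1500), ("chr1", 1400, 1600), ("chr1", 1450, 1550)]
peak = consensus[0]
summit = summit_position(peak, reads)  # 1450
print(recenter(peak[0], summit))  # ('chr1', 1200, 1700)
```

With `half_width=250` the re-centered interval spans 500 bp, matching the width described above.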
4. Establishing a contrast
The final step before running the actual analysis is to tell *DiffBind* how the samples should be split into groups. You will do this using the `dba.contrast()` function. *DiffBind* provides a number of predefined groupings based on the attributes it tracks for each dataset. In this particular case we are interested in the comparison between conditions.
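Conceptually, a contrast is just a split of the samples into two groups according to one of their recorded attributes. The Python sketch below groups a small sample sheet by its `Condition` column and pairs up the resulting groups. The sample names, the `contrasts_by` function, and its `min_members` parameter are hypothetical illustrations, not *DiffBind*'s API or its exact rules for building contrasts.

```python
from collections import defaultdict
from itertools import combinations


def contrasts_by(samples, attribute="Condition", min_members=2):
    """Group samples by one metadata attribute and pair up the groups.

    Returns a list of ((level_a, ids_a), (level_b, ids_b)) tuples, one per
    pair of attribute levels that each have at least `min_members` samples.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[attribute]].append(sample["SampleID"])
    eligible = sorted(level for level, ids in groups.items() if len(ids) >= min_members)
    return [((a, groups[a]), (b, groups[b])) for a, b in combinations(eligible, 2)]


# Hypothetical sample sheet: sample names and condition labels are invented.
sample_sheet = [
    {"SampleID": "Primary1", "Condition": "Primary"},
    {"SampleID": "Primary2", "Condition": "Primary"},
    {"SampleID": "Resistant1", "Condition": "Resistant"},
    {"SampleID": "Resistant2", "Condition": "Resistant"},
]
for (group_a, ids_a), (group_b, ids_b) in contrasts_by(sample_sheet):
    print(f"{group_a} {ids_a} vs {group_b} {ids_b}")
```

With only two conditions in the sheet, exactly one contrast results: primary versus resistant.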
5. Using controls
One issue we haven't addressed yet is high background coverage. Remember when we looked at the coverage of peaks and background regions in Chapter 2? Some of those background regions had very high coverage and could easily be mistaken for peaks. Control samples measure coverage in the absence of ChIP enrichment, so they can be used to estimate this background and filter out the noise.
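One simple way to use a control is to scale its read counts to the sequencing depth of the ChIP sample and subtract them peak by peak, so that signal also present in the control is not counted as binding. The Python sketch below shows that idea with invented numbers. The function and its arguments are hypothetical, and this scale-and-subtract rule is an illustrative assumption; *DiffBind*'s own handling of controls may differ.

```python
def subtract_control(chip_counts, control_counts, chip_library_size, control_library_size):
    """Remove estimated background from per-peak ChIP read counts.

    Control counts are scaled by the ratio of library sizes so that both
    samples are on the same sequencing depth, then subtracted. Negative
    results are clipped to zero.
    """
    scale = chip_library_size / control_library_size
    return [max(0, round(chip - scale * control))
            for chip, control in zip(chip_counts, control_counts)]


# Invented numbers: the control was sequenced to half the ChIP depth.
chip = [100, 20, 50]
control = [20, 30, 10]
print(subtract_control(chip, control, 2_000_000, 1_000_000))  # [60, 0, 30]
```

The second peak illustrates the point of the exercise: its control coverage, once scaled, exceeds the ChIP coverage, so it contributes no signal after subtraction.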
6. Running the analysis
Now that the data is prepared, it is finally time to run the analysis. At this point the analysis itself is straightforward: all it takes is a call to `dba.analyze()`, and the only required argument is the `DBA` object you prepared for this purpose.
7. A first look at the results
With the differential binding analysis complete, it is time to inspect the results. Simply printing the results object shows the number of differentially bound peaks in addition to the already familiar information about the dataset. In the next lesson, we will explore new ways to summarize and visualize these differences. However, the PCA and heatmap plots you used earlier can also be useful tools at this stage. Creating the same plots based solely on the differentially bound peaks will emphasize the differences between the groups and give some sense of the extent to which the two groups differ in their binding patterns. Once again *DiffBind* makes this easy: not only does it provide functions to create these plots, they also come with an argument, `contrast`, that allows you to select a set of differentially bound peaks for plotting.
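The number of differentially bound peaks comes from thresholding multiple-testing-adjusted p-values. DESeq2 adjusts with the Benjamini-Hochberg procedure by default, and DiffBind's default reporting threshold is an FDR of 0.05. The Python sketch below shows that step on invented p-values and returns the indices of the peaks that pass, which is the kind of subset the `contrast` argument selects for plotting. The function names are hypothetical, and the sketch leaves out refinements DESeq2 applies, such as independent filtering.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_bound(pvalues, fdr_threshold=0.05):
    """Indices of peaks whose FDR is at or below the threshold."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= fdr_threshold]


# Invented per-peak p-values for four consensus peaks.
pvals = [0.001, 0.02, 0.04, 0.5]
print(differentially_bound(pvals))  # [0, 1]
```

Note that the third peak has a raw p-value below 0.05 but an adjusted value of about 0.053, so it is not reported. This is exactly the protection against random fluctuation that the analysis is meant to provide.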
8. Let's practice!
Time to put this into practice.