1. Interpreting Gene Lists
In the previous lesson you learned how to convert a list of peak calls into a list of affected genes. But that still doesn't tell us much about what is going on. Genes are more informative than seemingly arbitrary genome coordinates, but what are these genes doing? And how does any of this explain the observed differences between the groups of samples?
2. Gene Sets
One way to make sense of these potentially very long gene lists is a gene set enrichment approach. This involves defining groups of genes that are functionally related in some way.
3. Gene Set Enrichment
You can then count how many of the genes in each set are associated with peaks of interest. Gene sets that contain a larger proportion of peak-associated genes than you would expect by chance, like *Set 2* in this example, are likely to be particularly relevant. Because these gene sets were defined by grouping genes according to a known common property, they are much easier to interpret than a long list of individual genes.
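To make "more than you would expect by chance" concrete, here is a minimal sketch of one classic over-representation test, the one-sided hypergeometric test. It is a swapped-in illustration, not what *chipenrich* does: *chipenrich* relies on its own, more elaborate statistical tests. The function name `enrichment_p_value`, the gene set names and all counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe: int, n_in_set: int,
                       n_peak_genes: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_universe   -- total number of genes considered (the background)
    n_in_set     -- number of genes belonging to the gene set
    n_peak_genes -- number of genes associated with peaks of interest
    n_overlap    -- observed number of peak-associated genes in the set
    """
    upper = min(n_in_set, n_peak_genes)
    # Count the ways of drawing at least n_overlap set members among the
    # peak-associated genes; math.comb returns 0 for impossible draws.
    favourable = sum(
        comb(n_in_set, i) * comb(n_universe - n_in_set, n_peak_genes - i)
        for i in range(n_overlap, upper + 1)
    )
    # Divide by the total number of ways to pick the peak-associated genes.
    return favourable / comb(n_universe, n_peak_genes)


# Hypothetical numbers: 1,000 background genes, 100 of them near peaks.
# Each gene set maps to (set size, peak-associated genes in the set).
gene_sets = {"Set A": (50, 5), "Set B": (50, 20)}
for name, (size, overlap) in gene_sets.items():
    expected = size * 100 / 1000
    p = enrichment_p_value(1000, size, 100, overlap)
    print(f"{name}: observed {overlap}, expected {expected:.1f}, p = {p:.2e}")
```

Both hypothetical sets have 50 genes, so about 5 peak-associated genes would be expected in each. "Set A" matches that expectation and gets an unremarkable p-value, while "Set B", with 20, gets a very small one. In a real analysis you would test many gene sets and therefore also correct for multiple testing.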
4. Finding enriched gene sets
The *chipenrich* package is designed specifically for enrichment analyses of ChIP-seq peak sets. You can provide the peak locations directly, without annotating them first; annotation happens automatically as part of the enrichment analysis. All you have to do is specify the reference genome with the `genome` argument, select one of several supported gene set collections with the `genesets` argument, and use the `locusdef` argument to choose how peaks are associated with genes. In the following exercise, you'll associate each peak with the closest transcription start site, but other methods based on windows of varying size are also supported.
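*chipenrich* performs the peak-to-gene assignment for you, but the idea behind a "closest transcription start site" rule can be sketched in a few lines. The Python sketch below is an illustration under stated assumptions, not *chipenrich*'s implementation: it assumes each peak is represented by its midpoint, that only TSSs on the same chromosome are considered, and it uses a hypothetical TSS table (`TSS`) with made-up gene names and coordinates.

```python
from bisect import bisect_left

# Hypothetical TSS annotation: gene -> (chromosome, TSS position).
TSS = {
    "GENE_A": ("chr1", 1_000),
    "GENE_B": ("chr1", 50_000),
    "GENE_C": ("chr1", 120_000),
    "GENE_D": ("chr2", 30_000),
}


def build_index(tss):
    """Group TSS positions by chromosome, sorted for binary search."""
    index = {}
    for gene, (chrom, pos) in tss.items():
        index.setdefault(chrom, []).append((pos, gene))
    for entries in index.values():
        entries.sort()
    return index


def nearest_tss(index, chrom, peak_start, peak_end):
    """Return (gene, distance) for the TSS closest to the peak midpoint.

    Returns None when the chromosome has no annotated TSS.
    """
    entries = index.get(chrom)
    if not entries:
        return None
    mid = (peak_start + peak_end) // 2  # assumption: use the peak midpoint
    positions = [pos for pos, _ in entries]
    i = bisect_left(positions, mid)
    # The nearest TSS is either the last one before the midpoint
    # or the first one at/after it.
    candidates = entries[max(i - 1, 0): i + 1]
    pos, gene = min(candidates, key=lambda e: abs(e[0] - mid))
    return gene, abs(pos - mid)


index = build_index(TSS)
peaks = [("chr1", 40_000, 41_000), ("chr1", 100_000, 101_000),
         ("chr2", 5_000, 5_500), ("chr3", 10, 20)]
for chrom, start, end in peaks:
    print(chrom, start, end, nearest_tss(index, chrom, start, end))
```

Sorting the TSS positions once per chromosome lets each peak be assigned with a binary search instead of a scan over every gene. Window-based definitions would instead collect all genes whose TSS lies within a fixed distance of the peak.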
5. Let's practice!
Time to put this into practice.