Sequencing data

The basic unit of a ChIP-seq dataset is a sequencing read. A full dataset will typically consist of several million reads, stored in BAM files. In this exercise, we'll look at how reads are represented in R, using reads from a small region on chromosome 20.

The reads have already been loaded into R for you. They are stored in a GAlignments object called reads. The GAlignments class, from the GenomicAlignments package, is closely related to the GRanges class from the GenomicRanges package, which you may have encountered in introductory Bioconductor courses. This is a good opportunity to refresh your memory on how to interact with this type of object.

Remember that Bioconductor provides accessor functions to make extracting data easier. For example, start() will extract the start coordinates of all reads.
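The accessor idea is that each function returns one value per read, as a vector in the same order as the reads. The snippet below is a minimal plain-Python sketch of that pattern, not Bioconductor code. The `Read` class, the `start()`/`end()`/`width()` helpers and the coordinates are all hypothetical stand-ins chosen for illustration, and reads are assumed to be 1-based, closed intervals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Read:
    """Hypothetical toy read: a 1-based, closed interval [start, end] on one chromosome."""
    seqname: str
    start: int
    end: int
    strand: str


def start(reads: List[Read]) -> List[int]:
    """Accessor sketch: the start coordinate of every read, in order."""
    return [r.start for r in reads]


def end(reads: List[Read]) -> List[int]:
    """Accessor sketch: the end coordinate of every read, in order."""
    return [r.end for r in reads]


def width(reads: List[Read]) -> List[int]:
    """Accessor sketch: number of reference bases spanned by each read (closed interval)."""
    return [r.end - r.start + 1 for r in reads]


# Made-up reads for illustration only; these are not the course data.
reads = [
    Read("chr20", 100, 135, "+"),
    Read("chr20", 110, 145, "-"),
    Read("chr20", 130, 165, "+"),
]

first_start = start(reads)[0]  # start position of the first read
last_end = end(reads)[-1]      # end position of the last read
print(first_start, last_end)
```

Python lists are 0-indexed, so the first and last reads are `[0]` and `[-1]`. On the real GAlignments object in R, which is 1-indexed, the equivalent calls would be `start(reads)[1]` and `end(reads)[length(reads)]`.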

Print the reads object to obtain a summary of the data.
Get the start position of the first read.
Get the end position of the last read.
Compute the read coverage, i.e. the number of reads covering each position in the selected region, using the coverage() function.
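To see what read coverage means, here is a minimal plain-Python sketch of the computation. It is not how Bioconductor implements coverage(). It assumes reads are 1-based, closed intervals and ignores CIGAR details such as insertions, deletions and skipped regions. It uses a difference array: +1 where a read starts, -1 just past where it ends, and a running sum then gives the depth at each position. Because neighbouring positions often share the same depth, the result is also compressed into (value, run length) pairs. This is similar in spirit to the run-length-encoded (Rle) vectors that R's coverage() returns, one per chromosome, in an RleList. The function names and toy intervals are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def coverage(intervals: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-position read depth over positions 1..length for 1-based, closed intervals."""
    diff = [0] * (length + 2)          # extra slot so end + 1 never overflows
    for s, e in intervals:
        diff[s] += 1                   # a read starts covering at s
        diff[e + 1] -= 1               # ...and stops covering after e
    depth, running = [], 0
    for pos in range(1, length + 1):
        running += diff[pos]           # running sum = reads overlapping pos
        depth.append(running)
    return depth


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Compress a vector into (value, run length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


# Hypothetical toy reads on a 12-base region: (start, end), both inclusive.
reads = [(3, 6), (5, 9), (8, 10)]
depth = coverage(reads, length=12)
print(depth)
print(run_length_encode(depth))
```

The run-length form is what makes coverage cheap to store for a whole chromosome: long stretches with zero or constant depth collapse into a single pair.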
