Sequencing data
The basic unit of a ChIP-seq dataset is a sequencing read. A full dataset will typically consist of several million reads, stored in BAM files. In this exercise, we'll look at how reads are represented in R, using reads from a small region on chromosome 20.
The reads have already been loaded into R for you. They are stored in a GAlignments object called reads. The GAlignments class is closely related to the GRanges class from the GenomicRanges package, which you may have encountered during introductory Bioconductor courses. This is a good opportunity to remind yourself how to interact with this type of object.
Remember that Bioconductor provides accessor functions to make extracting data easier. For example, start() will extract the start coordinates of all reads.
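To make the accessor pattern concrete, here is a minimal sketch on a small hand-built object. The name toy_reads and all coordinates are hypothetical, invented for illustration; they are not the exercise's reads object. The sketch assumes the GenomicAlignments package is installed.

```r
library(GenomicAlignments)

# Hypothetical toy data: three 50 bp reads on chr20.
# Coordinates are made up and are NOT taken from the exercise's 'reads' object.
toy_reads <- GAlignments(
  seqnames = Rle(factor("chr20"), 3),
  pos      = c(101L, 121L, 141L),
  cigar    = rep("50M", 3),
  strand   = Rle(strand(c("+", "-", "+")))
)

# Accessors return one value per read, in the order the reads are stored
toy_starts <- start(toy_reads)   # leftmost aligned position of each read
toy_ends   <- end(toy_reads)     # rightmost aligned position of each read
toy_widths <- width(toy_reads)   # number of reference bases each read spans

# Ordinary vector indexing then picks out individual reads
toy_starts[1]                    # start of the first read
toy_ends[length(toy_reads)]      # end of the last read
```

Because the accessors return plain vectors, anything you already know about subsetting vectors in R carries over to the results.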
This exercise is part of the course
ChIP-seq with Bioconductor in R
Instructions
- Print the reads object to obtain a summary of the data.
- Get the start position of the first read.
- Get the end position of the last read.
- Determine the number of reads covering each position in the selected region, i.e. compute the read coverage using the function of the same name.
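As a rough illustration of what read coverage means, the sketch below computes it for a hand-built object. The name toy_reads and its coordinates are hypothetical and unrelated to the exercise data; the sketch assumes the GenomicAlignments package is installed.

```r
library(GenomicAlignments)

# Hypothetical toy data: three overlapping 50 bp reads on chr20.
# Coordinates are made up for illustration only.
toy_reads <- GAlignments(
  seqnames = Rle(factor("chr20"), 3),
  pos      = c(101L, 121L, 141L),
  cigar    = rep("50M", 3),
  strand   = Rle(strand(c("+", "-", "+")))
)

# coverage() counts, for every position, how many reads overlap it.
# The result is an RleList with one run-length encoded vector per chromosome.
toy_cvg <- coverage(toy_reads)

toy_chr20 <- toy_cvg$chr20       # coverage along chr20 only
toy_peak  <- max(toy_chr20)      # highest number of overlapping reads
```

In this toy example all three reads overlap between positions 141 and 150, so the coverage there is 3, while positions before the first read have a coverage of 0.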
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# Print the 'reads' object to obtain a summary of the data
print(___)
# Get the *start* position of the first read
start_first <- ___(reads)[1]
# Get the *end* position of the last read
end_last <- ___(___)[length(___)]
# Compute the number of reads covering each position in the selected region
cvg <- ___