
Sequencing data

The basic unit of a ChIP-seq dataset is a sequencing read. A full dataset will typically consist of several million reads, stored in BAM files. In this exercise, we'll look at how reads are represented in R, using reads from a small region on chromosome 20.

The reads have already been loaded into R for you. They are stored in a GAlignments object called reads. The GAlignments class, provided by the GenomicAlignments package, is closely related to the GRanges class from GenomicRanges, which you may have encountered in introductory Bioconductor courses. This is a good opportunity to remind yourself how to interact with this type of object.

Remember that Bioconductor provides accessor functions to make extracting data easier. For example, start() will extract the start coordinates of all reads.
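
As a minimal sketch of how these accessors behave, the snippet below builds a small GAlignments object by hand and queries it. The object toy_reads, its coordinates, and its CIGAR strings are made up for illustration and are not the course's reads object. The example assumes the GenomicAlignments package is installed.

```r
library(GenomicAlignments)

# Hypothetical toy data: three 5 bp reads on chr20 (not the course's 'reads')
toy_reads <- GAlignments(
  seqnames = Rle(factor(rep("chr20", 3))),
  pos      = c(10L, 12L, 20L),            # leftmost aligned position of each read
  cigar    = rep("5M", 3),                # each read aligns as 5 matched bases
  strand   = Rle(strand(c("+", "-", "+")))
)

# Accessors return one value per read, in the order the reads are stored
starts <- start(toy_reads)
ends   <- end(toy_reads)
widths <- width(toy_reads)

# Ordinary vector indexing then picks out individual reads
second_start <- starts[2]
n_reads      <- length(toy_reads)
```

Because the accessors return plain vectors, any of the usual subsetting idioms can be applied to their results.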

This exercise is part of the course

ChIP-seq with Bioconductor in R


Exercise instructions

  • Print the reads object to obtain a summary of the data.
  • Get the start position of the first read.
  • Get the end position of the last read.
  • Determine the number of reads covering each position in the selected region, i.e. compute the read coverage with the coverage() function.
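
To make the idea of read coverage concrete, here is a small sketch on made-up data rather than the course's reads object. The name toy_reads, the coordinates, and the 30 bp sequence length are assumptions chosen so the result is easy to check by eye. The example assumes the GenomicAlignments package is installed.

```r
library(GenomicAlignments)

# Hypothetical toy data: three 5 bp reads on a 30 bp stretch labelled chr20
toy_reads <- GAlignments(
  seqnames   = Rle(factor(rep("chr20", 3))),
  pos        = c(10L, 12L, 20L),
  cigar      = rep("5M", 3),
  strand     = Rle(strand(c("+", "-", "+"))),
  seqlengths = c(chr20 = 30L)
)

# coverage() returns an RleList with one run-length encoded vector per sequence
toy_cvg <- coverage(toy_reads)

# Expand the chr20 entry to a plain integer vector, one value per position
per_base <- as.integer(toy_cvg[["chr20"]])

# The first two reads overlap at positions 12 to 14; no read touches 17 to 19
overlap_depth <- per_base[12:14]
gap_depth     <- per_base[17:19]
```

The run-length encoding keeps the result compact for long regions where coverage stays constant over many consecutive positions.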

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Print the 'reads' object to obtain a summary of the data
print(___)

# Get the *start* position of the first read
start_first <- ___(reads)[1]

# Get the *end* position of the last read
end_last <- ___(___)[length(___)]

# Compute the number of reads covering each position in the selected region
cvg <- ___