
Sequencing data

The basic unit of a ChIP-seq dataset is a sequencing read. A full dataset will typically consist of several million reads, stored in BAM files. In this exercise, we'll look at how reads are represented in R, using reads from a small region on chromosome 20.

The reads have already been loaded into R for you. They are stored in a GAlignments object (from the GenomicAlignments package) called reads. GAlignments is closely related to the GRanges class from the GenomicRanges package, which you may have encountered in introductory Bioconductor courses. This is a good opportunity to remind yourself how to interact with this type of object.

Remember that Bioconductor provides accessor functions to make extracting data easier. For example, start() will extract the start coordinates of all reads.
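As a rough illustration of these accessors, here is a minimal sketch on made-up data. The object toy_reads and its coordinates are hypothetical and are not the reads object used in the exercise; the sketch assumes the GenomicAlignments package is installed.

```r
library(GenomicAlignments)

# Hypothetical toy data: three 50 bp reads on chr20
# (this is NOT the 'reads' object provided by the exercise)
toy_reads <- GAlignments(
  seqnames = Rle(factor("chr20"), 3),
  pos      = c(100L, 120L, 150L),
  cigar    = rep("50M", 3),
  strand   = Rle(strand(c("+", "-", "+")))
)

# Accessors return one value per read
start(toy_reads)   # leftmost aligned position of each read
end(toy_reads)     # rightmost aligned position of each read
width(toy_reads)   # number of reference bases spanned by each read
strand(toy_reads)  # strand of each read, run-length encoded
```

Because each accessor returns a vector with one element per read, ordinary indexing such as `[1]` picks out the value for a single read.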

This exercise is part of the course

ChIP-seq with Bioconductor in R


Instructions

  • Print the reads object to obtain a summary of the data.
  • Get the start position of the first read.
  • Get the end position of the last read.
  • Determine the number of reads covering each position in the selected region, i.e. compute the read coverage using the function of the same name.
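To make the idea of read coverage concrete, the sketch below counts overlapping reads per position using base R only. The start and end coordinates are hypothetical toy values, not the exercise data, and the explicit loop is for illustration; Bioconductor's coverage() computes the same kind of per-position count and returns it in run-length encoded form.

```r
# Hypothetical toy reads: start and end coordinates of three 50 bp reads
starts <- c(100L, 120L, 150L)
ends   <- c(149L, 169L, 199L)

# Positions of the toy region we want coverage for
positions <- 100:199

# For each position, count the reads whose interval contains it
cvg_toy <- vapply(
  positions,
  function(p) sum(starts <= p & ends >= p),
  integer(1)
)

# Coverage at a few selected positions
cvg_toy[positions %in% c(100, 130, 180)]
```

Positions where reads overlap get a count above one, so peaks in coverage mark places where many reads pile up.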

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Print the 'reads' object to obtain a summary of the data
print(___)

# Get the *start* position of the first read
start_first <- ___(reads)[1]

# Get the *end* position of the last read
end_last <- ___(___)[length(___)]

# Compute the number of reads covering each position in the selected region
cvg <- ___