1. Sequence Ranges
Hi! I'm Paula Martinez, and I'll be your instructor. I am a data scientist, and I train people to analyze their data more efficiently. I am also a bioinformatician interested in the genomic variation that leads to diversity. For these kinds of analyses, I love using R and Bioconductor, particularly because they make it possible to work with many different datasets and to collaborate with the community.
Nowadays, we can obtain billions of sequences for less than a thousand dollars. That means large sequencing projects are in need of analysis, and that's where your knowledge of sequence analysis comes into play.
Bioconductor packages provide convenient structures and functions for representing, manipulating, and annotating genomic data. This chapter will walk you through GenomicRanges and how to use it to target specific sequences of interest.
2. IRanges with numeric arguments
Let's start with the IRanges package, which provides the fundamental infrastructure and operations for manipulating intervals of sequences.
Keep in mind that we analyze sequence data to understand its components, functions, and structures, and how they evolve.
First, we load the IRanges package, and then create myIRanges with the IRanges() function by defining the start and end.
As shown in the output, this is an IRanges object with one range that starts at position 20 and ends at position 30, so its width is 11.
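The slide code is not reproduced in this transcript. A minimal sketch of this step, assuming the Bioconductor package IRanges is installed, could look like this:

```r
# Assumes IRanges is installed, e.g. via BiocManager::install("IRanges")
library(IRanges)

# One range from position 20 to position 30 (both ends inclusive)
myIRanges <- IRanges(start = 20, end = 30)
myIRanges

# Accessors for the three quantities that define a range
start(myIRanges)  # 20
end(myIRanges)    # 30
width(myIRanges)  # 11, because both endpoints are counted
```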
There are multiple ways to construct IRanges. You will learn the basic ways here, and then you can practice on your own.
3. More IRanges examples
We will create the same IRanges object using two different sets of arguments.
The first example specifies start and width.
The second specifies the start and end.
The missing argument is calculated from the other two using the equation width = end - start + 1.
Also, notice how values are recycled to the length of the longest argument. In the second example, the single value end = 30 is recycled so that it applies to both of the two start values.
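A sketch of the two constructions described above. The specific values are assumptions, chosen so that both calls produce the same two ranges, [1, 30] and [20, 30]:

```r
library(IRanges)

# (a) start + width: end is derived as start + width - 1
myiranges_a <- IRanges(start = c(1, 20), width = c(30, 11))

# (b) start + end: the single end value (30) is recycled to both ranges,
#     and width is derived as end - start + 1
myiranges_b <- IRanges(start = c(1, 20), end = 30)

myiranges_a
myiranges_b

# Both objects describe the same intervals
identical(start(myiranges_a), start(myiranges_b))  # TRUE
identical(end(myiranges_a), end(myiranges_b))      # TRUE
width(myiranges_b)                                 # 30 11
```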
4. Rle - run length encoding
Another way to construct IRanges is by using an Rle definition. Rle stands for run-length encoding. The Rle() function computes and stores the lengths and values of the runs in a vector or factor. Rle objects are general S4 containers used to store long vectors with repeated values more efficiently.
For example, take a vector of 8 numbers. Its Rle is stored in only 5 runs, because it contains a run of three consecutive 2s and a run of two consecutive 3s.
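Here is a small sketch of run-length encoding. The vector is a hypothetical one built to match the description: 8 numbers, with three consecutive 2s and two consecutive 3s, giving 5 runs. Rle(), runLength(), runValue(), and nrun() come from the S4Vectors package, which loading IRanges attaches.

```r
library(IRanges)  # also attaches S4Vectors, which defines Rle

# Hypothetical vector of 8 numbers with two runs of repeated values
some_numbers <- c(3, 2, 2, 2, 3, 3, 4, 2)

myRle <- Rle(some_numbers)
myRle             # numeric-Rle of length 8 with 5 runs

runLength(myRle)  # 1 3 2 1 1
runValue(myRle)   # 3 2 3 4 2
nrun(myRle)       # 5
as.vector(myRle)  # decodes back to the original vector
```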
This is useful for representing sequence ranges, because they very commonly contain repetitions.
5. IRanges with logical vector
So far, you've learned how to create IRanges with numeric arguments defined as start, end, or width. You can also create IRanges using a logical vector to define which elements of a sequence will be kept or skipped.
This example uses a vector of logical elements as the start of the range. The first two elements are FALSE, so they are skipped and the range starts at position 3. It then covers the two TRUE elements at positions 3 and 4, so this range has a width of 2.
This is particularly useful when you want to skip elements of a sequence: simply mark them with the logical value FALSE. Even better, you can generate this logical vector from a condition.
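A sketch of both ideas: first the fixed logical vector described above, then a logical vector generated from a condition. The GC-content values are hypothetical and only for illustration.

```r
library(IRanges)

# Fixed logical vector: FALSE = skip, TRUE = keep
kept <- IRanges(start = c(FALSE, FALSE, TRUE, TRUE))
kept  # one range: start 3, end 4, width 2

# Logical vector generated from a condition (hypothetical per-position GC values)
gc_content <- c(0.30, 0.70, 0.80, 0.20, 0.90)
high_gc <- IRanges(start = gc_content > 0.5)
high_gc  # each run of consecutive TRUE values becomes one range: [2, 3] and [5, 5]
```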
6. IRanges with logical Rle
You can also use an Rle to create an IRanges object with multiple ranges. Here, the logical vector gi, with 7 elements, is converted to an Rle object. The resulting IRanges object has two ranges, one for each run of TRUE values in the Rle: the first covers 3 elements and the second covers 2.
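A sketch of this step. The exact contents of gi are not given in the transcript, so the vector below is a hypothetical one chosen to match the description: 7 elements, a run of 3 TRUE values followed later by a run of 2.

```r
library(IRanges)

# Hypothetical 7-element logical vector: 3 TRUEs, 2 FALSEs, then 2 TRUEs
gi <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)

myRle <- Rle(gi)
myRle  # logical-Rle of length 7 with 3 runs

# Each run of TRUE values becomes one range
myIRanges_gi <- IRanges(start = myRle)
myIRanges_gi  # two ranges: [1, 3] (width 3) and [6, 7] (width 2)
```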
7. In summary
IRanges objects are hierarchical data structures that can contain metadata, which makes them useful for storing genes, transcripts, polymorphisms, GC content, and more.
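A minimal sketch of attaching metadata to ranges. The range names and GC-content values are hypothetical; mcols() and DataFrame() come from S4Vectors, which loading IRanges attaches.

```r
library(IRanges)

# Hypothetical named ranges, for illustration only
ir <- IRanges(start = c(1, 20), end = 30, names = c("geneA", "geneB"))

# Attach one metadata column per range
mcols(ir) <- DataFrame(gc_content = c(0.41, 0.55))
ir
mcols(ir)$gc_content
```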
To construct an IRanges object you can provide start, end or width as numeric vectors. Also, the start argument can be a logical vector or logical Rle.
Remember that Rle stands for run-length encoding, a storage-efficient way to represent vectors with repeated values.
When you supply only two of start, end, and width, IRanges fills in the third using the equation width = end - start + 1.
8. Let's practice using sequence ranges!
Now it's your turn to try some examples of IRanges.