Organizing the data for DESeq2

1. Organizing the data for DESeq2

After we have our data loaded, we need to make sure it's in a specific format so that it will be accepted as input to DESeq2.

2. Bringing in data for DESeq2: sample order

DESeq2 requires the sample names in the metadata and counts datasets to be in the same order. Therefore, the row names of our metadata need to be in the same order as the column names of our counts data. As we can see in these images, the sample names for the counts are in a nice order while the names in the metadata are not.

3. Bringing in data for DESeq2: sample order

Alternatively, we can explore the row names and column names with the rownames() and colnames() functions.

4. Bringing in data for DESeq2: sample order

Looking at the sample names in both datasets, we can see that the order is not the same. It's not always this easy to tell by eye, though, so we can use the all() function with the "double equals" operator (==) to check whether all of the row names of the metadata are in the same order as the column names of the raw counts data. Here, all() returns FALSE. Now we know that our samples are not in the same order, so we need to reorder the data before using it with DESeq2.
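As a minimal sketch of this check, the snippet below builds a small toy dataset and compares the two sets of names. The object names raw_counts and metadata, and all sample and gene names, are assumptions made up for illustration; they are not the course's actual data.

```r
# Toy data: object names, gene names, and sample names are hypothetical.
raw_counts <- matrix(
  1:12, nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("sample1", "sample2", "sample3", "sample4"))
)
metadata <- data.frame(
  condition = c("treated", "control", "treated", "control"),
  row.names = c("sample3", "sample1", "sample4", "sample2")
)

# Inspect the names directly
rownames(metadata)
colnames(raw_counts)

# Element-wise comparison, collapsed to a single TRUE/FALSE
same_order <- all(rownames(metadata) == colnames(raw_counts))
same_order  # FALSE for this toy data, because the names are shuffled
```

Note that == compares the two vectors position by position, so this check is only meaningful when both have the same length.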

5. Matching order between vectors

To easily reorder the rows of the metadata to match the order of the columns in the counts data, we can use the match() function. The match() function takes two vectors as input. The first is a vector of values in the order we want, and the second is a vector of values we would like to reorder. For each value in the first vector, match() returns its position in the second. In our example, the column names of the raw counts data are the vector with the order we want, so they go in the first position of the match() function. The row names of the metadata are the vector to be reordered, so they go in the second position. The output shows how we would need to reorder the rows of the metadata to be in the same order as the columns in the count data. For instance, the 6th row would need to come first, followed by the 9th row, then the 1st row, and so on.
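To make the argument order concrete, here is a small sketch using two made-up character vectors. The names wanted and current are hypothetical stand-ins for the column names of the counts and the row names of the metadata.

```r
# Hypothetical vectors standing in for colnames(counts) and rownames(metadata)
wanted  <- c("sample1", "sample2", "sample3", "sample4")  # order we want
current <- c("sample3", "sample1", "sample4", "sample2")  # order we have

# For each value in `wanted`, find its position in `current`
idx <- match(wanted, current)
idx  # 2 4 1 3

# Indexing `current` by those positions puts it in the wanted order
current[idx]
```

Here the first value of idx is 2 because "sample1" sits in position 2 of current, so element 2 must come first after reordering.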

6. Reordering with the match() function

Now, we can use the output of the match() function to reorder the rows of the metadata to be in the same order as the columns in the count data. To do this, we can save the indices returned by match() to a variable, in this case called idx. Then, we can rearrange the metadata by using square brackets and placing idx in the row position, before the comma. The samples should now be in the same order for both datasets.
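The reordering step might look like the sketch below. The objects raw_counts, metadata, and reordered_metadata, along with the batch column, are hypothetical names on toy data; only idx comes from the text above.

```r
# Toy data: all object, column, and sample names are hypothetical.
raw_counts <- matrix(
  1:12, nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("sample1", "sample2", "sample3", "sample4"))
)
metadata <- data.frame(
  condition = c("treated", "control", "treated", "control"),
  batch     = c("b1", "b1", "b2", "b2"),
  row.names = c("sample3", "sample1", "sample4", "sample2")
)

# Positions of each counts column name within the metadata row names
idx <- match(colnames(raw_counts), rownames(metadata))

# idx goes in the row position; drop = FALSE keeps a data frame
# even if the metadata has only one column
reordered_metadata <- metadata[idx, , drop = FALSE]

# Check that the orders now agree
all(rownames(reordered_metadata) == colnames(raw_counts))  # TRUE
```

One design note: if any sample name is missing from the metadata, match() returns NA for it, so it can be worth checking anyNA(idx) before indexing.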

7. Checking the order

To check the order, we can use the all() function again. Since the names now match and all() returns TRUE, we can use these datasets to create the DESeq2 object needed to start the DESeq2 workflow.

8. Creating the DESeq2 object

To create the DESeq2 object, use the DESeqDataSetFromMatrix() function. This function takes as input the raw counts, the associated metadata, and a design formula detailing which conditions in the metadata we want to use for differential expression analysis. We will talk in more detail about the design formula later. This function creates a DESeq2 object of class DESeqDataSet, which extends the RangedSummarizedExperiment class. This is a list-like object with slots available for the data it will generate throughout the analysis. Currently, only a few of the slots are filled, with the count data, metadata, and design information.
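A minimal sketch of this call on toy data follows, assuming the DESeq2 package is installed. The objects raw_counts, metadata, and dds, the count values, and the condition column are all made up for illustration; countData, colData, and design are the argument names of DESeqDataSetFromMatrix().

```r
library(DESeq2)

# Toy integer counts: genes in rows, samples in columns (values are made up)
raw_counts <- matrix(
  c(10L, 0L, 25L, 12L, 3L, 30L, 40L, 1L, 5L, 38L, 2L, 8L),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("sample1", "sample2", "sample3", "sample4"))
)

# Metadata rows are already in the same order as the count columns
metadata <- data.frame(
  condition = factor(c("control", "control", "treated", "treated")),
  row.names = c("sample1", "sample2", "sample3", "sample4")
)

# `condition` is a hypothetical metadata column used in the design formula
dds <- DESeqDataSetFromMatrix(
  countData = raw_counts,
  colData   = metadata,
  design    = ~ condition
)

class(dds)     # "DESeqDataSet"
colData(dds)   # the metadata stored in the object
design(dds)    # the design formula
```

At this point no analysis has been run; the object only holds the inputs.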

9. Let's practice!

Time to put what we've learned into practice.