Studies with more than two groups

1. Studies with more than two groups

Now you will learn how to model studies with more than 2 groups.

2. A study with 3 groups

The group-means parametrization becomes really useful when a study has more than 2 groups to compare. In this video, I will analyze a data set that includes 3 different types of leukemia: ALL, AML, and CML. The goal is to find the differentially expressed genes that distinguish these 3 sub-types. The data is a subset of that available from the Bioconductor package leukemiasEset, and the full results can be read in the series of papers by Kohlmann, Haferlach, and colleagues. The data contains measurements for 20,172 genes and 36 leukemias, 12 of each type.

3. Group-means model for 3 groups

To test each pairwise combination of the 3 leukemia types, I want to build the following model where each coefficient represents the mean expression level in one of the 3 groups. To find differentially expressed genes, I will test the following pairwise contrasts.
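The formulas shown on the slide are not part of this transcript. A reconstruction, assuming the coefficients are ordered ALL, AML, CML and that each $X_i$ is an indicator variable for membership in group $i$, would look roughly like this:

$$
Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon
$$

$$
\begin{aligned}
\text{AML vs. ALL:} \quad & \beta_2 - \beta_1 = 0 \\
\text{CML vs. ALL:} \quad & \beta_3 - \beta_1 = 0 \\
\text{CML vs. AML:} \quad & \beta_3 - \beta_2 = 0
\end{aligned}
$$

Each line is the null hypothesis for one pairwise test: a gene is called differentially expressed between two types when the difference in their group means is significantly different from zero.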

4. Group-means design matrix for 3 groups

Fortunately, the process is identical to the two-group case. I pass a formula to `model.matrix` that includes a zero to remove the intercept coefficient, followed by the name of the column that contains the sample labels. This creates 3 columns, one per coefficient; each column is 1 for samples of that type and 0 otherwise.
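As a minimal sketch of this step, the code below builds the design matrix from a small made-up phenotype table. The data frame `pheno` and its column name `type` are assumptions for illustration; in the video the labels would come from the phenotype data of the leukemia data set.

```r
# Hypothetical phenotype table: 36 samples, 12 of each leukemia type
pheno <- data.frame(type = rep(c("ALL", "AML", "CML"), each = 12))

# The 0 in the formula removes the intercept, so every column is a
# plain group indicator (group-means parametrization)
design <- model.matrix(~0 + type, data = pheno)

# First rows of the design matrix and the number of samples per group
head(design, 2)
colSums(design)
```

The column names are formed by pasting the variable name and the group label together (`typeALL`, `typeAML`, `typeCML`), which is why they matter when the contrasts are written out by name.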

5. Contrasts matrix for 3 groups

Recall the three tests to perform. I specify these contrasts using limma's `makeContrasts` function. I refer to each coefficient by the corresponding column name in the design matrix. Viewing the contrasts matrix, you can see that each contrast compares two of the sub-types.
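A sketch of the contrasts matrix, assuming the design matrix columns are named `typeALL`, `typeAML`, and `typeCML`. The contrast names `AMLvALL`, `CMLvALL`, and `CMLvAML` are labels chosen here for illustration and may differ from those on the slide.

```r
library(limma)

# Each contrast is written as an expression in the design column names.
# levels supplies those names; a design matrix can be passed here instead.
cm <- makeContrasts(AMLvALL = typeAML - typeALL,
                    CMLvALL = typeCML - typeALL,
                    CMLvAML = typeCML - typeAML,
                    levels = c("typeALL", "typeAML", "typeCML"))

# Rows are coefficients, columns are contrasts
cm
```

Every column contains one 1, one -1, and one 0, which is the matrix form of "compare two of the sub-types and ignore the third".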

6. Testing 3 groups

With the design and contrasts matrices created, I can run the limma pipeline. A quick glance at the number of upregulated and downregulated genes shows that the biggest difference is between the CML and ALL types. This suggests that these two leukemia subtypes have the biggest difference in cellular function, which is consistent with the known cancer biology.
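The pipeline can be sketched end to end as follows. Because the leukemia data are not reproduced here, the example runs on a simulated expression matrix; the matrix `x`, the gene and sample names, and the artificial shift in the CML samples are all assumptions for illustration, so the counts it prints say nothing about the real study.

```r
library(limma)

set.seed(1)  # simulated data, for illustration only

# Hypothetical stand-in for the expression data: 200 genes x 36 samples
type <- factor(rep(c("ALL", "AML", "CML"), each = 12))
x <- matrix(rnorm(200 * 36), nrow = 200,
            dimnames = list(paste0("gene", 1:200), paste0("sample", 1:36)))

# Shift the first 20 genes in the CML samples so that some genes differ
x[1:20, type == "CML"] <- x[1:20, type == "CML"] + 2

# Group-means design matrix and pairwise contrasts
design <- model.matrix(~0 + type)
cm <- makeContrasts(AMLvALL = typeAML - typeALL,
                    CMLvALL = typeCML - typeALL,
                    CMLvAML = typeCML - typeAML,
                    levels = design)

fit <- lmFit(x, design)                     # fit the model to each gene
fit2 <- contrasts.fit(fit, contrasts = cm)  # compute the contrasts
fit2 <- eBayes(fit2)                        # moderated test statistics
results <- decideTests(fit2)                # classify genes per contrast

# Number of down-regulated, non-significant, and up-regulated genes
summary(results)
```

With real data, an `ExpressionSet` can be passed to `lmFit` in place of the matrix `x`; the rest of the steps stay the same.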

7. The effect of hypoxia on stem cell function

In the following exercises, you will analyze a data set that includes gene expression measurements of stem cells grown for 24 hours in 3 different levels of oxygen: 1, 5, and 21%. The goal is to identify the genes that are affected by lower oxygen levels (for reference, 21% oxygen is the amount in the atmosphere). The data is a subset of that available from the Bioconductor package stemHypoxia, and the full results can be read in the publication by Prado-Lopez and colleagues. The data contains measurements for 15,325 genes and 6 samples, 2 replicates for each level of oxygen.

8. Let's practice!

Now it's your turn to test the hypoxia data.
