
Joining with missing values

Two new data.tables have been loaded into your R session: heart and cardio. Each contains the microarray probes found to be associated with heart disease in one of two separate studies*. Each probe measures the expression level of a gene. A gene can be measured by one or more probes, and some probes have no known gene annotation in the human genome reference sequence. The two studies used different microarray platforms, which use different probes to measure each gene. Your goal is to find which genes had reproducible associations with heart disease in both studies.

* Note: associations are randomly generated, not representative of any true biological finding or real dataset.

This exercise is part of the course

Joining Data with data.table in R

Exercise instructions

  • Using the merge() function, inner join cardio to heart with the appropriate argument to override any errors that you encounter.
  • Remove the probes from both data.tables with no gene annotation (i.e., remove rows with missing values in the gene column).
  • Repeat the inner join with the new data.tables to get a data.table of reproducible associations between genes and heart disease.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Try an inner join
___

# Filter missing values
heart_2 <- ___
cardio_2 <- ___

# Inner join the filtered data.tables
___
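
One way the three steps could be filled in is sketched below. This is only an illustration, not the course's reference solution. The tiny heart and cardio tables are hypothetical stand-ins for the preloaded data, and the probe column name is an assumption; only the gene column is named in the exercise. The sketch relies on two data.table behaviours: missing values in a join column match each other, and merge() accepts allow.cartesian = TRUE to permit the large result such matches can produce.

```r
library(data.table)

# Hypothetical stand-ins for the preloaded tables (real columns may differ)
heart <- data.table(
  probe = c("h1", "h2", "h3", "h4"),
  gene  = c("GENE_A", "GENE_B", NA, NA)
)
cardio <- data.table(
  probe = c("c1", "c2", "c3", "c4"),
  gene  = c("GENE_A", "GENE_C", NA, NA)
)

# Try an inner join: NA matches NA, so every unannotated probe in heart
# pairs with every unannotated probe in cardio
joined_raw <- merge(heart, cardio, by = "gene", allow.cartesian = TRUE)

# Filter missing values: keep only probes with a gene annotation
heart_2  <- heart[!is.na(gene)]
cardio_2 <- cardio[!is.na(gene)]

# Inner join the filtered data.tables: only genes found in both studies remain
reproducible <- merge(heart_2, cardio_2, by = "gene")
print(reproducible)
```

Comparing nrow(joined_raw) with nrow(reproducible) shows how many of the rows in the first join came purely from missing gene annotations matching each other.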