
Data frames and matrices

1. Data frames and matrices

2. The data frame

The key data structure in R is the data frame. These objects are so useful that other languages have adopted them. For example, Python has the pandas data frame.

3. Data frames

A data frame has a tabular structure for representing data. Typically, columns are the covariates or variables of interest, and rows are observations. Every value in a column must have the same type. This makes sense: if we had a column of ages, every value should be a number. A row, by contrast, can mix types. A data frame is actually a list of vectors, where each column is a single vector.

This has implications for storage. Suppose you were in a library and wanted twenty-six books whose authors' surnames start with A. This is easy: you just locate the correct shelf and take the books. This is what happens when we retrieve a column from a data frame. We locate the single starting position of the column and take what we need. However, if you want twenty-six books whose authors' surnames each start with a different letter, you need to go round the entire library and visit twenty-six separate shelves. This is what happens when you retrieve a row. You need to find the starting location of every single column and select the value you require from each. This is a potentially time-consuming job.
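A minimal sketch of the list-of-vectors structure described above, using a small hypothetical data frame `df`. Each column comes back as one single-type vector, while a row has to be assembled from one value per column, so it is itself a one-row data frame that mixes types.

```r
# A small hypothetical data frame: each column is one vector with a single type
df <- data.frame(
  name = c("Ann", "Bob", "Cat"),
  age  = c(34, 27, 45),
  stringsAsFactors = FALSE
)

# Under the hood a data frame is a list of column vectors
is.list(df)            # TRUE
length(df)             # 2: one list element per column
sapply(df, class)      # name: "character", age: "numeric"

# Extracting a column returns one vector of a single type
df$age                 # 34 27 45
class(df$age)          # "numeric"

# Extracting a row pulls one value from every column,
# so the result is a one-row data frame that mixes types
row2 <- df[2, ]
class(row2)            # "data.frame"
```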

4. Matrices

A matrix is similar to a data frame. It has a rectangular structure and supports the usual subsetting and extraction operations. The crucial difference is its data type: a matrix can contain only a single data type. We're not allowed to mix and match. This makes storage much simpler, as the entire matrix is stored in one contiguous block. Selecting columns is easy: we find the start of the column and retrieve the data. Selecting rows is also straightforward: find the first value, then step a constant amount (the number of rows) along the block. As we'll see in the exercises, using matrices can provide a large speed boost.
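To make the storage argument concrete, here is a sketch in base R. R stores a matrix in column-major order, so the illustrative index arithmetic below (the variables `col_idx` and `row_idx` are our own, not part of any API) recovers a column as a run of consecutive positions and a row as positions spaced `nrow(m)` apart. It mirrors the layout at the R level rather than showing R's C internals.

```r
# A 3 x 4 integer matrix, filled column by column (R's default)
m <- matrix(1:12, nrow = 3)

# A matrix holds one type: mixing numbers and strings coerces everything
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
typeof(mixed)          # "character"

# The values sit in one column-major vector
as.vector(m)           # 1 2 3 ... 12

# Column j: start at (j - 1) * nrow + 1 and take nrow consecutive values
j <- 2
col_idx <- (j - 1) * nrow(m) + seq_len(nrow(m))
identical(m[, j], as.vector(m)[col_idx])   # TRUE

# Row i: start at position i and step by a constant stride of nrow
i <- 2
row_idx <- seq(from = i, by = nrow(m), length.out = ncol(m))
identical(m[i, ], as.vector(m)[row_idx])   # TRUE
```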

5. R club

The third rule of R club: use a matrix whenever appropriate.
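One way to check the rule for yourself: when every column of a data frame has the same type, `as.matrix()` converts it without losing information, and row extraction can then be timed on both containers. The data is simulated and the sizes are arbitrary assumptions; timings vary by machine, so the sketch prints them rather than promising a particular speed-up.

```r
set.seed(1)
n_row <- 10000
n_col <- 100

# Hypothetical all-numeric data: every column has the same type,
# so a matrix is an appropriate container
df <- as.data.frame(matrix(rnorm(n_row * n_col), ncol = n_col))
m <- as.matrix(df)

# Same values, different container
stopifnot(identical(unname(m[1, ]), unname(unlist(df[1, ]))))

# Time repeated row extraction from each structure
rows <- sample(n_row, 1000)
t_df <- system.time(for (i in rows) df[i, ])
t_m  <- system.time(for (i in rows) m[i, ])

print(t_df["elapsed"])
print(t_m["elapsed"])
```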

6. Let's practice!