Manipulating datasets and data objects

1. Manipulating datasets and data objects

Now that you know how to work with different data objects, let's learn how to create and modify variables within a dataset. You will also learn how to confirm object types and convert between them.

2. Load Davis dataset from car package

For this video you will manipulate the variables in the davis dataset from the car package. The davis dataset contains both directly measured and self-reported heights and weights of men and women. The first 6 rows are displayed here.
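As a rough sketch, loading the data and viewing the first 6 rows might look like this. It assumes the Davis data is available from the carData package (which is installed alongside car) and that the object is named davis. The fallback rows are made-up illustrative values, not real Davis records.

```r
# Assumption: the Davis data ships in carData, a dependency of car.
if (requireNamespace("carData", quietly = TRUE)) {
  davis <- carData::Davis
} else {
  # Hypothetical stand-in rows (NOT real Davis data), same column names.
  davis <- data.frame(
    sex    = c("M", "F", "F", "M", "F"),
    weight = c(80, 55, 62, 95, 63),       # measured weight, kg
    height = c(180, 160, 165, 175, 150),  # measured height, cm
    repwt  = c(80, 54, 61, 93, 62),       # self-reported weight, kg
    repht  = c(178, 160, 161, 175, 147)   # self-reported height, cm
  )
}

# head() shows the first 6 rows by default.
head(davis)
```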

3. Compute new variable - body mass index

To add new variables to the davis dataset, you will use the mutate function from the dplyr package. The mutate function is similar to creating new variables in a SAS DATA step. First, you specify the original dataset, davis.

4. Compute new variable - body mass index

Next, the result is assigned to a new dataset object, davismod.

5. Compute new variable - body mass index

A new variable, bmi, is created using the mutate function. BMI is computed as weight in kilograms divided by the square of height in meters. Because height is recorded in centimeters, it is first divided by 100. To square that value, SAS uses two asterisks, whereas R uses the caret symbol followed by a 2.

6. Compute new variable - body mass index

The mutate function creates the new variable bmi and adds it to the original davis dataset. The result is assigned to a new dataset object called davismod. bmi is the last column displayed for the davismod dataset.
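The steps in slides 3 to 6 can be sketched as a single pipeline. This is a minimal example, assuming the pipe operator is used to pass davis into mutate. The small davis data frame here is a made-up stand-in with the same column names, not the real data.

```r
library(dplyr)

# Hypothetical stand-in rows (NOT real Davis data), same column names.
davis <- data.frame(
  sex    = c("M", "F", "F", "M", "F"),
  weight = c(80, 55, 62, 95, 63),       # kg
  height = c(180, 160, 165, 175, 150),  # cm
  repwt  = c(80, 54, 61, 93, 62),
  repht  = c(178, 160, 161, 175, 147)
)

# Start from davis, add bmi, and assign the result to davismod.
# height / 100 converts cm to m; ^2 squares it (SAS would use **2).
davismod <- davis %>%
  mutate(bmi = weight / (height / 100)^2)

# bmi appears as the last column.
davismod
```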

7. Compute new variable - logical operators

Mutate can be used multiple times, with each new variable added columnwise onto the end of the modified dataset. The difference between the reported height and measured height is named diffht. difflow is computed to indicate if self-reported height was 3 cm or more lower than measured height.
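Chained mutate calls might look like the sketch below. It assumes diffht is reported height minus measured height (repht - height), so a value of -3 or less means the self-report was at least 3 cm too low. The data rows are made-up stand-ins, not real Davis records.

```r
library(dplyr)

# Hypothetical stand-in rows (NOT real Davis data), same column names.
davis <- data.frame(
  sex    = c("M", "F", "F", "M", "F"),
  weight = c(80, 55, 62, 95, 63),       # kg
  height = c(180, 160, 165, 175, 150),  # cm
  repwt  = c(80, 54, 61, 93, 62),
  repht  = c(178, 160, 161, 175, 147)
)

davismod <- davis %>%
  mutate(bmi = weight / (height / 100)^2) %>%
  # Assumed sign convention: reported minus measured height.
  mutate(diffht = repht - height) %>%
  # TRUE when the self-report is 3 cm or more below the measurement.
  mutate(difflow = diffht <= -3)

# Each new variable is appended as a new column on the right.
davismod
```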

8. Recoding with ifelse

BMI categories can be computed using the ifelse function. The first argument to the ifelse function is the test expression, followed by the result if the expression is true, and then the result if the expression is false. Ifelse calls can be nested to cover all categories of interest. Here, three categories are computed: underweight or normal for BMI less than 25, overweight for BMI of 25 to less than 30, and obese for BMI of 30 or more. Rows 15 to 21 are displayed using the slice function.
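A nested ifelse for the three categories could be written as follows. The category labels and the variable name bmicat are assumptions for illustration, and the data rows are made up. Because this stand-in has only 5 rows, slice picks rows 3 to 5 rather than 15 to 21.

```r
library(dplyr)

# Hypothetical stand-in rows (NOT real Davis data), same column names.
davis <- data.frame(
  sex    = c("M", "F", "F", "M", "F"),
  weight = c(80, 55, 62, 95, 63),       # kg
  height = c(180, 160, 165, 175, 150),  # cm
  repwt  = c(80, 54, 61, 93, 62),
  repht  = c(178, 160, 161, 175, 147)
)

davismod <- davis %>%
  mutate(bmi = weight / (height / 100)^2) %>%
  # ifelse(test, value_if_true, value_if_false); the inner call only
  # decides rows where the outer test (bmi < 25) is FALSE.
  mutate(bmicat = ifelse(bmi < 25, "underweight/normal",
                  ifelse(bmi < 30, "overweight", "obese")))

# slice() selects rows by position.
rows_3_to_5 <- davismod %>% slice(3:5)
rows_3_to_5
```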

9. Test and confirm object type

A common error encountered in R is a class (type) mismatch, so it is very useful to be able to test and confirm your object types. For example, the is.numeric function can be run to confirm that bmi is numeric. In contrast, the is.numeric result for the bmicat variable is FALSE. There are many other is dot functions, like is.character and is.logical.
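These checks can be sketched as below, assuming bmicat holds text labels as in the recoding step. The data rows and category labels are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in rows (NOT real Davis data).
davis <- data.frame(
  weight = c(80, 55, 62, 95, 63),       # kg
  height = c(180, 160, 165, 175, 150),  # cm
  repht  = c(178, 160, 161, 175, 147)
)

davismod <- davis %>%
  mutate(bmi     = weight / (height / 100)^2,
         bmicat  = ifelse(bmi < 25, "underweight/normal",
                   ifelse(bmi < 30, "overweight", "obese")),
         difflow = (repht - height) <= -3)

# Each is.* function returns a single TRUE or FALSE.
bmi_is_num    <- davismod %>% pull(bmi)     %>% is.numeric()    # TRUE
bmicat_is_num <- davismod %>% pull(bmicat)  %>% is.numeric()    # FALSE
bmicat_is_chr <- davismod %>% pull(bmicat)  %>% is.character()  # TRUE
difflow_is_lg <- davismod %>% pull(difflow) %>% is.logical()    # TRUE
```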

10. Test and confirm object type

In addition to checking whether an object is numeric, character or logical, you can also test and confirm the object's data structure. For example, the is.vector function confirms that difflow from the davismod dataset is a vector. You can also confirm that the davismod dataset is a data frame but is not a matrix.
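The structure checks might look like this sketch. The data rows are made-up stand-ins and difflow uses the assumed definition of reported minus measured height being -3 or lower.

```r
library(dplyr)

# Hypothetical stand-in rows (NOT real Davis data).
davis <- data.frame(
  weight = c(80, 55, 62, 95, 63),       # kg
  height = c(180, 160, 165, 175, 150),  # cm
  repht  = c(178, 160, 161, 175, 147)
)

davismod <- davis %>%
  mutate(difflow = (repht - height) <= -3)

# A single column pulled out of a data frame is a plain vector.
difflow_is_vec <- davismod %>% pull(difflow) %>% is.vector()  # TRUE

# The dataset itself is a data frame, not a matrix.
mod_is_df  <- is.data.frame(davismod)  # TRUE
mod_is_mat <- is.matrix(davismod)      # FALSE
```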

11. Convert object types - coercion

Many functions in R will not run or perform correctly if the object types supplied are incorrect. So, it is useful to be able to convert between object types where applicable. Like the is dot functions, there are also many as dot functions that can coerce some object types into others. For example, you can pull the difflow logical variable from davismod and convert it to numeric. All of the TRUEs become 1's and the FALSEs become 0's.
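A minimal sketch of this coercion, using made-up stand-in rows and the assumed difflow definition from earlier:

```r
library(dplyr)

# Hypothetical stand-in rows (NOT real Davis data).
davis <- data.frame(
  height = c(180, 160, 165, 175, 150),  # measured, cm
  repht  = c(178, 160, 161, 175, 147)   # self-reported, cm
)

davismod <- davis %>%
  mutate(difflow = (repht - height) <= -3)

# pull() extracts the logical column; as.numeric() coerces it,
# mapping TRUE to 1 and FALSE to 0.
difflow_num <- davismod %>% pull(difflow) %>% as.numeric()
difflow_num
```

One practical use of this coercion is that summing the result counts how many rows are TRUE.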

12. Convert object types - coercion

You can also convert some data structures into others. For example, if you select only the numeric variables weight and height from the davismod dataset, they can be coerced into a matrix using the as dot matrix function.
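This conversion can be sketched as follows. The object name davismat is an assumption, and the data rows are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in rows (NOT real Davis data).
davis <- data.frame(
  sex    = c("M", "F", "F", "M", "F"),
  weight = c(80, 55, 62, 95, 63),       # kg
  height = c(180, 160, 165, 175, 150)   # cm
)

davismod <- davis %>%
  mutate(bmi = weight / (height / 100)^2)

# Keep only the two numeric columns, then coerce to a matrix.
davismat <- davismod %>%
  select(weight, height) %>%
  as.matrix()

is.matrix(davismat)  # TRUE
dim(davismat)        # 5 rows, 2 columns
```

Selecting only numeric columns first matters here: a matrix holds a single type, so including a character column such as sex would turn every value into text.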

13. Let's go update the abalone dataset with new variables

Let's use your new skills to create new variables in the abalone dataset.