Specifying datatypes for columns
When you read data from a text or CSV file, you should specify the names and data types for each column. The read() function will try to determine whether the first row of the dataset contains the column names. R is good at inferring some data types, but if you are reading a categorical variable coded as 0, 1, and 2, it will read it as a numeric variable, and you will need to set the data type for that column after reading the data.
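As a minimal sketch of the point above, the snippet below reads a tiny made-up dataset in which a categorical variable is coded 0, 1, and 2. The data, the object names toy and toy2, and the column names grade and score are all assumptions for illustration. It uses base R's read.csv() with its text, header, col.names, and colClasses arguments; declaring the type at read time with colClasses is shown as one possible alternative, not as the approach this exercise requires.

```r
# Hypothetical three-row dataset with no header row; the first column is a
# categorical variable coded 0/1/2 (made-up values, for illustration only).
csv_text <- "0,12.5\n1,13.1\n2,11.8"

# Read without a header row and supply the column names explicitly.
toy <- read.csv(text = csv_text, header = FALSE,
                col.names = c("grade", "score"))

# R infers a numeric (integer) type for the coded column.
class(toy$grade)

# Option 1: fix the type after reading, as the text describes.
toy$grade <- factor(toy$grade, levels = c(0, 1, 2))

# Option 2 (an alternative): declare the column types while reading.
toy2 <- read.csv(text = csv_text, header = FALSE,
                 col.names = c("grade", "score"),
                 colClasses = c("factor", "numeric"))

str(toy)
str(toy2)
```

Either route ends with grade stored as a factor with three levels, so modelling and plotting functions treat it as categorical rather than as a quantity.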
This exercise is part of the course
Multivariate Probability Distributions in R
Exercise instructions
- Assign the new column names to the wine dataset, then check that they have been correctly assigned.
- Change the Type column into a factor with three levels.
- Use the str() function to check the data type/structure before and after changing the data type.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Assign new names
___ <- c('Type', 'Alcohol', 'Malic', 'Ash', 'Alcalinity', 'Magnesium',
         'Phenols', 'Flavanoids', 'Nonflavanoids', 'Proanthocyanins',
         'Color', 'Hue', 'Dilution', 'Proline')
# Check the new column names
___
# Check data type/structure of each variable
str(___)
# Change the Type variable data type
___
# Check data type/structure again
___
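
The same sequence of steps can be sketched on a small stand-in data frame. Everything here is an assumption for illustration: wine_demo, its values, and its initial V1 to V3 column names are made up and are not the course's wine dataset, and only three of the fourteen column names are used. The calls themselves (colnames(), str(), as.factor()) are base R.

```r
# Stand-in data frame with hypothetical values (NOT the real wine data).
wine_demo <- data.frame(V1 = c(1, 2, 3, 1),
                        V2 = c(14.2, 12.4, 13.3, 13.9),
                        V3 = c(1.7, 0.9, 2.5, 1.9))

# Assign new column names, then check that they were assigned.
colnames(wine_demo) <- c("Type", "Alcohol", "Malic")
colnames(wine_demo)

# Structure before the change: Type is stored as a numeric column.
str(wine_demo)

# Convert Type to a factor; the three distinct codes become three levels.
wine_demo$Type <- as.factor(wine_demo$Type)

# Structure after the change: Type is now a factor with 3 levels.
str(wine_demo)
```

Calling str() before and after the conversion makes the change easy to see, since only the Type column's reported class differs between the two outputs.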