Scale the whole dataset
Usually the R datasets do not need much data wrangling, as they are already in good shape. We will still need to make a few small adjustments.
For later use, we need to scale the data. To scale, we subtract each column's mean from the values in that column and divide the difference by the column's standard deviation.
$$\mathrm{scaled}(x) = \frac{x - \mathrm{mean}(x)}{\mathrm{sd}(x)}$$
The Boston data contains only numerical values, so we can use the function scale() to standardize the whole dataset.
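To make the formula concrete, here is a minimal sketch that applies it by hand to a small made-up matrix and compares the result with scale(). The matrix m and the names manual and auto are illustrative assumptions and are not part of the exercise.

```r
# Hypothetical toy data, not part of the Boston dataset
m <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 60))

# Apply the formula by hand: subtract the column means,
# then divide by the column standard deviations
manual <- sweep(m, 2, colMeans(m), "-")
manual <- sweep(manual, 2, apply(m, 2, sd), "/")

# scale() with its default arguments centers and scales each column
auto <- scale(m)

# Compare the values, ignoring the extra attributes that scale() attaches
all.equal(manual, auto, check.attributes = FALSE)

# After scaling, column means are (numerically) zero and column sds are one
colMeans(auto)
apply(auto, 2, sd)
```

Note that scale() stores the centering and scaling values it used as attributes of the result, which is why the comparison ignores attributes.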
This exercise is part of the course
Helsinki Open Data Science
Exercise instructions
- Use the scale() function on the Boston dataset. Save the scaled data to the boston_scaled object.
- Use summary() to look at the scaled variables. Note the means of the variables.
- Find out the class of the scaled object by executing the class() function.
- Later we will want the data to be a data frame. Use as.data.frame() to convert boston_scaled to a data frame format. Keep the object name as boston_scaled.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# MASS and Boston dataset are available
# center and standardize variables
boston_scaled <- "change me!"
# summaries of the scaled variables
# class of the boston_scaled object
class(boston_scaled)
# change the object to data frame
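One way the template above could be completed is sketched below. It follows the exercise instructions step by step and assumes the MASS package is installed; it is a possible solution, not necessarily the course's reference answer.

```r
# Load MASS, which provides the Boston data frame
library(MASS)
data("Boston")

# center and standardize variables
boston_scaled <- scale(Boston)

# summaries of the scaled variables (the means should all be zero)
summary(boston_scaled)

# class of the boston_scaled object: scale() returns a matrix
class(boston_scaled)

# change the object to data frame, keeping the same name
boston_scaled <- as.data.frame(boston_scaled)
class(boston_scaled)
```

Converting back with as.data.frame() matters because scale() returns a matrix, and many later modelling steps expect a data frame with named columns.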