Preparing the data
In prior chapters, we prepared the data for unsupervised learning for you. This chapter instead steps you through a more realistic and complete workflow.
Recall from the video that the first step is to download and prepare the data.
This exercise is part of the course
Unsupervised Learning in R
Exercise instructions
- Use the read.csv() function to download the CSV (comma-separated values) file containing the data from the URL provided. Assign the result to wisc.df.
- Use as.matrix() to convert the features of the data (in columns 3 through 32) to a matrix. Store this in a variable called wisc.data.
- Set the row names of wisc.data to the values currently contained in the id column of wisc.df. While not strictly required, this will help you keep track of the different observations throughout the modeling process.
- Finally, set a vector called diagnosis to be 1 if a diagnosis is malignant ("M") and 0 otherwise. Note that R coerces TRUE to 1 and FALSE to 0 when a logical value is converted to a number.
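The coercion mentioned in the last instruction can be seen in isolation with a small sketch. The labels vector below is hypothetical and is not taken from the course dataset; it only illustrates how a comparison yields a logical vector that as.numeric() then turns into 1s and 0s.

```r
# Hypothetical labels for illustration only (not the course data)
labels <- c("M", "B", "B", "M")

# Comparing against "M" gives a logical vector, one entry per label
is_malignant <- labels == "M"

# as.numeric() maps TRUE to 1 and FALSE to 0
malignant_flag <- as.numeric(is_malignant)

# Because TRUE counts as 1, sum() on the logical vector counts the "M" labels
n_malignant <- sum(is_malignant)

print(malignant_flag)
print(n_malignant)
```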
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
url <- "https://assets.datacamp.com/production/course_1903/datasets/WisconsinCancer.csv"
# Download the data: wisc.df
# Convert the features of the data: wisc.data
# Set the row names of wisc.data
row.names(wisc.data) <- wisc.df$___
# Create diagnosis vector
diagnosis <- as.numeric(wisc.df$diagnosis == ___)
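The four steps can also be tried offline on a toy stand-in for the CSV. This is a minimal sketch under stated assumptions, not the exercise solution: the inline values are made up, the feature column names are hypothetical, and the toy.* names are introduced here only so they do not clash with the exercise variables. The toy file has three feature columns, so the column range is 3 through ncol(toy.df) rather than 3 through 32.

```r
# Toy stand-in for the course CSV: made-up values, hypothetical column names
csv_text <- "id,diagnosis,radius_mean,texture_mean,perimeter_mean
101,M,17.9,10.4,122.8
102,B,11.4,20.4,77.6
103,M,20.6,17.8,132.9"

# read.csv() normally takes a file path or URL; text = parses an inline string
toy.df <- read.csv(text = csv_text)

# Keep only the feature columns (3 onward here; 3 through 32 in the exercise)
toy.data <- as.matrix(toy.df[, 3:ncol(toy.df)])

# Label each row of the matrix with its observation id
row.names(toy.data) <- toy.df$id

# 1 where the diagnosis is malignant ("M"), 0 otherwise
toy.diagnosis <- as.numeric(toy.df$diagnosis == "M")

print(toy.data)
print(toy.diagnosis)
```

Dropping the id and diagnosis columns before calling as.matrix() matters because a matrix holds a single type: including the character diagnosis column would turn every entry into a string.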