Preparing the data
Unlike prior chapters, where the data for unsupervised learning was prepared for you, this chapter steps you through a more realistic and complete workflow.
Recall from the video that the first step is to download and prepare the data.
This exercise is part of the course
Unsupervised Learning in R
Exercise instructions
- Use the read.csv() function to download the CSV (comma-separated values) file containing the data from the URL provided. Assign the result to wisc.df.
- Use as.matrix() to convert the features of the data (in columns 3 through 32) to a matrix. Store this in a variable called wisc.data.
- Assign the row names of wisc.data the values currently contained in the id column of wisc.df. While not strictly required, this will help you keep track of the different observations throughout the modeling process.
- Finally, set a vector called diagnosis to be 1 if a diagnosis is malignant ("M") and 0 otherwise. Note that R coerces TRUE to 1 and FALSE to 0.
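The same four steps can be tried on a tiny stand-in dataset before running them on the real file. The sketch below is illustrative only: the CSV text, its values, and the names csv_text, toy.df, toy.data, and toy.diagnosis are assumptions made up for this example, and it reads from a string with read.csv(text = ...) instead of a URL so that it runs without a network connection.

```r
# Hypothetical toy CSV: 3 observations, an id, a diagnosis, and 2 features.
# The values are invented for illustration and are not the course data.
csv_text <- "id,diagnosis,radius_mean,texture_mean
101,M,17.99,10.38
102,B,13.54,14.36
103,M,20.57,17.77"

# read.csv() accepts a file path or URL; `text =` reads from a string instead.
toy.df <- read.csv(text = csv_text)

# Keep only the feature columns (3 onward here; 3 through 32 in the exercise).
toy.data <- as.matrix(toy.df[, 3:ncol(toy.df)])

# Label each row of the matrix with its observation id.
row.names(toy.data) <- toy.df$id

# The comparison yields a logical vector; as.numeric() maps TRUE to 1, FALSE to 0.
toy.diagnosis <- as.numeric(toy.df$diagnosis == "M")

print(toy.data)
print(toy.diagnosis)
```

Dropping the id and diagnosis columns before calling as.matrix() keeps the matrix purely numeric; including a character column would turn every entry into a string.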
Hands-on interactive exercise
Try this exercise by completing this sample code.
url <- "https://assets.datacamp.com/production/course_1903/datasets/WisconsinCancer.csv"
# Download the data: wisc.df
# Convert the features of the data: wisc.data
# Set the row names of wisc.data
row.names(wisc.data) <- wisc.df$___
# Create diagnosis vector
diagnosis <- as.numeric(wisc.df$diagnosis == ___)