Preparing the data
In prior chapters, we prepared the data for unsupervised learning for you. The goal of this chapter is to step you through a more realistic and complete workflow.
Recall from the video that the first step is to download and prepare the data.
This exercise is part of the course
Unsupervised Learning in R
Exercise instructions
- Use the read.csv() function to download the CSV (comma-separated values) file containing the data from the URL provided. Assign the result to wisc.df.
- Use as.matrix() to convert the features of the data (in columns 3 through 32) to a matrix. Store this in a variable called wisc.data.
- Assign the row names of wisc.data the values currently contained in the id column of wisc.df. While not strictly required, this will help you keep track of the different observations throughout the modeling process.
- Finally, set a vector called diagnosis to be 1 if a diagnosis is malignant ("M") and 0 otherwise. Note that R coerces TRUE to 1 and FALSE to 0.
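The same pattern can be tried on a small stand-in before touching the real file. The sketch below is only an illustration: toy.df, toy.data, toy.diagnosis, the radius and texture columns, and every value in them are made up here and are not taken from the course data set.

```r
# Toy stand-in for the real data; all names and values are hypothetical.
toy.df <- data.frame(
  id = c(101, 102, 103),
  diagnosis = c("M", "B", "M"),
  radius = c(17.9, 11.4, 20.6),
  texture = c(10.4, 17.8, 21.3)
)

# Convert the feature columns (3 through 4 in this toy example) to a matrix
toy.data <- as.matrix(toy.df[, 3:4])

# Label each row of the matrix with the matching observation id
row.names(toy.data) <- toy.df$id

# The comparison yields a logical vector;
# as.numeric() then coerces TRUE to 1 and FALSE to 0
toy.diagnosis <- as.numeric(toy.df$diagnosis == "M")

print(toy.data)
print(toy.diagnosis)  # 1 0 1
```

In the exercise itself the data frame comes from read.csv(), which accepts a URL string in place of a local file path, and the feature columns span 3 through 32 rather than 3 through 4.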
Interactive exercise
Complete the sample code to successfully finish this exercise.
url <- "https://assets.datacamp.com/production/course_1903/datasets/WisconsinCancer.csv"
# Download the data: wisc.df
# Convert the features of the data: wisc.data
# Set the row names of wisc.data
row.names(wisc.data) <- wisc.df$___
# Create diagnosis vector
diagnosis <- as.numeric(wisc.df$diagnosis == ___)